PREFACE


This, the seventh edition of Mathematical Methods for Physicists, maintains the tradition 
set by the six previous editions and continues to have as its objective the presentation of all 
the mathematical methods that aspiring scientists and engineers are likely to encounter as 
students and beginning researchers. While the organization of this edition differs in some 
respects from that of its predecessors, the presentation style remains the same: Proofs are 
sketched for almost all the mathematical relations introduced in the book, and they are 
accompanied by examples that illustrate how the mathematics applies to real-world physics 
problems. Large numbers of exercises provide opportunities for the student to develop skill 
in the use of the mathematical concepts and also show a wide variety of contexts in which 
the mathematics is of practical use in physics. 

As in the previous editions, the mathematical proofs are not what a mathematician would 
consider rigorous, but they nevertheless convey the essence of the ideas involved, and also 
provide some understanding of the conditions and limitations associated with the rela- 
tionships under study. No attempt has been made to maximize generality or minimize the 
conditions necessary to establish the mathematical formulas, but in general the reader is 
warned of limitations that are likely to be relevant to use of the mathematics in physics 
contexts. 


TO THE STUDENT 


The mathematics presented in this book is of no use if it cannot be applied with some skill, 
and the development of that skill cannot be acquired passively, e.g., by simply reading the 
text and understanding what is written, or even by listening attentively to presentations 
by your instructor. Your passive understanding needs to be supplemented by experience 
in using the concepts, in deciding how to convert expressions into useful forms, and in 
developing strategies for solving problems. A considerable body of background knowledge 
needs to be built up so as to have relevant mathematical tools at hand and to gain experience
in their use. This can only happen through the solving of problems, and it is for this
reason that the text includes nearly 1400 exercises, many with answers (but not methods 
of solution). If you are using this book for self-study, or if your instructor does not assign 
a considerable number of problems, you would be well advised to work on the exercises 
until you are able to solve a reasonable fraction of them. 

This book can help you to learn about mathematical methods that are important in 
physics, as well as serve as a reference throughout and beyond your time as a student. 
It has been updated to make it relevant for many years to come. 


WHAT’S NEW 


This seventh edition is a substantial and detailed revision of its predecessor; every word of
the text has been examined, and its appropriacy and that of its placement have been considered.
The main features of the revision are: (1) An improved order of topics so as to reduce
the need to use concepts before they have been presented and discussed. (2) An introduc- 
tory chapter containing material that well-prepared students might be presumed to know 
and which will be relied on (without much comment) in later chapters, thereby reducing 
redundancy in the text; this organizational feature also permits students with weaker back- 
grounds to get themselves ready for the rest of the book. (3) A strengthened presentation of 
topics whose importance and relevance has increased in recent years; in this category are 
the chapters on vector spaces, Green’s functions, and angular momentum, and the inclu- 
sion of the dilogarithm among the special functions treated. (4) More detailed discussion 
of complex integration to enable the development of increased skill in using this extremely 
important tool. (5) Improvement in the correlation of exercises with the exposition in the 
text, and the addition of 271 new exercises where they were deemed needed. (6) Addition 
of a few steps to derivations that students found difficult to follow. We do not subscribe 
to the precept that “advanced” means “compressed” or “difficult.” Wherever the need has 
been recognized, material has been rewritten to enhance clarity and ease of understanding. 

In order to accommodate new and expanded features, it was necessary to remove or 
reduce in emphasis some topics with significant constituencies. For the most part, the 
material thereby deleted remains available to instructors and their students by virtue of 
its inclusion in the on-line supplementary material for this text. On-line only are chapters 
on Mathieu functions, on nonlinear methods and chaos, and a new chapter on periodic systems.
These are complete and newly revised chapters, with examples and exercises, and
are fully ready for use by students and their instructors. Because there seems to be a significant
population of instructors who wish to use material on infinite series in much the
same organizational pattern as in the sixth edition, that material (largely the same as in 
the print edition, but not all in one place) has been collected into an on-line infinite series 
chapter that provides this material in a single unit. The on-line material can be accessed at 
www.elsevierdirect.com. 


PATHWAYS THROUGH THE MATERIAL


This book contains more material than an instructor can expect to cover, even in a 
two-semester course. The material not used for instruction remains available for reference 
purposes or when needed for specific projects. For use with less fully prepared students, 
a typical semester course might use Chapters 1 to 3, maybe part of Chapter 4, certainly 
Chapters 5 to 7, and at least part of Chapter 11. A standard graduate one-semester course 
might have the material in Chapters 1 to 3 as prerequisite, would cover at least part of
Chapter 4, all of Chapters 5 through 9, Chapter 11, and as much of Chapters 12 through 
16 and/or 18 as time permits. A full-year course at the graduate level might supplement 
the foregoing with several additional chapters, almost certainly including Chapter 20 (and 
Chapter 19 if not already familiar to the students), with the actual choice dependent on 
the institution’s overall graduate curriculum. Once Chapters 1 to 3, 5 to 9, and 11 have
been covered or their contents are known to the students, most selections from the remain- 
ing chapters should be reasonably accessible to students. It would be wise, however, to 
include Chapters 15 and 16 if Chapter 17 is selected. 


CHAPTER 1


MATHEMATICAL 
PRELIMINARIES 


This introductory chapter surveys a number of mathematical techniques that are needed 
throughout the book. Some of the topics (e.g., complex variables) are treated in more detail 
in later chapters, and the short survey of special functions in this chapter is supplemented 
by extensive later discussion of those of particular importance in physics (e.g., Bessel func- 
tions). A later chapter on miscellaneous mathematical topics deals with material requiring 
more background than is assumed at this point. The reader may note that the Additional 
Readings at the end of this chapter include a number of general references on mathemati- 
cal methods, some of which are more advanced or comprehensive than the material to be 
found in this book. 


1.1 INFINITE SERIES 


Perhaps the most widely used technique in the physicist’s toolbox is the use of infinite 
series (i.e., sums consisting formally of an infinite number of terms) to represent functions, 
to bring them to forms facilitating further analysis, or even as a prelude to numerical eval- 
uation. The acquisition of skill in creating and manipulating series expansions is therefore 
an absolutely essential part of the training of one who seeks competence in the mathemat- 
ical methods of physics, and it is therefore the first topic in this text. An important part of 
this skill set is the ability to recognize the functions represented by commonly encountered 
expansions, and it is also of importance to understand issues related to the convergence of 
infinite series. 


Fundamental Concepts


The usual way of assigning a meaning to the sum of an infinite number of terms is by
introducing the notion of partial sums. If we have an infinite sequence of terms $u_1, u_2, u_3,
u_4, u_5, \ldots$, we define the $i$th partial sum as

$$s_i = \sum_{n=1}^{i} u_n. \tag{1.1}$$

This is a finite summation and offers no difficulties. If the partial sums $s_i$ converge to a
finite limit as $i \to \infty$,

$$\lim_{i \to \infty} s_i = S, \tag{1.2}$$

the infinite series $\sum_{n=1}^{\infty} u_n$ is said to be convergent and to have the value $S$. Note that
we define the infinite series as equal to $S$ and that a necessary condition for convergence
to a limit is that $\lim_{n \to \infty} u_n = 0$. This condition, however, is not sufficient to guarantee
convergence.

Sometimes it is convenient to apply the condition in Eq. (1.2) in a form called the
Cauchy criterion, namely that for each $\varepsilon > 0$ there is a fixed number $N$ such that
$|s_j - s_i| < \varepsilon$ for all $i$ and $j$ greater than $N$. This means that the partial sums must cluster
together as we move far out in the sequence.

Some series diverge, meaning that the sequence of partial sums approaches $\infty$; others
may have partial sums that oscillate between two values, as for example,

$$\sum_{n=1}^{\infty} u_n = 1 - 1 + 1 - 1 + 1 - \cdots - (-1)^n + \cdots.$$

This series does not converge to a limit, and can be called oscillatory. Often the term 
divergent is extended to include oscillatory series as well. It is important to be able to 
determine whether, or under what conditions, a series we would like to use is convergent. 


Example 1.1.1 THE GEOMETRIC SERIES


The geometric series, starting with $u_0 = 1$ and with a ratio of successive terms $r =
u_{n+1}/u_n$, has the form

$$1 + r + r^2 + r^3 + \cdots + r^{n-1} + \cdots.$$

Its $n$th partial sum $s_n$ (that of the first $n$ terms) is¹

$$s_n = \frac{1 - r^n}{1 - r}. \tag{1.3}$$

Restricting attention to $|r| < 1$, so that $r^n$ approaches zero for large $n$, we find that $s_n$
possesses the limit

$$\lim_{n \to \infty} s_n = \frac{1}{1 - r}, \tag{1.4}$$

showing that for $|r| < 1$, the geometric series converges. It clearly diverges (or is oscillatory)
for $|r| \ge 1$, as the individual terms do not then approach zero at large $n$.

¹ Multiply and divide $s_n = \sum_{m=0}^{n-1} r^m$ by $1 - r$.
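
As a numerical illustration of Eqs. (1.3) and (1.4), the following Python sketch compares directly accumulated partial sums with the closed form. The function name and the sample ratio r = 0.5 are illustrative assumptions, not part of the text.

```python
def geometric_partial_sum(r, n):
    """Sum of the first n terms 1 + r + ... + r**(n-1), accumulated directly."""
    total, term = 0.0, 1.0
    for _ in range(n):
        total += term
        term *= r
    return total

r = 0.5  # sample ratio with |r| < 1 (an arbitrary choice)
for n in (1, 5, 10, 20):
    closed_form = (1 - r**n) / (1 - r)  # Eq. (1.3)
    print(n, geometric_partial_sum(r, n), closed_form)
print("limit 1/(1 - r) =", 1 / (1 - r))  # Eq. (1.4)
```

Trying a ratio with |r| >= 1 in the same loop shows the partial sums growing or oscillating instead of settling down.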


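Example 1.1.2 THE HARMONIC SERIES


As a second and more involved example, we consider the harmonic series

$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} + \cdots. \tag{1.5}$$

The terms approach zero for large $n$, i.e., $\lim_{n \to \infty} 1/n = 0$, but this is not sufficient to
guarantee convergence. If we group the terms (without changing their order) as

$$1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \left(\frac{1}{9} + \cdots + \frac{1}{16}\right) + \cdots,$$

each pair of parentheses encloses $p$ terms of the form

$$\frac{1}{p+1} + \frac{1}{p+2} + \cdots + \frac{1}{p+p} > \frac{p}{2p} = \frac{1}{2}.$$

Forming partial sums by adding the parenthetical groups one by one, we obtain

$$s_1 = 1, \quad s_2 = \frac{3}{2}, \quad s_3 > \frac{4}{2}, \quad s_4 > \frac{5}{2}, \quad \ldots, \quad s_n > \frac{n+1}{2},$$

and we are forced to the conclusion that the harmonic series diverges.

Although the harmonic series diverges, its partial sums have relevance among other
places in number theory, where $H_n = \sum_{m=1}^{n} m^{-1}$ are sometimes referred to as harmonic
numbers.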
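
The grouping argument can be checked with exact rational arithmetic. In the sketch below (the function name is an illustrative assumption), the k-th grouped partial sum ends at the term 1/2^(k-1), so it equals the harmonic number with index 2^(k-1), and it is compared with the lower bound (k+1)/2.

```python
from fractions import Fraction

def harmonic_number(n):
    """Exact n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, m) for m in range(1, n + 1))

# The k-th grouped partial sum ends at 1/2**(k-1), so s_k = H_{2**(k-1)}.
for k in range(1, 11):
    s_k = harmonic_number(2 ** (k - 1))
    print(k, float(s_k), (k + 1) / 2)
```

The second printed column never falls below the third, which grows without bound.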


We now turn to a more detailed study of the convergence and divergence of series, 
considering here series of positive terms. Series with terms of both signs are treated later. 


Comparison Test 


If term by term a series of terms $u_n$ satisfies $0 \le u_n \le a_n$, where the $a_n$ form a convergent
series, then the series $\sum_n u_n$ is also convergent. Letting $s_i$ and $s_j$ be partial sums of the
$u$ series, with $j > i$, the difference $s_j - s_i$ is $\sum_{n=i+1}^{j} u_n$, and this is smaller than the
corresponding quantity for the $a$ series, thereby proving convergence. A similar argument
shows that if term by term a series of terms $v_n$ satisfies $0 \le b_n \le v_n$, where the $b_n$ form a
divergent series, then $\sum_n v_n$ is also divergent.

For the convergent series $a_n$ we already have the geometric series, whereas the harmonic
series will serve as the divergent comparison series $b_n$. As other series are identified as
either convergent or divergent, they may also be used as the known series for comparison
tests.


Example 1.1.3 A DIVERGENT SERIES


Test $\sum_{n=1}^{\infty} n^{-p}$, $p = 0.999$, for convergence. Since $n^{-0.999} > n^{-1}$ and $b_n = n^{-1}$ forms
the divergent harmonic series, the comparison test shows that $\sum_n n^{-0.999}$ is divergent.
Generalizing, $\sum_n n^{-p}$ is seen to be divergent for all $p \le 1$.


Cauchy Root Test


If $(a_n)^{1/n} \le r < 1$ for all sufficiently large $n$, with $r$ independent of $n$, then $\sum_n a_n$ is
convergent. If $(a_n)^{1/n} \ge 1$ for all sufficiently large $n$, then $\sum_n a_n$ is divergent.

The language of this test emphasizes an important point: The convergence or divergence 
of a series depends entirely on what happens for large n. Relative to convergence, it is the 
behavior in the large-n limit that matters. 


The first part of this test is verified easily by raising $(a_n)^{1/n}$ to the $n$th power. We get

$$a_n \le r^n < 1.$$

Since $r^n$ is just the $n$th term in a convergent geometric series, $\sum_n a_n$ is convergent by the
comparison test. Conversely, if $(a_n)^{1/n} \ge 1$, then $a_n \ge 1$ and the series must diverge. This
root test is particularly useful in establishing the properties of power series (Section 1.2).


D’Alembert (or Cauchy) Ratio Test 


If $a_{n+1}/a_n \le r < 1$ for all sufficiently large $n$ and $r$ is independent of $n$, then $\sum_n a_n$ is
convergent. If $a_{n+1}/a_n \ge 1$ for all sufficiently large $n$, then $\sum_n a_n$ is divergent.

This test is established by direct comparison with the geometric series $(1 + r + r^2 + \cdots)$.
In the second part, $a_{n+1} \ge a_n$ and divergence should be reasonably obvious. Although not
quite as sensitive as the Cauchy root test, this D’Alembert ratio test is one of the easiest to
apply and is widely used. An alternate statement of the ratio test is in the form of a limit: If

$$\lim_{n \to \infty} \frac{a_{n+1}}{a_n}
\begin{cases}
< 1, & \text{convergence}, \\
> 1, & \text{divergence}, \\
= 1, & \text{indeterminate}.
\end{cases} \tag{1.6}$$



Because of this final indeterminate possibility, the ratio test is likely to fail at crucial points, 
and more delicate, sensitive tests then become necessary. The alert reader may wonder how 
this indeterminacy arose. Actually it was concealed in the first statement, $a_{n+1}/a_n \le r <
1$. We might encounter $a_{n+1}/a_n < 1$ for all finite $n$ but be unable to choose an $r < 1$
and independent of $n$ such that $a_{n+1}/a_n \le r$ for all sufficiently large $n$. An example is
provided by the harmonic series, for which

$$\frac{a_{n+1}}{a_n} = \frac{n}{n+1} < 1.$$

Since

$$\lim_{n \to \infty} \frac{a_{n+1}}{a_n} = 1,$$

no fixed ratio $r < 1$ exists and the test fails.





Example 1.1.4 D’ALEMBERT RATIO TEST


Test $\sum_n n/2^n$ for convergence. Applying the ratio test,

$$\frac{a_{n+1}}{a_n} = \frac{(n+1)/2^{n+1}}{n/2^n} = \frac{1}{2}\,\frac{n+1}{n}.$$

Since

$$\frac{a_{n+1}}{a_n} \le \frac{3}{4} \quad \text{for } n \ge 2,$$

we have convergence.
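
A short Python check of this example, offered as a sketch: it evaluates the successive ratios for n >= 2 and accumulates the series numerically. The helper name `a` is an illustrative assumption; the closed-form sum 2 is a standard result for this series, not something stated in the text.

```python
def a(n):
    """n-th term of the series sum n / 2**n."""
    return n / 2**n

ratios = [a(n + 1) / a(n) for n in range(2, 12)]
partial_sum = sum(a(n) for n in range(1, 60))
print(max(ratios))   # largest ratio encountered for n >= 2
print(partial_sum)   # approaches 2, the known sum of this series
```

The largest ratio occurs at n = 2, consistent with the bound of 3/4 used above.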





Cauchy (or Maclaurin) Integral Test 


This is another sort of comparison test, in which we compare a series with an integral. 
Geometrically, we compare the area of a series of unit-width rectangles with the area under 
a curve. 

Let $f(x)$ be a continuous, monotonic decreasing function in which $f(n) = a_n$. Then
$\sum_n a_n$ converges if $\int_1^{\infty} f(x)\,dx$ is finite and diverges if the integral is infinite. The $i$th
partial sum is

$$s_i = \sum_{n=1}^{i} a_n = \sum_{n=1}^{i} f(n).$$

But, because $f(x)$ is monotonic decreasing, see Fig. 1.1(a),

$$s_i \ge \int_1^{i+1} f(x)\,dx.$$

On the other hand, as shown in Fig. 1.1(b),

$$s_i - a_1 \le \int_1^{i} f(x)\,dx.$$

Taking the limit as $i \to \infty$, we have

$$\int_1^{\infty} f(x)\,dx \le \sum_{n=1}^{\infty} a_n \le \int_1^{\infty} f(x)\,dx + a_1. \tag{1.7}$$

Hence the infinite series converges or diverges as the corresponding integral converges or
diverges.

This integral test is particularly useful in setting upper and lower bounds on the remainder
of a series after some number of initial terms have been summed. That is,

$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{N} a_n + \sum_{n=N+1}^{\infty} a_n, \tag{1.8}$$

and

$$\int_{N+1}^{\infty} f(x)\,dx \le \sum_{n=N+1}^{\infty} a_n \le \int_{N+1}^{\infty} f(x)\,dx + a_{N+1}. \tag{1.9}$$

FIGURE 1.1 (a) Comparison of integral and sum-blocks leading. (b) Comparison of
integral and sum-blocks lagging.

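
To illustrate how Eqs. (1.8) and (1.9) bracket a sum, the sketch below applies them to the series of terms 1/n^2, for which the tail integral is 1/(N+1). The choice of series, the function name, and the comparison value pi^2/6 (the known sum of this series) are assumptions made for the illustration.

```python
import math

def zeta2_bounds(N):
    """Bounds on sum 1/n**2 from N summed terms plus Eq. (1.9) applied to the tail."""
    head = sum(1.0 / n**2 for n in range(1, N + 1))
    tail_integral = 1.0 / (N + 1)      # integral of x**-2 from N+1 to infinity
    first_dropped = 1.0 / (N + 1)**2   # a_{N+1}
    return head + tail_integral, head + tail_integral + first_dropped

lower, upper = zeta2_bounds(100)
print(lower, math.pi**2 / 6, upper)
```

The width of the bracket is just the first omitted term, so summing more terms narrows it quadratically for this series.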


To free the integral test from the quite restrictive requirement that the interpolating function
$f(x)$ be positive and monotonic, we shall show that for any function $f(x)$ with a
continuous derivative, the infinite series is exactly represented as a sum of two integrals:

$$\sum_{n=N_1+1}^{N_2} f(n) = \int_{N_1}^{N_2} f(x)\,dx + \int_{N_1}^{N_2} (x - [x])\, f'(x)\,dx. \tag{1.10}$$

Here $[x]$ is the integral part of $x$, i.e., the largest integer $\le x$, so $x - [x]$ varies sawtoothlike
between 0 and 1. Equation (1.10) is useful because if both integrals in Eq. (1.10) converge, 
the infinite series also converges, while if one integral converges and the other does not, 
the infinite series diverges. If both integrals diverge, the test fails unless it can be shown 
whether the divergences of the integrals cancel against each other. 

We need now to establish Eq. (1.10). We manipulate the contributions to the second 
integral as follows: 


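1. Using integration by parts, we observe that

$$\int_{N_1}^{N_2} x f'(x)\,dx = N_2 f(N_2) - N_1 f(N_1) - \int_{N_1}^{N_2} f(x)\,dx.$$

2. We evaluate

$$\int_{N_1}^{N_2} [x]\, f'(x)\,dx = \sum_{n=N_1}^{N_2-1} n \int_{n}^{n+1} f'(x)\,dx = \sum_{n=N_1}^{N_2-1} n \left[ f(n+1) - f(n) \right]$$

$$= -\sum_{n=N_1+1}^{N_2} f(n) - N_1 f(N_1) + N_2 f(N_2).$$
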
Subtracting the second of these equations from the first, we arrive at Eq. (1.10). 

An alternative to Eq. (1.10) in which the second integral has its sawtooth shifted to be
symmetrical about zero (and therefore perhaps smaller) can be derived by methods similar
to those used above. The resulting formula is

$$\sum_{n=N_1+1}^{N_2} f(n) = \int_{N_1}^{N_2} f(x)\,dx + \int_{N_1}^{N_2} \left(x - [x] - \tfrac{1}{2}\right) f'(x)\,dx + \tfrac{1}{2}\left[ f(N_2) - f(N_1) \right]. \tag{1.11}$$



Because they do not use a monotonicity requirement, Eqs. (1.10) and (1.11) can be 
applied to alternating series, and even those with irregular sign sequences. 
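
Equation (1.10) can be verified numerically for a function that is neither positive nor monotonic. The sketch below is a minimal check under stated assumptions: the sample function cos(x)/x, the limits N1 = 1 and N2 = 20, and the hand-written Simpson rule are all illustrative choices. The sawtooth integral is evaluated one unit interval at a time, where [x] is constant.

```python
import math

def simpson(g, a, b, m=200):
    """Composite Simpson rule on [a, b] using m (even) subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def rhs_eq_1_10(f, fprime, n1, n2):
    """Right-hand side of Eq. (1.10), built up one unit interval at a time."""
    total = 0.0
    for n in range(n1, n2):
        total += simpson(f, n, n + 1)
        # On the interval [n, n+1) the integral part [x] equals n.
        total += simpson(lambda x, n=n: (x - n) * fprime(x), n, n + 1)
    return total

def f(x):
    return math.cos(x) / x          # sample non-monotonic function (an assumption)

def fprime(x):
    return -math.sin(x) / x - math.cos(x) / x**2

direct = sum(f(n) for n in range(2, 21))   # n = N1 + 1, ..., N2 with N1 = 1, N2 = 20
print(direct, rhs_eq_1_10(f, fprime, 1, 20))
```

The two printed numbers should agree to within the quadrature error of the Simpson rule.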


Example 1.1.5 RIEMANN ZETA FUNCTION


The Riemann zeta function is defined by

$$\zeta(p) = \sum_{n=1}^{\infty} n^{-p}, \tag{1.12}$$

providing the series converges. We may take $f(x) = x^{-p}$, and then

$$\int_1^{\infty} x^{-p}\,dx = \left. \frac{x^{-p+1}}{-p+1} \right|_{x=1}^{\infty}, \quad p \ne 1,$$

$$\int_1^{\infty} x^{-p}\,dx = \ln x \,\Big|_{x=1}^{\infty}, \quad p = 1.$$

The integral and therefore the series are divergent for $p \le 1$, and convergent for $p > 1$.
Hence Eq. (1.12) should carry the condition $p > 1$. This, incidentally, is an independent
proof that the harmonic series ($p = 1$) diverges logarithmically. The sum of the first million
terms $\sum_{n=1}^{1{,}000{,}000} n^{-1}$ is only 14.392 726···.


While the harmonic series diverges, the combination

$$\gamma = \lim_{n \to \infty} \left( \sum_{m=1}^{n} m^{-1} - \ln n \right) \tag{1.13}$$

converges, approaching a limit known as the Euler-Mascheroni constant.
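
The slow approach to the limit in Eq. (1.13) is easy to observe numerically. The following sketch (function name assumed for illustration) evaluates the difference between the harmonic partial sum and ln n for a few values of n, using `math.fsum` to limit rounding error in the long sum.

```python
import math

def harmonic_minus_log(n):
    """H_n - ln n, the quantity whose limit defines the Euler-Mascheroni constant."""
    h_n = math.fsum(1.0 / m for m in range(1, n + 1))
    return h_n - math.log(n)

for n in (10, 1_000, 1_000_000):
    print(n, harmonic_minus_log(n))
```

Adding ln n back to the last value recovers the million-term harmonic sum quoted in Example 1.1.5.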


Example 1.1.6 A SLOWLY DIVERGING SERIES


Consider now the series

$$S = \sum_{n=2}^{\infty} \frac{1}{n \ln n}.$$

We form the integral

$$\int_2^{\infty} \frac{1}{x \ln x}\,dx = \int_{x=2}^{\infty} \frac{d \ln x}{\ln x} = \ln \ln x \,\Big|_{x=2}^{\infty},$$

which diverges, indicating that $S$ is divergent. Note that the lower limit of the integral is
in fact unimportant so long as it does not introduce any spurious singularities, as it is the
large-$x$ behavior that determines the convergence. Because $n \ln n > n$, the divergence is
slower than that of the harmonic series. But because $\ln n$ increases more slowly than $n^{\varepsilon}$,
where $\varepsilon$ can have an arbitrarily small positive value, we have divergence even though the
series $\sum_n n^{-(1+\varepsilon)}$ converges.
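
How slowly this series diverges can be seen by comparing its partial sums with ln ln N, the growth suggested by the integral. This is a sketch with an assumed function name; the cutoffs are arbitrary.

```python
import math

def partial_sum(N):
    """Sum of 1/(n ln n) for n = 2..N."""
    return math.fsum(1.0 / (n * math.log(n)) for n in range(2, N + 1))

for N in (10**2, 10**4, 10**6):
    s = partial_sum(N)
    print(N, s, s - math.log(math.log(N)))
```

The last column settles toward a constant, so the partial sums themselves grow only like ln ln N.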


More Sensitive Tests 
Several tests more sensitive than those already examined are consequences of a theorem 
by Kummer. Kummer’s theorem, which deals with two series of finite positive terms, $u_n$
and $a_n$, states:

1. The series $\sum_n u_n$ converges if

$$\lim_{n \to \infty} \left( a_n \frac{u_n}{u_{n+1}} - a_{n+1} \right) \ge C > 0, \tag{1.14}$$

where $C$ is a constant. This statement is equivalent to a simple comparison test if the
series $\sum_n a_n^{-1}$ converges, and imparts new information only if that sum diverges. The
more weakly $\sum_n a_n^{-1}$ diverges, the more powerful the Kummer test will be.

2. If $\sum_n a_n^{-1}$ diverges and

$$\lim_{n \to \infty} \left( a_n \frac{u_n}{u_{n+1}} - a_{n+1} \right) \le 0, \tag{1.15}$$

then $\sum_n u_n$ diverges.
The proof of this powerful test is remarkably simple. Part 2 follows immediately from
the comparison test. To prove Part 1, write cases of Eq. (1.14) for $n = N + 1$ through any
larger $n$, in the following form:

$$u_{N+1} \le (a_N u_N - a_{N+1} u_{N+1})/C,$$

$$u_{N+2} \le (a_{N+1} u_{N+1} - a_{N+2} u_{N+2})/C,$$

$$\cdots$$

$$u_n \le (a_{n-1} u_{n-1} - a_n u_n)/C.$$

Adding, we get

$$\sum_{i=N+1}^{n} u_i \le \frac{a_N u_N}{C} - \frac{a_n u_n}{C} \tag{1.16}$$

$$< \frac{a_N u_N}{C}. \tag{1.17}$$

This shows that the tail of the series $\sum_n u_n$ is bounded, and that series is therefore proved
convergent when Eq. (1.14) is satisfied for all sufficiently large $n$.

Gauss’ test is an application of Kummer’s theorem to series $u_n > 0$ when the ratios of
successive $u_n$ approach unity and the tests previously discussed yield indeterminate results.
If for large $n$

$$\frac{u_n}{u_{n+1}} = 1 + \frac{h}{n} + \frac{B(n)}{n^2}, \tag{1.18}$$

where $B(n)$ is bounded for $n$ sufficiently large, then the Gauss test states that $\sum_n u_n$ converges
for $h > 1$ and diverges for $h \le 1$: There is no indeterminate case here.

The Gauss test is extremely sensitive, and will work for all troublesome series the physicist
is likely to encounter. To confirm it using Kummer’s theorem, we take $a_n = n \ln n$. The
series $\sum_n a_n^{-1}$ is weakly divergent, as already established in Example 1.1.6.

Taking the limit on the left side of Eq. (1.14), we have

$$\lim_{n \to \infty} \left[ n \ln n \left( 1 + \frac{h}{n} + \frac{B(n)}{n^2} \right) - (n+1) \ln(n+1) \right]$$

$$= \lim_{n \to \infty} \left[ (n+1) \ln n + (h-1) \ln n + \frac{B(n) \ln n}{n} - (n+1) \ln(n+1) \right]$$

$$= \lim_{n \to \infty} \left[ -(n+1) \ln\left( \frac{n+1}{n} \right) + (h-1) \ln n \right]. \tag{1.19}$$

For $h < 1$, both terms of Eq. (1.19) are negative, thereby signaling a divergent case of
Kummer’s theorem; for $h > 1$, the second term of Eq. (1.19) dominates the first and is positive,
indicating convergence. At $h = 1$, the second term vanishes, and the first is inherently
negative, thereby indicating divergence.
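
The sign behavior described for Eq. (1.19) can be observed numerically by evaluating the Kummer expression with a_n = n ln n for two familiar series. This is a sketch: the function and variable names are assumptions, and the two test series (terms 1/n with h = 1, terms 1/n^2 with h = 2) are chosen only for illustration.

```python
import math

def kummer_expression(ratio, n):
    """a_n * u_n/u_{n+1} - a_{n+1} with the choice a_n = n ln n."""
    a_n = n * math.log(n)
    a_next = (n + 1) * math.log(n + 1)
    return a_n * ratio(n) - a_next

def harmonic(n):
    return (n + 1) / n            # u_n/u_{n+1} for u_n = 1/n (h = 1)

def inverse_square(n):
    return ((n + 1) / n) ** 2     # u_n/u_{n+1} for u_n = 1/n**2 (h = 2)

for n in (10, 10**3, 10**6):
    print(n, kummer_expression(harmonic, n), kummer_expression(inverse_square, n))
```

For the harmonic series the expression tends to a negative value (divergence), while for the inverse-square series it is positive and grows like ln n (convergence).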


Example 1.1.7 LEGENDRE SERIES


The series solution for the Legendre equation (encountered in Chapter 7) has successive
terms whose ratio under certain conditions is

$$\frac{a_{2j+2}}{a_{2j}} = \frac{2j(2j+1) - \lambda}{(2j+1)(2j+2)}.$$

To place this in the form now being used, we define $u_j = a_{2j}$ and write

$$\frac{u_j}{u_{j+1}} = \frac{(2j+1)(2j+2)}{2j(2j+1) - \lambda}.$$

In the limit of large $j$, the constant $\lambda$ becomes negligible (in the language of the Gauss test,
it contributes to an extent $B(j)/j^2$, where $B(j)$ is bounded). We therefore have

$$\frac{u_j}{u_{j+1}} \to \frac{2j+2}{2j} + \frac{B(j)}{j^2} = 1 + \frac{1}{j} + \frac{B(j)}{j^2}. \tag{1.20}$$

The Gauss test tells us that this series is divergent.
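
The claim that the constant in the Legendre ratio contributes only a bounded B(j) can be checked directly. In this sketch the function name and the sample value 3.7 for the constant are assumptions; B(j) is extracted by subtracting 1 + 1/j from the exact ratio and multiplying by j^2.

```python
def gauss_remainder(j, lam):
    """B(j) = j**2 * (u_j/u_{j+1} - 1 - 1/j) for the Legendre ratio of Example 1.1.7."""
    ratio = (2 * j + 1) * (2 * j + 2) / (2 * j * (2 * j + 1) - lam)
    return j**2 * (ratio - 1 - 1 / j)

lam = 3.7  # arbitrary sample value of the constant lambda (an assumption)
for j in (10, 100, 1000, 10000):
    print(j, gauss_remainder(j, lam))
```

The values level off near lam/4, so B(j) is bounded and the ratio has h = 1, the divergent case of the Gauss test.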


Exercises 


1.1.1 (a) Prove that if $\lim_{n \to \infty} n^p u_n = A < \infty$, $p > 1$, the series $\sum_{n=1}^{\infty} u_n$ converges.

(b) Prove that if $\lim_{n \to \infty} n u_n = A > 0$, the series diverges. (The test fails for $A = 0$.)
These two tests, known as limit tests, are often convenient for establishing the
convergence of a series. They may be treated as comparison tests, comparing with

$$\sum_n n^{-q}, \quad 1 \le q < p.$$

1.1.2 If $\lim_{n \to \infty} \frac{b_n}{a_n} = K$, a constant with $0 < K < \infty$, show that $\sum_n b_n$ converges or diverges
with $\sum_n a_n$.

Hint. If $\sum_n a_n$ converges, rescale $b_n$ to $b_n' = \frac{b_n}{2K}$. If $\sum_n a_n$ diverges, rescale to $b_n'' = \frac{2 b_n}{K}$.

1.1.3 (a) Show that the series $\sum_{n=2}^{\infty} \frac{1}{n (\ln n)^2}$ converges.

(b) By direct addition $\sum_{n=2}^{100{,}000} [n (\ln n)^2]^{-1} = 2.02288$. Use Eq. (1.9) to make a five-
significant-figure estimate of the sum of this series.

1.1.4 Gauss’ test is often given in the form of a test of the ratio

$$\frac{u_n}{u_{n+1}} = \frac{n^2 + a_1 n + a_0}{n^2 + b_1 n + b_0}.$$

For what values of the parameters $a_1$ and $b_1$ is there convergence? divergence?

ANS. Convergent for $a_1 - b_1 > 1$,
divergent for $a_1 - b_1 \le 1$.


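1.1.5 Test for convergence

(a) $\sum_{n=2}^{\infty} (\ln n)^{-1}$

(b) $\sum_{n=1}^{\infty} \frac{n!}{10^n}$

(c) $\sum_{n=1}^{\infty} \frac{1}{2n(2n+1)}$

(d) $\sum_{n=1}^{\infty} [n(n+1)]^{-1/2}$

(e) $\sum_{n=0}^{\infty} \frac{1}{2n+1}$

1.1.6 Test for convergence

(a) $\sum_{n=1}^{\infty} \frac{1}{n(n+1)}$

(b) $\sum_{n=2}^{\infty} \frac{1}{n \ln n}$

(c) $\sum_{n=1}^{\infty} \frac{1}{n 2^n}$

(d) $\sum_{n=1}^{\infty} \ln\left(1 + \frac{1}{n}\right)$

(e) $\sum_{n=1}^{\infty} \frac{1}{n \cdot n^{1/n}}$

1.1.7 For what values of $p$ and $q$ will $\sum_{n=2}^{\infty} \frac{1}{n^p (\ln n)^q}$ converge?

ANS. Convergent for $p > 1$, all $q$, and for $p = 1$, $q > 1$;
divergent for $p < 1$, all $q$, and for $p = 1$, $q \le 1$.

1.1.8 Given $\sum_{n=1}^{1{,}000} n^{-1} = 7.485\,470\ldots$, set upper and lower bounds on the Euler-Mascheroni
constant.

ANS. $0.5767 < \gamma < 0.5778$.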
1.1.9 (From Olbers’ paradox.) Assume a static universe in which the stars are uniformly
distributed. Divide all space into shells of constant thickness; the stars in any one shell
by themselves subtend a solid angle of $\omega_0$. Allowing for the blocking out of distant
stars by nearer stars, show that the total net solid angle subtended by all stars, shells
extending to infinity, is exactly $4\pi$. [Therefore the night sky should be ablaze with
light. For more details, see E. Harrison, Darkness at Night: A Riddle of the Universe.
Cambridge, MA: Harvard University Press (1987).]


1.1.10 Test for convergence

$$\sum_{n=1}^{\infty} \left[ \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2 \cdot 4 \cdot 6 \cdots (2n)} \right]^2 = \frac{1}{4} + \frac{9}{64} + \frac{25}{256} + \cdots.$$






Alternating Series 


In previous subsections we limited ourselves to series of positive terms. Now, in contrast, 
we consider infinite series in which the signs alternate. The partial cancellation due to 
alternating signs makes convergence more rapid and much easier to identify. We shall 
prove the Leibniz criterion, a general condition for the convergence of an alternating series. 
For series with more irregular sign changes, the integral test of Eq. (1.10) is often helpful. 

The Leibniz criterion applies to series of the form $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ with $a_n > 0$, and
states that if $a_n$ is monotonically decreasing (for sufficiently large $n$) and $\lim_{n \to \infty} a_n = 0$,
then the series converges. To prove this theorem, note that the remainder $R_{2n}$ of the series
beyond $s_{2n}$, the partial sum after $2n$ terms, can be written in two alternate ways:

$$R_{2n} = (a_{2n+1} - a_{2n+2}) + (a_{2n+3} - a_{2n+4}) + \cdots$$

$$= a_{2n+1} - (a_{2n+2} - a_{2n+3}) - (a_{2n+4} - a_{2n+5}) - \cdots.$$

Since the $a_n$ are decreasing, the first of these equations implies $R_{2n} > 0$, while the second
implies $R_{2n} < a_{2n+1}$, so

$$0 < R_{2n} < a_{2n+1}.$$

Thus, $R_{2n}$ is positive but bounded, and the bound can be made arbitrarily small by taking
larger values of $n$. This demonstration also shows that truncating an alternating
series after $a_{2n}$ results in an error that is negative (the omitted terms were shown to
combine to a positive result) and bounded in magnitude by $a_{2n+1}$. An argument similar to
that made above for the remainder after an odd number of terms, $R_{2n+1}$, would show that
the error from truncation after $a_{2n+1}$ is positive and bounded by $a_{2n+2}$. Thus, it is generally
true that the error in truncating an alternating series with monotonically decreasing terms
is of the same sign as the last term kept and smaller than the first term dropped.
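
The truncation bound can be checked on the alternating harmonic series, whose sum is known to be ln 2. In the sketch below (function name assumed for illustration) the remainder is defined as the exact sum minus the partial sum, so it is positive after an even number of terms and negative after an odd number, and in both cases smaller in magnitude than the first term dropped.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... with n_terms terms."""
    return math.fsum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))

exact = math.log(2)  # known sum of the alternating harmonic series
for n_terms in (10, 11, 100, 101):
    remainder = exact - alternating_harmonic(n_terms)
    first_dropped = 1.0 / (n_terms + 1)
    print(n_terms, remainder, first_dropped)
```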

The Leibniz criterion depends for its applicability on the presence of strict sign 
alternation. Less regular sign changes present more challenging problems for convergence 
determination. 


Example 1.1.8 SERIES WITH IRREGULAR SIGN CHANGES


For $0 < x < 2\pi$, the series

$$S = \sum_{n=1}^{\infty} \frac{\cos(nx)}{n} = -\ln\left( 2 \sin\frac{x}{2} \right) \tag{1.21}$$

converges, having coefficients that change sign often, but not so that the Leibniz criterion
applies easily. To verify the convergence, we apply the integral test of Eq. (1.10), inserting
the explicit form for the derivative of $\cos(nx)/n$ (with respect to $n$) in the second integral:

$$S = \int_1^{\infty} \frac{\cos(nx)}{n}\,dn + \int_1^{\infty} (n - [n]) \left[ -\frac{x}{n} \sin(nx) - \frac{\cos(nx)}{n^2} \right] dn. \tag{1.22}$$

Using integration by parts, the first integral in Eq. (1.22) is rearranged to

$$\int_1^{\infty} \frac{\cos(nx)}{n}\,dn = \left[ \frac{\sin(nx)}{nx} \right]_1^{\infty} + \frac{1}{x} \int_1^{\infty} \frac{\sin(nx)}{n^2}\,dn,$$

and this integral converges because

$$\left| \int_1^{\infty} \frac{\sin(nx)}{n^2}\,dn \right| < \int_1^{\infty} \frac{dn}{n^2} = 1.$$

Looking now at the second integral in Eq. (1.22), we note that its term $\cos(nx)/n^2$ also
leads to a convergent integral, so we need only to examine the convergence of

$$\int_1^{\infty} (n - [n]) \frac{\sin(nx)}{n}\,dn.$$

Next, setting $(n - [n]) \sin(nx) = g'(n)$, which is equivalent to defining $g(N) = \int_1^{N} (n -
[n]) \sin(nx)\,dn$, we write

$$\int_1^{\infty} (n - [n]) \frac{\sin(nx)}{n}\,dn = \int_1^{\infty} \frac{g'(n)}{n}\,dn = \left[ \frac{g(n)}{n} \right]_{n=1}^{\infty} + \int_1^{\infty} \frac{g(n)}{n^2}\,dn,$$

where the last equality was obtained using once again an integration by parts. We do not
have an explicit expression for $g(n)$, but we do know that it is bounded because $\sin(nx)$
oscillates with a period incommensurate with that of the sawtooth periodicity of $(n - [n])$.
This boundedness enables us to determine that the second integral in Eq. (1.22) converges,
thus establishing the convergence of $S$.


Absolute and Conditional Convergence 


An infinite series is absolutely convergent if the absolute values of its terms form a con- 
vergent series. If it converges, but not absolutely, it is termed conditionally convergent. 
An example of a conditionally convergent series is the alternating harmonic series, 


$$\sum_{n=1}^{\infty} (-1)^{n-1} n^{-1} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n-1}}{n} + \cdots. \tag{1.23}$$


This series is convergent, based on the Leibniz criterion. It is clearly not absolutely con- 
vergent; if all terms are taken with + signs, we have the harmonic series, which we already 
know to be divergent. The tests described earlier in this section for series of positive terms 
are, then, tests for absolute convergence. 


Exercises 


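1.1.11 Determine whether each of these series is convergent, and if so, whether it is absolutely
convergent:

(a) $\frac{\ln 2}{2} - \frac{\ln 3}{3} + \frac{\ln 4}{4} - \frac{\ln 5}{5} + \frac{\ln 6}{6} - \cdots$,

(b) $\frac{1}{1} + \frac{1}{2} - \frac{1}{3} - \frac{1}{4} + \frac{1}{5} + \frac{1}{6} - \frac{1}{7} - \frac{1}{8} + \cdots$,

(c) $1 - \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} - \frac{1}{7} - \frac{1}{8} - \frac{1}{9} - \frac{1}{10} + \frac{1}{11} \cdots + \frac{1}{15} - \frac{1}{16} \cdots - \frac{1}{21} + \cdots$.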


1.1.12 Catalan’s constant $\beta(2)$ is defined by

$$\beta(2) = \sum_{k=0}^{\infty} (-1)^k (2k+1)^{-2} = \frac{1}{1^2} - \frac{1}{3^2} + \frac{1}{5^2} - \cdots.$$

Calculate $\beta(2)$ to six-digit accuracy.

Hint. The rate of convergence is enhanced by pairing the terms,

$$(4k-1)^{-2} - (4k+1)^{-2} = \frac{16k}{(16k^2 - 1)^2}.$$

If you have carried enough digits in your summation, $\sum_{1 \le k \le N} 16k/(16k^2 - 1)^2$, additional
significant figures may be obtained by setting upper and lower bounds on the tail
of the series, $\sum_{k=N+1}^{\infty}$. These bounds may be set by comparison with integrals, as in
the Maclaurin integral test.

ANS. $\beta(2) = 0.9159\ 6559\ 4177\cdots$.


Operations on Series 


We now investigate the operations that may be performed on infinite series. In this connec- 
tion the establishment of absolute convergence is important, because it can be proved that 
the terms of an absolutely convergent series may be reordered according to the familiar
rules of algebra or arithmetic:

• If an infinite series is absolutely convergent, the series sum is independent of the order
in which the terms are added.

• An absolutely convergent series may be added termwise to, or subtracted termwise
from, or multiplied termwise with another absolutely convergent series, and the resulting
series will also be absolutely convergent.

• The series (as a whole) may be multiplied with another absolutely convergent series.
The limit of the product will be the product of the individual series limits. The product
series, a double series, will also converge absolutely.

No such guarantees can be given for conditionally convergent series, though some of
the above properties remain true if only one of the series to be combined is conditionally
convergent.


Example 71.7.9 REARRANGEMENT OF ALTERNATING HARMONIC SERIES 


Writing the alternating harmonic series as 


,1,! Ds La 1 1 1 1 (1.24) 
2°93 4 —— \2 3 4 5 : , 


it is clear that )°°° ,(—1)""!n7! < 1. However, if we rearrange the order of the terms, we 
can make this series converge to 3. We regroup the terms of Eq. (1.24), as 


ree yr 4. Eee yi idaect 
2 oO 2 7 9 It 13 «15 


1 1 1 1 1 1 1 1.25 
Cae ramen Go Cme eC — 
[Figure: partial sums of the rearranged series; horizontal axis: number of terms in sum, n.]

FIGURE 1.2 Alternating harmonic series. Terms are rearranged to give convergence to 1.5.


Treating the terms grouped in parentheses as single terms for convenience, we obtain the 
partial sums 


$s_1 = 1.5333$    $s_2 = 1.0333$
$s_3 = 1.5218$    $s_4 = 1.2718$
$s_5 = 1.5143$    $s_6 = 1.3476$
$s_7 = 1.5103$    $s_8 = 1.3853$
$s_9 = 1.5078$    $s_{10} = 1.4078$.


From this tabulation of $s_n$ and the plot of $s_n$ versus n in Fig. 1.2, the convergence to $\frac{3}{2}$ is fairly clear. Our rearrangement was to take positive terms until the partial sum was equal to or greater than $\frac{3}{2}$ and then to add negative terms until the partial sum just fell below $\frac{3}{2}$ and so on. As the series extends to infinity, all original terms will eventually appear, but the partial sums of this rearranged alternating harmonic series converge to $\frac{3}{2}$. ■
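The rearrangement rule described in this example is easy to try numerically. The following is a minimal sketch, not part of the original text; the function name `rearranged_group_sums` and its parameters are hypothetical. It applies the greedy rule (positive terms until the sum reaches the target, then negative terms until it drops below) and returns the partial sums after each group.

```python
from itertools import count


def rearranged_group_sums(target=1.5, n_groups=10):
    """Greedy rearrangement of the alternating harmonic series.

    Returns the partial sums recorded after each group of like-signed terms.
    """
    odd = count(1, 2)    # denominators of the positive terms: 1, 3, 5, ...
    even = count(2, 2)   # denominators of the negative terms: 2, 4, 6, ...
    total = 0.0
    sums = []
    while len(sums) < n_groups:
        # Positive terms until the partial sum is equal to or greater than the target.
        while total < target:
            total += 1.0 / next(odd)
        sums.append(total)
        if len(sums) == n_groups:
            break
        # Negative terms until the partial sum falls below the target.
        while total >= target:
            total -= 1.0 / next(even)
        sums.append(total)
    return sums


if __name__ == "__main__":
    for k, s in enumerate(rearranged_group_sums(), start=1):
        print(f"s_{k} = {s:.4f}")
```

With the default target of 3/2 the first ten values should agree with the tabulated partial sums to about four decimal places. Changing `target` shows the same series being steered toward a different limit.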


As the example shows, by a suitable rearrangement of terms, a conditionally convergent 
series may be made to converge to any desired value or even to diverge. This statement is 
sometimes called Riemann’s theorem. 

Another example shows the danger of multiplying conditionally convergent series. 


Example 1.1.10 SQUARE OF A CONDITIONALLY CONVERGENT SERIES MAY DIVERGE


The series $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{\sqrt{n}}$ converges by the Leibniz criterion. Its square,

$$\left[\sum_{n} \frac{(-1)^{n-1}}{\sqrt{n}}\right]^2 = \sum_{n} (-1)^n \left[\frac{1}{\sqrt{1}}\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{2}}\frac{1}{\sqrt{n-2}} + \cdots + \frac{1}{\sqrt{n-1}}\frac{1}{\sqrt{1}}\right],$$

has a general term, in [...], consisting of $n - 1$ additive terms, each of which is bigger than $\frac{1}{\sqrt{n-1}\sqrt{n-1}}$, so the entire [...] term is greater than $\frac{n-1}{n-1}$ and does not go to zero. Hence the general term of this product series does not approach zero in the limit of large n and the series diverges. ■
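As a quick numerical check of this argument, the bracketed general term can be evaluated directly. This is an illustrative sketch only; the helper name `product_term` is an assumption introduced here.

```python
import math


def product_term(n):
    """Bracketed general term of the squared series:
    sum over i = 1..n-1 of 1 / (sqrt(i) * sqrt(n - i))."""
    return sum(1.0 / math.sqrt(i * (n - i)) for i in range(1, n))


if __name__ == "__main__":
    for n in (2, 5, 10, 100, 1000):
        print(n, product_term(n))
```

The printed values never drop below 1, so the terms of the product series cannot tend to zero, in line with the conclusion of the example.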


These examples show that conditionally convergent series must be treated with caution. 


Improvement of Convergence 


This section so far has been concerned with establishing convergence as an abstract mathematical property. In practice, the rate of convergence may be of considerable importance. A method for improving convergence, due to Kummer, is to form a linear combination of our slowly converging series and one or more series whose sum is known. For the known series the following collection is particularly useful:

$$\alpha_1 = \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1,$$

$$\alpha_2 = \sum_{n=1}^{\infty} \frac{1}{n(n+1)(n+2)} = \frac{1}{4},$$

$$\alpha_3 = \sum_{n=1}^{\infty} \frac{1}{n(n+1)(n+2)(n+3)} = \frac{1}{18},$$

$$\cdots$$

$$\alpha_p = \sum_{n=1}^{\infty} \frac{1}{n(n+1)\cdots(n+p)} = \frac{1}{p\,p!}. \tag{1.26}$$


These sums can be evaluated via partial fraction expansions, and are the subject of 
Exercise 1.5.3. 
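The closed forms in Eq. (1.26) can be checked with exact rational arithmetic. The sketch below is an illustration, not the book's derivation: it relies on the telescoping identity $\frac{1}{n(n+1)\cdots(n+p)} = \frac{1}{p}\left[\frac{1}{n\cdots(n+p-1)} - \frac{1}{(n+1)\cdots(n+p)}\right]$, which gives the partial sum in closed form. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def alpha_partial(p, N):
    """Exact partial sum of 1/(n(n+1)...(n+p)) for n = 1..N."""
    total = Fraction(0)
    for n in range(1, N + 1):
        denom = 1
        for k in range(p + 1):
            denom *= n + k
        total += Fraction(1, denom)
    return total


def alpha_partial_closed(p, N):
    """Telescoped form: (1/p) * [1/p! - 1/((N+1)(N+2)...(N+p))]."""
    tail = 1
    for k in range(1, p + 1):
        tail *= N + k
    return Fraction(1, p) * (Fraction(1, factorial(p)) - Fraction(1, tail))


def alpha_limit(p):
    """Limit N -> infinity of the partial sums: 1/(p * p!)."""
    return Fraction(1, p * factorial(p))


if __name__ == "__main__":
    for p in (1, 2, 3):
        print(p, alpha_partial(p, 50), alpha_limit(p))
```

The second bracketed term of the telescoped form vanishes as N grows, which leaves $1/(p\,p!)$.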

The series we wish to sum and one or more known series (multiplied by coefficients) 
are combined term by term. The coefficients in the linear combination are chosen to cancel 
the most slowly converging terms. 


Example 1.1.11 RIEMANN ZETA FUNCTION ζ(3)


From the definition in Eq. (1.12), we identify $\zeta(3)$ as $\sum_{n=1}^{\infty} n^{-3}$. Noting that $\alpha_2$ of Eq. (1.26) has a large-n dependence $\sim n^{-3}$, we consider the linear combination

$$\sum_{n=1}^{\infty} n^{-3} + a\alpha_2 = \zeta(3) + \frac{a}{4}. \tag{1.27}$$

We did not use $\alpha_1$ because it converges more slowly than $\zeta(3)$. Combining the two series on the left-hand side termwise, we obtain

$$\sum_{n=1}^{\infty} \left[\frac{1}{n^3} + \frac{a}{n(n+1)(n+2)}\right] = \sum_{n=1}^{\infty} \frac{n^2(1+a) + 3n + 2}{n^3(n+1)(n+2)}.$$













Table 1.1 Riemann Zeta Function

 s     ζ(s)
 2     1.64493 40668
 3     1.20205 69032
 4     1.08232 32337
 5     1.03692 77551
 6     1.01734 30620
 7     1.00834 92774
 8     1.00407 73562
 9     1.00200 83928
10     1.00099 45751

If we choose $a = -1$, we remove the leading term from the numerator; then, setting this equal to the right-hand side of Eq. (1.27) and solving for $\zeta(3)$,

$$\zeta(3) = \frac{1}{4} + \sum_{n=1}^{\infty} \frac{3n + 2}{n^3(n+1)(n+2)}. \tag{1.28}$$

The resulting series may not be beautiful but it does converge as $n^{-4}$, faster than $n^{-3}$. A more convenient form with even faster convergence is introduced in Exercise 1.1.16. There, the symmetry leads to convergence as $n^{-5}$. ■
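To see what the improved convergence buys in practice, one can compare the direct sum of $n^{-3}$ with the series of Eq. (1.28) at the same number of terms. This is a rough sketch with hypothetical function names; the reference value is the ζ(3) entry quoted in Table 1.1.

```python
ZETA3_TABLE = 1.2020569032  # zeta(3) as quoted in Table 1.1


def zeta3_direct(N):
    """Direct partial sum of n**-3; the error falls off roughly as 1/(2 N**2)."""
    return sum(1.0 / n**3 for n in range(1, N + 1))


def zeta3_accelerated(N):
    """Partial sum of Eq. (1.28); the error falls off roughly as 1/N**3."""
    return 0.25 + sum((3 * n + 2) / (n**3 * (n + 1) * (n + 2))
                      for n in range(1, N + 1))


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N,
              abs(zeta3_direct(N) - ZETA3_TABLE),
              abs(zeta3_accelerated(N) - ZETA3_TABLE))
```

The error estimates in the docstrings come from comparing the tails with the corresponding integrals, as in the integral test.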


Sometimes it is helpful to use the Riemann zeta function in a way similar to that illustrated for the $\alpha_p$ in the foregoing example. That approach is practical because the zeta function has been tabulated (see Table 1.1).


Example 1.1.12 CONVERGENCE IMPROVEMENT


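The problem is to evaluate the series $\sum_{n=1}^{\infty} 1/(1 + n^2)$. Expanding $(1 + n^2)^{-1} = n^{-2}(1 + n^{-2})^{-1}$ by direct division, we have

$$(1 + n^2)^{-1} = n^{-2}\left(1 - n^{-2} + n^{-4} - \frac{n^{-6}}{1 + n^{-2}}\right) = \frac{1}{n^2} - \frac{1}{n^4} + \frac{1}{n^6} - \frac{1}{n^8 + n^6}.$$

Therefore

$$\sum_{n=1}^{\infty} \frac{1}{1 + n^2} = \zeta(2) - \zeta(4) + \zeta(6) - \sum_{n=1}^{\infty} \frac{1}{n^8 + n^6}.$$

The remainder series converges as $n^{-8}$. Clearly, the process can be continued as desired. You make a choice between how much algebra you will do and how much arithmetic the computer will do. ■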
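The trade-off between algebra and arithmetic can be made concrete with a short computation. This sketch uses the ζ values from Table 1.1. As a check it uses the standard closed form $\sum_{n=1}^{\infty} 1/(1+n^2) = (\pi\coth\pi - 1)/2$, which is not derived in this section and is brought in here only for comparison. The function names are hypothetical.

```python
import math

# zeta(2), zeta(4), zeta(6) as quoted in Table 1.1
ZETA = {2: 1.6449340668, 4: 1.0823232337, 6: 1.0173430620}


def improved_sum(N):
    """zeta(2) - zeta(4) + zeta(6) minus N terms of the n**-8 remainder series."""
    remainder = sum(1.0 / (n**8 + n**6) for n in range(1, N + 1))
    return ZETA[2] - ZETA[4] + ZETA[6] - remainder


def direct_sum(N):
    """Plain partial sum of 1/(1 + n**2)."""
    return sum(1.0 / (1 + n * n) for n in range(1, N + 1))


if __name__ == "__main__":
    exact = (math.pi / math.tanh(math.pi) - 1.0) / 2.0
    print("closed form :", exact)
    print("improved(20):", improved_sum(20))
    print("direct(20)  :", direct_sum(20))
```

Twenty terms of the remainder series should already reproduce the closed form to roughly the accuracy of the tabulated ζ values, whereas twenty terms of the original series are still off in the second decimal place.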
Rearrangement of Double Series


An absolutely convergent double series (one whose terms are identified by two summation 
indices) presents interesting rearrangement opportunities. Consider 


$$S = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} a_{n,m}. \tag{1.29}$$

In addition to the obvious possibility of reversing the order of summation (i.e., doing the m 
sum first), we can make rearrangements that are more innovative. One reason for doing this 
is that we may be able to reduce the double sum to a single summation, or even evaluate 
the entire double sum in closed form. 

As an example, suppose we make the following index substitutions in our double series: $m = q$, $n = p - q$. Then we will cover all $n \ge 0$, $m \ge 0$ by assigning p the range $(0, \infty)$, and q the range $(0, p)$, so our double series can be written

$$S = \sum_{p=0}^{\infty} \sum_{q=0}^{p} a_{p-q,q}. \tag{1.30}$$


In the nm plane our region of summation is the entire quadrant $m \ge 0$, $n \ge 0$; in the pq plane our summation is over the triangular region sketched in Fig. 1.3. This same pq region can be covered when the summations are carried out in the reverse order, but with limits

$$S = \sum_{q=0}^{\infty} \sum_{p=q}^{\infty} a_{p-q,q}.$$

The important thing to note here is that these schemes all have in common that, by allowing the indices to run over their designated ranges, every $a_{n,m}$ is eventually encountered, and is encountered exactly once.











FIGURE 1.3 The pq index space.
Another possible index substitution is to set $n = s$, $m = r - 2s$. If we sum over s first, its range must be $(0, [r/2])$, where $[r/2]$ is the integer part of $r/2$, i.e., $[r/2] = r/2$ for r even and $(r - 1)/2$ for r odd. The range of r is $(0, \infty)$. This situation corresponds to

$$S = \sum_{r=0}^{\infty} \sum_{s=0}^{[r/2]} a_{s,r-2s}. \tag{1.31}$$


The sketches in Figs. 1.4 to 1.6 show the order in which the $a_{n,m}$ are summed when using the forms given in Eqs. (1.29), (1.30), and (1.31), respectively.

If the double series introduced originally as Eq. (1.29) is absolutely convergent, then all 
these rearrangements will give the same ultimate result. 
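The claim that each index scheme visits every term exactly once can be checked on a truncated index set. The sketch below is an illustration under stated assumptions: the terms $a_{n,m} = 2^{-n} 3^{-m}$ are a hypothetical choice, picked because the double series is absolutely convergent with the known sum $2 \cdot \tfrac{3}{2} = 3$. The function names are likewise invented for this example.

```python
def a(n, m):
    """Illustrative terms a_{n,m} = 2**-n * 3**-m (an assumed example)."""
    return 0.5**n * (1.0 / 3.0)**m


def pairs_pq(P):
    """(n, m) pairs visited by Eq. (1.30) for p = 0..P: n = p - q, m = q."""
    return [(p - q, q) for p in range(P + 1) for q in range(p + 1)]


def pairs_rs(R):
    """(n, m) pairs visited by Eq. (1.31) for r = 0..R: n = s, m = r - 2s."""
    return [(s, r - 2 * s) for r in range(R + 1) for s in range(r // 2 + 1)]


def sum_pq(P):
    return sum(a(n, m) for n, m in pairs_pq(P))


def sum_rs(R):
    return sum(a(n, m) for n, m in pairs_rs(R))


if __name__ == "__main__":
    print(sum_pq(80), sum_rs(160))  # both should be very close to 3
```

Truncating p at P covers the pairs with $n + m \le P$, while truncating r at R covers those with $m + 2n \le R$. The two truncations differ, but both exhaust the quadrant as the cutoff grows.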
































FIGURE 1.5 Order in which terms are summed with p, q index set, Eq. (1.30).
FIGURE 1.6 Order in which terms are summed with r, s index set, Eq. (1.31).


Exercises

1.1.13 Show how to combine $\zeta(2) = \sum_{n=1}^{\infty} n^{-2}$ with $\alpha_1$ and $\alpha_2$ to obtain a series converging as $n^{-4}$.

Note. $\zeta(2)$ has the known value $\pi^2/6$. See Eq. (12.66).

1.1.14 Give a method of computing

$$\lambda(3) = \sum_{n=0}^{\infty} \frac{1}{(2n+1)^3}$$

that converges at least as fast as $n^{-8}$ and obtain a result good to six decimal places.

ANS. $\lambda(3) = 1.051800$.

1.1.15 Show that (a) $\sum_{n=2}^{\infty} [\zeta(n) - 1] = 1$, (b) $\sum_{n=2}^{\infty} (-1)^n [\zeta(n) - 1] = \frac{1}{2}$,

where $\zeta(n)$ is the Riemann zeta function.

1.1.16 The convergence improvement of Example 1.1.11 may be carried out more expediently (in this special case) by putting $\alpha_2$, from Eq. (1.26), into a more symmetric form: Replacing n by $n - 1$, we have

$$\alpha_2' = \sum_{n=2}^{\infty} \frac{1}{(n-1)n(n+1)} = \frac{1}{4}.$$

(a) Combine $\zeta(3)$ and $\alpha_2'$ to obtain convergence as $n^{-5}$.

(b) Let $\alpha_4'$ be $\alpha_4$ with $n \to n - 2$. Combine $\zeta(3)$, $\alpha_2'$, and $\alpha_4'$ to obtain convergence as $n^{-7}$.

(c) If $\zeta(3)$ is to be calculated to six-decimal place accuracy (error $5 \times 10^{-7}$), how many terms are required for $\zeta(3)$ alone? combined as in part (a)? combined as in part (b)?

Note. The error may be estimated using the corresponding integral.

ANS. (a) $\zeta(3) = \frac{5}{4} - \sum_{n=2}^{\infty} \frac{1}{n^3(n^2 - 1)}$.



1.2 SERIES OF FUNCTIONS 


We extend our concept of infinite series to include the possibility that each term $u_n$ may be a function of some variable, $u_n = u_n(x)$. The partial sums become functions of the variable x,

$$s_n(x) = u_1(x) + u_2(x) + \cdots + u_n(x), \tag{1.32}$$

as does the series sum, defined as the limit of the partial sums:

$$\sum_{n=1}^{\infty} u_n(x) = S(x) = \lim_{n \to \infty} s_n(x). \tag{1.33}$$

So far we have concerned ourselves with the behavior of the partial sums as a function of n. Now we consider how the foregoing quantities depend on x. The key concept here is that of uniform convergence.


Uniform Convergence 


If for any small $\varepsilon > 0$ there exists a number N, independent of x in the interval $[a, b]$ (that is, $a \le x \le b$) such that

$$|S(x) - s_n(x)| < \varepsilon, \quad \text{for all } n \ge N, \tag{1.34}$$

then the series is said to be uniformly convergent in the interval $[a, b]$. This says that for our series to be uniformly convergent, it must be possible to find a finite N so that the absolute value of the tail of the infinite series, $\left|\sum_{i=N+1}^{\infty} u_i(x)\right|$, will be less than an arbitrary small $\varepsilon$ for all x in the given interval, including the endpoints.








Example 1.2.1 NONUNIFORM CONVERGENCE


Consider on the interval $[0, 1]$ the series

$$S(x) = \sum_{n=0}^{\infty} (1 - x)x^n.$$

For $0 \le x < 1$, the geometric series $\sum_n x^n$ is convergent, with value $1/(1 - x)$, so $S(x) = 1$ for these x values. But at $x = 1$, every term of the series will be zero, and therefore $S(1) = 0$. That is,

$$\sum_{n=0}^{\infty} (1 - x)x^n = \begin{cases} 1, & 0 \le x < 1, \\ 0, & x = 1. \end{cases} \tag{1.35}$$

So $S(x)$ is convergent for the entire interval $[0, 1]$, and because each term is nonnegative, it is also absolutely convergent. If $x \ne 0$, this is a series for which the partial sum $s_N$ is $1 - x^N$, as can be seen by comparison with Eq. (1.3). Since $S(x) = 1$, the uniform convergence criterion is

$$\left|1 - (1 - x^N)\right| = x^N < \varepsilon.$$


No matter what the values of N and a sufficiently small $\varepsilon$ may be, there will be an x value (close to 1) where this criterion is violated. The underlying problem is that $x = 1$ is the convergence limit of the geometric series, and it is not possible to have a convergence rate that is bounded independently of x in a range that includes $x = 1$.
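The failure of uniform convergence can be seen numerically by asking how many terms are needed to satisfy $x^N < \varepsilon$ at a given x. The sketch below is illustrative only; `terms_needed` and `violating_x` are hypothetical helper names.

```python
import math


def terms_needed(x, eps):
    """Smallest N with x**N < eps, for 0 < x < 1 and 0 < eps < 1."""
    N = max(1, math.ceil(math.log(eps) / math.log(x)))
    while x**N >= eps:                   # correct for rounding in the logarithms
        N += 1
    while N > 1 and x**(N - 1) < eps:
        N -= 1
    return N


def violating_x(N, eps):
    """An x < 1 at which the criterion x**N < eps fails: here x**N = sqrt(eps)."""
    return eps ** (1.0 / (2 * N))


if __name__ == "__main__":
    eps = 1e-3
    for x in (0.9, 0.99, 0.999, 0.9999):
        print(x, terms_needed(x, eps))
    print(violating_x(1000, eps))
```

The required N grows without bound as x approaches 1, and for any fixed N the second helper exhibits a point where the criterion is violated. This is precisely the statement that no single N serves the whole interval.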

We note also from this example that absolute and uniform convergence are independent 
concepts. The series in this example has absolute, but not uniform convergence. We will 
shortly present examples of series that are uniformly, but only conditionally convergent. 
And there are series that have neither or both of these properties. ■


Weierstrass M (Majorant) Test 


The most commonly encountered test for uniform convergence is the Weierstrass M test. If we can construct a series of numbers $\sum_{i=1}^{\infty} M_i$, in which $M_i \ge |u_i(x)|$ for all x in the interval $[a, b]$ and $\sum_{i=1}^{\infty} M_i$ is convergent, our series $u_i(x)$ will be uniformly convergent in $[a, b]$.

The proof of this Weierstrass M test is direct and simple. Since $\sum_i M_i$ converges, some number N exists such that for $n + 1 \ge N$,

$$\sum_{i=n+1}^{\infty} M_i < \varepsilon.$$

This follows from our definition of convergence. Then, with $|u_i(x)| \le M_i$ for all x in the interval $a \le x \le b$,

$$\sum_{i=n+1}^{\infty} |u_i(x)| < \varepsilon.$$

Hence $S(x) = \sum_{i=1}^{\infty} u_i(x)$ satisfies

$$|S(x) - s_n(x)| = \left|\sum_{i=n+1}^{\infty} u_i(x)\right| < \varepsilon, \tag{1.36}$$

and we see that $\sum_{i=1}^{\infty} u_i(x)$ is uniformly convergent in $[a, b]$. Since we have specified absolute values in the statement of the Weierstrass M test, the series $\sum_{i=1}^{\infty} u_i(x)$ is also seen to be absolutely convergent. As we have already observed in Example 1.2.1, absolute and uniform convergence are different concepts, and one of the limitations of the Weierstrass M test is that it can only establish uniform convergence for series that are also absolutely convergent.
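A small numerical experiment may help make the M test concrete. The series $\sum_n \cos(nx)/n^2$ with majorant $M_n = 1/n^2$ is an example chosen here for illustration; it does not appear in the text. The sketch compares the largest tail magnitude found on a grid of x values with the x-independent tail of the majorant series. Both tails are truncated at a hypothetical cutoff `K`.

```python
import math


def tail_sup(N, K=1000, grid=200):
    """Largest |sum_{n=N+1}^{K} cos(n x)/n**2| over a grid of x in [0, 2*pi]."""
    worst = 0.0
    for j in range(grid + 1):
        x = 2.0 * math.pi * j / grid
        tail = sum(math.cos(n * x) / n**2 for n in range(N + 1, K + 1))
        worst = max(worst, abs(tail))
    return worst


def majorant_tail(N, K=1000):
    """Tail of the majorant series, sum_{n=N+1}^{K} 1/n**2 (independent of x)."""
    return sum(1.0 / n**2 for n in range(N + 1, K + 1))


if __name__ == "__main__":
    for N in (5, 50, 500):
        print(N, tail_sup(N), majorant_tail(N))
```

The first column of results never exceeds the second, and the second depends only on N. A single N therefore works for every x at once.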

To further underscore the difference between absolute and uniform convergence, we 
provide another example. 


Example 1.2.2 UNIFORMLY CONVERGENT ALTERNATING SERIES 


Consider the series

$$S(x) = \sum_{n=1}^{\infty} \frac{(-1)^n}{n + x^2}, \quad -\infty < x < \infty. \tag{1.37}$$

Applying the Leibniz criterion, this series is easily proven convergent for the entire interval $-\infty < x < \infty$, but it is not absolutely convergent, as the absolute values of its terms approach for large n those of the divergent harmonic series. The divergence of the absolute value series is obvious at $x = 0$, where we then exactly have the harmonic series. Nevertheless, this series is uniformly convergent on $-\infty < x < \infty$, as its convergence is for all x at least as fast as it is for $x = 0$. More formally,

$$|S(x) - s_n(x)| < |u_{n+1}(x)| \le |u_{n+1}(0)|.$$

Since $u_{n+1}(0)$ is independent of x, uniform convergence is confirmed. ■


Abel’s Test 


A somewhat more delicate test for uniform convergence has been given by Abel. If $u_n(x)$ can be written in the form $a_n f_n(x)$, and

1. The $a_n$ form a convergent series, $\sum_n a_n = A$,

2. For all x in $[a, b]$ the functions $f_n(x)$ are monotonically decreasing in n, i.e., $f_{n+1}(x) \le f_n(x)$,

3. For all x in $[a, b]$ all the $f_n(x)$ are bounded in the range $0 \le f_n(x) \le M$, where M is independent of x,

then $\sum_n u_n(x)$ converges uniformly in $[a, b]$.


This test is especially useful in analyzing the convergence of power series. Details of 
the proof of Abel’s test and other tests for uniform convergence are given in the works 
by Knopp and by Whittaker and Watson (see Additional Readings listed at the end of this 
chapter). 
Properties of Uniformly Convergent Series


Uniformly convergent series have three particularly useful properties. If a series $\sum_n u_n(x)$ is uniformly convergent in $[a, b]$ and the individual terms $u_n(x)$ are continuous,

1. The series sum $S(x) = \sum_{n=1}^{\infty} u_n(x)$ is also continuous.

2. The series may be integrated term by term. The sum of the integrals is equal to the integral of the sum:

$$\int_a^b S(x)\,dx = \sum_{n=1}^{\infty} \int_a^b u_n(x)\,dx. \tag{1.38}$$

3. The derivative of the series sum $S(x)$ equals the sum of the individual-term derivatives:

$$\frac{d}{dx} S(x) = \sum_{n=1}^{\infty} \frac{d}{dx} u_n(x), \tag{1.39}$$

provided the following additional conditions are satisfied:

$\dfrac{du_n(x)}{dx}$ is continuous in $[a, b]$,

$\displaystyle\sum_{n=1}^{\infty} \frac{du_n(x)}{dx}$ is uniformly convergent in $[a, b]$.

Term-by-term integration of a uniformly convergent series requires only continuity of the individual terms. This condition is almost always satisfied in physical applications. Term-by-term differentiation of a series is often not valid because more restrictive conditions must be satisfied.
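Term-by-term integration, Eq. (1.38), can be illustrated with the geometric series on an interval where it converges uniformly, for instance $[0, \tfrac{1}{2}]$. This sketch is an example added here, not taken from the text, and its function names are hypothetical. Integrating $x^n$ from 0 to x gives $x^{n+1}/(n+1)$, and the integral of the sum $1/(1-t)$ is $-\ln(1-x)$.

```python
import math


def integrated_series(x, n_terms):
    """Sum of the term-by-term integrals of t**n over [0, x]: x**(n+1)/(n+1)."""
    return sum(x**(n + 1) / (n + 1) for n in range(n_terms))


def integral_of_sum(x):
    """Integral of 1/(1 - t) from 0 to x, valid for 0 <= x < 1."""
    return -math.log(1.0 - x)


if __name__ == "__main__":
    x = 0.5
    print(integrated_series(x, 60), integral_of_sum(x))  # both approximate ln 2
```

At $x = \tfrac{1}{2}$ both expressions give $\ln 2$, as Eq. (1.38) requires.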


Exercises

1.2.1 Find the range of uniform convergence of the series

(a) $\eta(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^x}$, (b) $\zeta(x) = \sum_{n=1}^{\infty} \frac{1}{n^x}$.

ANS. (a) $0 < s \le x < \infty$.
(b) $1 < s \le x < \infty$.

1.2.2 For what range of x is the geometric series $\sum_{n=0}^{\infty} x^n$ uniformly convergent?

ANS. $-1 < -s \le x \le s < 1$.

1.2.3 For what range of positive values of x is $\sum_{n=0}^{\infty} 1/(1 + x^n)$

(a) convergent? (b) uniformly convergent?
1.2.4 If the series of the coefficients $\sum a_n$ and $\sum b_n$ are absolutely convergent, show that the Fourier series

$$\sum (a_n \cos nx + b_n \sin nx)$$

is uniformly convergent for $-\infty < x < \infty$.

1.2.5 The Legendre series $\sum_{j\ \mathrm{even}} u_j(x)$ satisfies the recurrence relations

$$u_{j+2}(x) = \frac{(j+1)(j+2) - l(l+1)}{(j+2)(j+3)}\, x^2 u_j(x),$$

in which the index j is even and l is some constant (but, in this problem, not a nonnegative odd integer). Find the range of values of x for which this Legendre series is convergent. Test the endpoints.

ANS. $-1 < x < 1$.

1.2.6 A series solution of the Chebyshev equation leads to successive terms having the ratio

$$\frac{u_{j+2}(x)}{u_j(x)} = \frac{(k+j)^2 - n^2}{(k+j+1)(k+j+2)}\, x^2,$$

with $k = 0$ and $k = 1$. Test for convergence at $x = \pm 1$.

ANS. Convergent.

1.2.7 A series solution for the ultraspherical (Gegenbauer) function $C_n^{\alpha}(x)$ leads to the recurrence

$$a_{j+2} = a_j\, \frac{(k+j)(k+j+2\alpha) - n(n+2\alpha)}{(k+j+1)(k+j+2)}.$$

Investigate the convergence of each of these series at $x = \pm 1$ as a function of the parameter $\alpha$.

ANS. Convergent for $\alpha < 1$,
divergent for $\alpha \ge 1$.


Taylor’s Expansion 


Taylor’s expansion is a powerful tool for the generation of power series representations of 
functions. The derivation presented here provides not only the possibility of an expansion 
into a finite number of terms plus a remainder that may or may not be easy to evaluate, but 
also the possibility of the expression of a function as an infinite series of powers. 
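We assume that our function $f(x)$ has a continuous nth derivative² in the interval $a \le x \le b$. We integrate this nth derivative n times; the first three integrations yield

$$\int_a^x f^{(n)}(x_1)\,dx_1 = f^{(n-1)}(x_1)\Big|_a^x = f^{(n-1)}(x) - f^{(n-1)}(a),$$

$$\int_a^x dx_2 \int_a^{x_2} f^{(n)}(x_1)\,dx_1 = \int_a^x dx_2 \left[f^{(n-1)}(x_2) - f^{(n-1)}(a)\right] = f^{(n-2)}(x) - f^{(n-2)}(a) - (x - a) f^{(n-1)}(a),$$

$$\int_a^x dx_3 \int_a^{x_3} dx_2 \int_a^{x_2} f^{(n)}(x_1)\,dx_1 = f^{(n-3)}(x) - f^{(n-3)}(a) - (x - a) f^{(n-2)}(a) - \frac{(x - a)^2}{2!} f^{(n-1)}(a).$$

Finally, after integrating for the nth time,

$$\int_a^x dx_n \cdots \int_a^{x_2} f^{(n)}(x_1)\,dx_1 = f(x) - f(a) - (x - a) f'(a) - \frac{(x - a)^2}{2!} f''(a) - \cdots - \frac{(x - a)^{n-1}}{(n-1)!} f^{(n-1)}(a).$$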


Note that this expression is exact. No terms have been dropped, no approximations made. Now, solving for $f(x)$, we have

$$f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!} f''(a) + \cdots + \frac{(x - a)^{n-1}}{(n-1)!} f^{(n-1)}(a) + R_n, \tag{1.40}$$

where the remainder, $R_n$, is given by the n-fold integral

$$R_n = \int_a^x dx_n \cdots \int_a^{x_2} dx_1\, f^{(n)}(x_1). \tag{1.41}$$
a a 


We may convert $R_n$ into a perhaps more practical form by using the mean value theorem of integral calculus:

$$\int_a^x g(x)\,dx = (x - a)\,g(\xi), \tag{1.42}$$

with $a \le \xi \le x$. By integrating n times we get the Lagrangian form³ of the remainder:

$$R_n = \frac{(x - a)^n}{n!} f^{(n)}(\xi). \tag{1.43}$$

² Taylor’s expansion may be derived under slightly less restrictive conditions; compare H. Jeffreys and B. S. Jeffreys, in the Additional Readings, Section 1.133.

With Taylor’s expansion in this form there are no questions of infinite series convergence. 
The series contains a finite number of terms, and the only questions concern the magnitude 
of the remainder. 
When the function $f(x)$ is such that $\lim_{n \to \infty} R_n = 0$, Eq. (1.40) becomes Taylor’s series:

$$f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!} f''(a) + \cdots = \sum_{n=0}^{\infty} \frac{(x - a)^n}{n!} f^{(n)}(a). \tag{1.44}$$

Here we encounter for the first time $n!$ with $n = 0$. Note that we define $0! = 1$.

Our Taylor series specifies the value of a function at one point, x, in terms of the value of the function and its derivatives at a reference point a. It is an expansion in powers of the change in the variable, namely $x - a$. This idea can be emphasized by writing Taylor’s series in an alternate form in which we replace x by $x + h$ and a by x:

$$f(x + h) = \sum_{n=0}^{\infty} \frac{h^n}{n!} f^{(n)}(x). \tag{1.45}$$



n=0 
Power Series 


Taylor series are often used in situations where the reference point, a, is assigned the value zero. In that case the expansion is referred to as a Maclaurin series, and Eq. (1.40) becomes

$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} f^{(n)}(0). \tag{1.46}$$


An immediate application of the Maclaurin series is in the expansion of various transcen- 
dental functions into infinite (power) series. 


Example 1.2.3 EXPONENTIAL FUNCTION


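Let $f(x) = e^x$. Differentiating, then setting $x = 0$, we have

$$f^{(n)}(0) = 1$$

for all n, $n = 1, 2, 3, \ldots$. Then, with Eq. (1.46), we have

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \tag{1.47}$$

³ An alternate form derived by Cauchy is $R_n = \dfrac{(x - \xi)^{n-1}(x - a)}{(n-1)!} f^{(n)}(\xi)$.
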
This is the series expansion of the exponential function. Some authors use this series to define the exponential function.

Although this series is clearly convergent for all x, as may be verified using the d’Alembert ratio test, it is instructive to check the remainder term, $R_n$. By Eq. (1.43) we have

$$R_n = \frac{x^n}{n!} f^{(n)}(\xi) = \frac{x^n}{n!} e^{\xi},$$

where $\xi$ is between 0 and x. Irrespective of the sign of x,

$$|R_n| \le \frac{|x|^n e^{|x|}}{n!}.$$

No matter how large $|x|$ may be, a sufficient increase in n will cause the denominator of this form for $R_n$ to dominate over the numerator, and $\lim_{n \to \infty} R_n = 0$. Thus, the Maclaurin expansion of $e^x$ converges absolutely over the entire range $-\infty < x < \infty$. ■


Now that we have an expansion for exp(x), we can return to Eq. (1.45), and rewrite that equation in a form that focuses on its differential operator characteristics. Defining D as the operator $d/dx$, we have

$$f(x + h) = \sum_{n=0}^{\infty} \frac{h^n D^n}{n!} f(x) = e^{hD} f(x). \tag{1.48}$$
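For a polynomial the operator series $\sum_n h^n D^n/n!$ terminates, so the statement $e^{hD} f(x) = f(x + h)$ can be verified exactly. The following sketch uses rational arithmetic; the coefficient-list representation of a polynomial and all function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs):
    """Derivative of a polynomial stored as [c0, c1, c2, ...] (c_k multiplies x**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def shift_by_series(coeffs, x, h):
    """Evaluate sum_n h**n D**n f(x) / n! for a polynomial f (the sum is finite)."""
    total = Fraction(0)
    current = list(coeffs)
    n = 0
    while current:
        total += Fraction(h)**n * evaluate(current, Fraction(x)) / factorial(n)
        current = derivative(current)
        n += 1
    return total


if __name__ == "__main__":
    f = [1, -2, 0, 5]                      # f(x) = 1 - 2x + 5x**3
    x, h = Fraction(1, 3), Fraction(2, 7)
    print(shift_by_series(f, x, h), evaluate(f, x + h))
```

The two printed fractions coincide, which is Eq. (1.48) for this particular f.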


Example 1.2.4 LOGARITHM


For a second Maclaurin expansion, let $f(x) = \ln(1 + x)$. By differentiating, we obtain

$$f'(x) = (1 + x)^{-1},$$

$$f^{(n)}(x) = (-1)^{n-1}(n - 1)!\,(1 + x)^{-n}. \tag{1.49}$$

Equation (1.46) yields

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + R_n = \sum_{p=1}^{n} (-1)^{p-1} \frac{x^p}{p} + R_n. \tag{1.50}$$

In this case, for $x > 0$ our remainder is given by

$$R_n = \frac{x^n}{n!} f^{(n)}(\xi), \quad 0 \le \xi \le x,$$

$$\phantom{R_n} \le \frac{x^n}{n}, \quad 0 \le \xi \le x \le 1. \tag{1.51}$$

This result shows that the remainder approaches zero as n is increased indefinitely, providing that $0 \le x \le 1$. For $x < 0$, the mean value theorem is too crude a tool to establish a meaningful limit for $R_n$. As an infinite series,

$$\ln(1 + x) = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n} \tag{1.52}$$

converges for $-1 < x \le 1$. The range $-1 < x < 1$ is easily established by the d’Alembert ratio test. Convergence at $x = 1$ follows by the Leibniz criterion. In particular, at $x = 1$ we have the conditionally convergent alternating harmonic series, to which we can now put a value:

$$\ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots = \sum_{n=1}^{\infty} (-1)^{n-1} n^{-1}. \tag{1.53}$$

At $x = -1$, the expansion becomes the harmonic series, which we well know to be divergent. ■
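The remainder bound $x^n/n$ for $0 \le x \le 1$ is easy to test against the library logarithm. This is a small sketch with a hypothetical helper name; it sums the series through the nth power and compares the actual error with the bound.

```python
import math


def log1p_partial(x, n):
    """Partial sum of the series for ln(1 + x): sum over p = 1..n of (-1)**(p-1) x**p / p."""
    return sum((-1)**(p - 1) * x**p / p for p in range(1, n + 1))


if __name__ == "__main__":
    for x, n in ((0.3, 5), (0.3, 10), (1.0, 50), (1.0, 1000)):
        error = abs(math.log(1.0 + x) - log1p_partial(x, n))
        print(x, n, error, x**n / n)
```

At $x = 1$ the error shrinks only like $1/n$, which illustrates how slowly the conditionally convergent series for $\ln 2$ converges.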


Properties of Power Series 


The power series is a special and extremely useful type of infinite series, and as illustrated in the preceding subsection, may be constructed by the Maclaurin formula, Eq. (1.44). However obtained, it will be of the general form

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots = \sum_{n=0}^{\infty} a_n x^n, \tag{1.54}$$

where the coefficients $a_i$ are constants, independent of x.

Equation (1.54) may readily be tested for convergence either by the Cauchy root test or the d’Alembert ratio test. If

$$\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = R^{-1},$$

the series converges for $-R < x < R$. This is the interval or radius of convergence. Since the root and ratio tests fail when x is at the limit points $\pm R$, these points require special attention.

For instance, if $a_n = n^{-1}$, then $R = 1$ and from Section 1.1 we can conclude that the series converges for $x = -1$ but diverges for $x = +1$. If $a_n = n!$, then $R = 0$ and the series diverges for all $x \ne 0$.
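The ratio $|a_n/a_{n+1}|$ at large n gives a quick estimate of R when the limit exists. The sketch below evaluates it exactly for the two coefficient choices just mentioned; the function and variable names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def ratio_estimate(a, n):
    """|a_n / a_{n+1}|, which tends to R when the ratio-test limit exists."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))


def inverse_n(n):
    return Fraction(1, n)          # a_n = 1/n, for which R = 1


def fact(n):
    return Fraction(factorial(n))  # a_n = n!, for which R = 0


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, float(ratio_estimate(inverse_n, n)), float(ratio_estimate(fact, n)))
```

The first estimate, $(n+1)/n$, tends to 1, while the second, $1/(n+1)$, tends to 0.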

Suppose our power series has been found convergent for $-R < x < R$; then it will be uniformly and absolutely convergent in any interior interval $-S \le x \le S$, where $0 < S < R$. This may be proved directly by the Weierstrass M test.

Since each of the terms $u_n(x) = a_n x^n$ is a continuous function of x and $f(x) = \sum a_n x^n$ converges uniformly for $-S \le x \le S$, $f(x)$ must be a continuous function in the interval of uniform convergence. This behavior is to be contrasted with the strikingly different behavior of series in trigonometric functions, which are used frequently to represent discontinuous functions such as sawtooth and square waves.
With $u_n(x)$ continuous and $\sum a_n x^n$ uniformly convergent, we find that term by term differentiation or integration of a power series will yield a new power series with continuous functions and the same radius of convergence as the original series. The new factors introduced by differentiation or integration do not affect either the root or the ratio test. Therefore our power series may be differentiated or integrated as often as desired within the interval of uniform convergence (Exercise 1.2.16). In view of the rather severe restriction placed on differentiation of infinite series in general, this is a remarkable and valuable result.


Uniqueness Theorem 


We have already used the Maclaurin series to expand $e^x$ and $\ln(1 + x)$ into power series. Throughout this book, we will encounter many situations in which functions are represented, or even defined by power series. We now establish that the power-series representation is unique.

We proceed by assuming we have two expansions of the same function whose intervals of convergence overlap in a region that includes the origin:

$$f(x) = \sum_{n=0}^{\infty} a_n x^n, \quad -R_a < x < R_a$$

$$\phantom{f(x)} = \sum_{n=0}^{\infty} b_n x^n, \quad -R_b < x < R_b. \tag{1.55}$$

What we need to prove is that $a_n = b_n$ for all n.

Starting from

$$\sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} b_n x^n, \quad -R < x < R, \tag{1.56}$$

where R is the smaller of $R_a$ and $R_b$, we set $x = 0$ to eliminate all but the constant term of each series, obtaining

$$a_0 = b_0.$$

Now, exploiting the differentiability of our power series, we differentiate Eq. (1.56), getting

$$\sum_{n=1}^{\infty} n a_n x^{n-1} = \sum_{n=1}^{\infty} n b_n x^{n-1}. \tag{1.57}$$

We again set $x = 0$, to isolate the new constant terms, and find

$$a_1 = b_1.$$

By repeating this process n times, we get

$$a_n = b_n,$$

which shows that the two series coincide. Therefore our power series representation is unique.

This theorem will be a crucial point in our study of differential equations, in which we develop power series solutions. The uniqueness of power series appears frequently in theoretical physics. The establishment of perturbation theory in quantum mechanics is one example.


Indeterminate Forms 


The power-series representation of functions is often useful in evaluating indeterminate forms, and is the basis of l’Hôpital’s rule, which states that if the ratio of two differentiable functions $f(x)$ and $g(x)$ becomes indeterminate, of the form 0/0, at $x = x_0$, then

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}. \tag{1.58}$$

Proof of Eq. (1.58) is the subject of Exercise 1.2.12.

Sometimes it is easier just to introduce power-series expansions than to evaluate the derivatives that enter l’Hôpital’s rule. For examples of this strategy, see the following Example and Exercise 1.2.15.


Example 1.2.5 ALTERNATIVE TO L’HÔPITAL’S RULE


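Evaluate

$$\lim_{x \to 0} \frac{1 - \cos x}{x^2}. \tag{1.59}$$

Replacing $\cos x$ by its Maclaurin-series expansion, Exercise 1.2.8, we obtain

$$\frac{1 - \cos x}{x^2} = \frac{1 - \left(1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots\right)}{x^2} = \frac{1}{2!} - \frac{x^2}{4!} + \cdots.$$

Letting $x \to 0$, we have

$$\lim_{x \to 0} \frac{1 - \cos x}{x^2} = \frac{1}{2}. \tag{1.60}$$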

The uniqueness of power series means that the coefficients $a_n$ may be identified with the derivatives in a Maclaurin series. From

$$f(x) = \sum_{n=0}^{\infty} a_n x^n = \sum_{m=0}^{\infty} \frac{1}{m!} f^{(m)}(0)\, x^m,$$

we have

$$a_n = \frac{1}{n!} f^{(n)}(0).$$
Inversion of Power Series


Suppose we are given a series

$$y - y_0 = a_1(x - x_0) + a_2(x - x_0)^2 + \cdots = \sum_{n=1}^{\infty} a_n (x - x_0)^n. \tag{1.61}$$

This gives $(y - y_0)$ in terms of $(x - x_0)$. However, it may be desirable to have an explicit expression for $(x - x_0)$ in terms of $(y - y_0)$. That is, we want an expression of the form

$$x - x_0 = \sum_{n=1}^{\infty} b_n (y - y_0)^n, \tag{1.62}$$

with the $b_n$ to be determined in terms of the assumed known $a_n$. A brute-force approach, which is perfectly adequate for the first few coefficients, is simply to substitute Eq. (1.61) into Eq. (1.62). By equating coefficients of $(x - x_0)^n$ on both sides of Eq. (1.62), and using the fact that the power series is unique, we find

$$b_1 = \frac{1}{a_1},$$

$$b_2 = -\frac{a_2}{a_1^3},$$

$$b_3 = \frac{1}{a_1^5}\left(2a_2^2 - a_1 a_3\right), \tag{1.63}$$

$$b_4 = \frac{1}{a_1^7}\left(5a_1 a_2 a_3 - a_1^2 a_4 - 5a_2^3\right), \quad \text{and so on.}$$


Some of the higher coefficients are listed by Dwight. A more general and much more 
elegant approach is developed by the use of complex variables in the first and second 
editions of Mathematical Methods for Physicists. 
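The coefficients of Eq. (1.63) can be checked on a pair of series whose inverse relationship is known. Here, as an illustrative choice not made in the text, we take $y = e^x - 1$, so $a_n = 1/n!$. Its inverse is $x = \ln(1 + y)$, whose coefficients are $b_n = (-1)^{n-1}/n$ by Eq. (1.52). The function name is hypothetical.

```python
from fractions import Fraction


def inversion_coefficients(a1, a2, a3, a4):
    """First four coefficients of the inverted series, following Eq. (1.63)."""
    b1 = 1 / a1
    b2 = -a2 / a1**3
    b3 = (2 * a2**2 - a1 * a3) / a1**5
    b4 = (5 * a1 * a2 * a3 - a1**2 * a4 - 5 * a2**3) / a1**7
    return b1, b2, b3, b4


if __name__ == "__main__":
    # a_n = 1/n! for y = exp(x) - 1
    a = (Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 24))
    print(inversion_coefficients(*a))
```

The output reproduces $1, -\tfrac{1}{2}, \tfrac{1}{3}, -\tfrac{1}{4}$, the leading coefficients of $\ln(1 + y)$.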


Exercises

1.2.8 Show that

(a) $\sin x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}$,

(b) $\cos x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}$.

⁴ H. B. Dwight, Tables of Integrals and Other Mathematical Data, 4th ed. New York: Macmillan (1961). (Compare formula no. 50.)





1.2.9 Derive a series expansion of $\cot x$ in increasing powers of x by dividing the power series for $\cos x$ by that for $\sin x$.

Note. The resultant series that starts with $1/x$ is known as a Laurent series ($\cot x$ does not have a Taylor expansion about $x = 0$, although $\cot(x) - x^{-1}$ does). Although the two series for $\sin x$ and $\cos x$ were valid for all x, the convergence of the series for $\cot x$ is limited by the zeros of the denominator, $\sin x$.

1.2.10 Show by series expansion that

$$\frac{1}{2} \ln \frac{\eta_0 + 1}{\eta_0 - 1} = \coth^{-1} \eta_0, \quad |\eta_0| > 1.$$

This identity may be used to obtain a second solution for Legendre’s equation.

1.2.11 Show that $f(x) = x^{1/2}$ (a) has no Maclaurin expansion but (b) has a Taylor expansion about any point $x_0 \ne 0$. Find the range of convergence of the Taylor expansion about $x = x_0$.

1.2.12 Prove l’Hôpital’s rule, Eq. (1.58).

1.2.13 With $n > 1$, show that

(a) $\frac{1}{n} - \ln\left(\frac{n}{n-1}\right) < 0$, (b) $\frac{1}{n} - \ln\left(\frac{n+1}{n}\right) > 0$.

Use these inequalities to show that the limit defining the Euler-Mascheroni constant, Eq. (1.13), is finite.

1.2.14 In numerical analysis it is often convenient to approximate $d^2\psi(x)/dx^2$ by

$$\frac{d^2}{dx^2}\psi(x) \approx \frac{1}{h^2}\left[\psi(x + h) - 2\psi(x) + \psi(x - h)\right].$$

Find the error in this approximation.

ANS. Error $= \frac{h^2}{12}\psi^{(4)}(x)$.

1.2.15 Evaluate $\displaystyle\lim_{x \to 0} \left[\frac{\sin(\tan x) - \tan(\sin x)}{x^7}\right]$.

ANS. $-\frac{1}{30}$.

1.2.16 A power series converges for $-R < x < R$. Show that the differentiated series and the integrated series have the same interval of convergence. (Do not bother about the endpoints $x = \pm R$.)


1.3 BINOMIAL THEOREM


An extremely important application of the Maclaurin expansion is the derivation of the 
binomial theorem. 
Let $f(x) = (1 + x)^m$, in which m may be either positive or negative and is not limited to integral values. Direct application of Eq. (1.46) gives

$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \cdots + R_n. \tag{1.64}$$

For this function the remainder is

$$R_n = \frac{x^n}{n!}(1 + \xi)^{m-n}\, m(m-1)\cdots(m-n+1), \tag{1.65}$$

with $\xi$ between 0 and x. Restricting attention for now to $x \ge 0$, we note that for $n > m$, $(1 + \xi)^{m-n}$ is a maximum for $\xi = 0$, so for positive x,

$$|R_n| \le \frac{x^n}{n!}\left|m(m-1)\cdots(m-n+1)\right|, \tag{1.66}$$

with $\lim_{n \to \infty} R_n = 0$ when $0 \le x < 1$. Because the radius of convergence of a power series is the same for positive and for negative x, the binomial series converges for $-1 < x < 1$. Convergence at the limit points $\pm 1$ is not addressed by the present analysis, and depends on m.

Summarizing, we have established the binomial expansion,

$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \frac{m(m-1)(m-2)}{3!}x^3 + \cdots, \tag{1.67}$$

convergent for $-1 < x < 1$. It is important to note that Eq. (1.67) applies whether or not m is integral, and for both positive and negative m. If m is a nonnegative integer, $R_n$ for $n > m$ vanishes for all x, corresponding to the fact that under those conditions $(1 + x)^m$ is a finite sum.

Because the binomial expansion is of frequent occurrence, the coefficients appearing in it, which are called binomial coefficients, are given the special symbol

$$\binom{m}{n} = \frac{m(m-1)\cdots(m-n+1)}{n!}, \tag{1.68}$$

and the binomial expansion assumes the general form

$$(1 + x)^m = \sum_{n=0}^{\infty} \binom{m}{n} x^n. \tag{1.69}$$

n=0 


In evaluating Eq. (1.68), note that when n = 0, the product in its numerator is empty (start- 
ing from m and descending to m + 1); in that case the convention is to assign the product 
the value unity. We also remind the reader that 0! is defined to be unity. 

In the special case that m is a positive integer, we may write our binomial coefficient in terms of factorials:

$$\binom{m}{n} = \frac{m!}{n!\,(m-n)!}. \tag{1.70}$$

Since $n!$ is undefined for negative integer n, the binomial expansion for positive integer m is understood to end with the term $n = m$, and will correspond to the coefficients in the polynomial resulting from the (finite) expansion of $(1 + x)^m$.
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For positive integer m, the (”) also arise in combinatorial theory, being the number 
of different ways n out of m objects can be selected. That, of course, is consistent with 
the coefficient set if (1 + x)” is expanded. The term containing x” has a coefficient that 
corresponds to the number of ways one can choose the “x” from n of the factors (1 + x) 
and the 1 from the m — n other (1 + x) factors. 

For negative integer m, we can still use the special notation for binomial coefficients, but
their evaluation is more easily accomplished if we set m = −p, with p a positive integer,
and write

$$\binom{-p}{n} = (-1)^n\,\frac{p(p+1)\cdots(p+n-1)}{n!} = (-1)^n\,\frac{(n+p-1)!}{n!\,(p-1)!}. \tag{1.71}$$

For nonintegral m, it is convenient to use the Pochhammer symbol, defined for general
a and nonnegative integer n and given the notation $(a)_n$, as

$$(a)_0 = 1, \quad (a)_1 = a, \quad (a)_{n+1} = a(a+1)\cdots(a+n), \quad n \ge 1. \tag{1.72}$$

For both integral and nonintegral m, the binomial coefficient formula can be written

$$\binom{m}{n} = \frac{(m-n+1)_n}{n!}. \tag{1.73}$$


There is a rich literature on binomial coefficients and relationships between them and
on summations involving them. We mention here only one such formula that arises if we
evaluate $1/\sqrt{1+x}$, i.e., $(1+x)^{-1/2}$. The binomial coefficient

$$\binom{-\frac{1}{2}}{n} = \frac{1}{n!}\left(-\frac{1}{2}\right)\left(-\frac{3}{2}\right)\cdots\left(-\frac{2n-1}{2}\right)
= (-1)^n\,\frac{1\cdot 3\cdots(2n-1)}{2^n\, n!} = (-1)^n\,\frac{(2n-1)!!}{(2n)!!}, \tag{1.74}$$

where the "double factorial" notation indicates products of even or odd positive integers
as follows:

$$\begin{aligned}
1\cdot 3\cdot 5\cdots(2n-1) &= (2n-1)!!, \\
2\cdot 4\cdot 6\cdots(2n) &= (2n)!!.
\end{aligned} \tag{1.75}$$

These are related to the regular factorials by

$$(2n)!! = 2^n\, n! \quad \text{and} \quad (2n-1)!! = \frac{(2n)!}{2^n\, n!}. \tag{1.76}$$

Note that these relations include the special cases 0!! = (−1)!! = 1.
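
The double-factorial relations can be checked numerically. The sketch below is only an
illustration; the helper names `double_factorial` and `binom_minus_half` are our own and
do not appear in the text.

```python
from fractions import Fraction
from math import factorial

def double_factorial(k):
    """k!! for integer k >= -1, using the convention 0!! = (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def binom_minus_half(n):
    """Binomial coefficient (-1/2 choose n), computed directly from Eq. (1.68)."""
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(-1, 2) - k
        result /= k + 1
    return result

if __name__ == "__main__":
    for n in range(6):
        lhs = binom_minus_half(n)
        rhs = Fraction((-1) ** n * double_factorial(2 * n - 1),
                       double_factorial(2 * n))      # right side of Eq. (1.74)
        print(n, lhs, rhs)
```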


Example 1.3.1 RELATIVISTIC ENERGY 


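The total relativistic energy of a particle of mass m and velocity v is

$$E = mc^2\left(1 - \frac{v^2}{c^2}\right)^{-1/2}, \tag{1.77}$$

where c is the velocity of light. Using Eq. (1.69) with m = −1/2 and $x = -v^2/c^2$, and
evaluating the binomial coefficients using Eq. (1.74), we have

$$\begin{aligned}
E &= mc^2\left[1 - \frac{1}{2}\left(-\frac{v^2}{c^2}\right) + \frac{3}{8}\left(-\frac{v^2}{c^2}\right)^2 - \frac{5}{16}\left(-\frac{v^2}{c^2}\right)^3 + \cdots\right] \\
&= mc^2 + \frac{1}{2}mv^2 + \frac{3}{8}mv^2\left(\frac{v^2}{c^2}\right) + \frac{5}{16}mv^2\left(\frac{v^2}{c^2}\right)^2 + \cdots.
\end{aligned} \tag{1.78}$$

The first term, $mc^2$, is identified as the rest-mass energy. Then

$$E_{\text{kinetic}} = \frac{1}{2}mv^2\left[1 + \frac{3}{4}\frac{v^2}{c^2} + \frac{5}{8}\left(\frac{v^2}{c^2}\right)^2 + \cdots\right]. \tag{1.79}$$

For particle velocity v ≪ c, the expression in the brackets reduces to unity and we see that
the kinetic portion of the total relativistic energy agrees with the classical result.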
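
To see how quickly the bracketed series approaches the exact result, one can compare the
truncated expansion with the closed form. This is a rough sketch in units where c = 1 by
default; the function names are hypothetical and chosen only for this illustration.

```python
import math

def kinetic_exact(m, v, c=1.0):
    """Exact kinetic energy: total energy of Eq. (1.77) minus the rest energy m c^2."""
    return m * c**2 * (1.0 / math.sqrt(1.0 - (v / c) ** 2) - 1.0)

def kinetic_series(m, v, c=1.0, terms=3):
    """Truncated bracket series: (1/2) m v^2 [1 + (3/4) b + (5/8) b^2], with b = v^2/c^2."""
    beta2 = (v / c) ** 2
    coeffs = [1.0, 3.0 / 4.0, 5.0 / 8.0]
    bracket = sum(a * beta2**k for k, a in enumerate(coeffs[:terms]))
    return 0.5 * m * v**2 * bracket

if __name__ == "__main__":
    for v in (0.01, 0.1, 0.5):
        exact = kinetic_exact(1.0, v)
        for terms in (1, 2, 3):
            approx = kinetic_series(1.0, v, terms=terms)
            print(v, terms, abs(approx - exact) / exact)
```

With `terms=1` the function returns the classical value $\frac{1}{2}mv^2$, so the printed
relative differences show directly the size of the relativistic corrections.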


The binomial expansion can be generalized for positive integer n to polynomials:

$$(a_1 + a_2 + \cdots + a_m)^n = \sum \frac{n!}{n_1!\, n_2! \cdots n_m!}\; a_1^{n_1} a_2^{n_2} \cdots a_m^{n_m}, \tag{1.80}$$

where the summation includes all different combinations of nonnegative integers
$n_1, n_2, \ldots, n_m$ with $\sum_{i=1}^{m} n_i = n$. This generalization finds considerable use in
statistical mechanics.
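
A brute-force check of Eq. (1.80) for small cases may make the summation condition more
concrete. The sketch below simply enumerates all index tuples and keeps those that sum
to n; the name `multinomial_expand` is an assumption made for this illustration, and the
enumeration is far from the most efficient approach.

```python
from itertools import product
from math import factorial

def multinomial_expand(a, n):
    """Evaluate (a_1 + ... + a_m)^n via the multinomial sum of Eq. (1.80)."""
    m = len(a)
    total = 0
    for ns in product(range(n + 1), repeat=m):
        if sum(ns) != n:                 # keep only tuples with n_1 + ... + n_m = n
            continue
        coeff = factorial(n)
        term = 1
        for a_i, n_i in zip(a, ns):
            coeff //= factorial(n_i)     # exact: each partial quotient is an integer
            term *= a_i ** n_i
        total += coeff * term
    return total

if __name__ == "__main__":
    print(multinomial_expand([1, 2, 3], 4), (1 + 2 + 3) ** 4)
```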

In everyday analysis, the combinatorial properties of the binomial coefficients make 
them appear often. For example, Leibniz’s formula for the nth derivative of a product of 
two functions, u(x)v(x), can be written 


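$$\left(\frac{d}{dx}\right)^{n} \bigl(u(x)\,v(x)\bigr) = \sum_{i=0}^{n} \binom{n}{i} \left(\frac{d^{\,i} u(x)}{dx^{\,i}}\right)\left(\frac{d^{\,n-i} v(x)}{dx^{\,n-i}}\right). \tag{1.81}$$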








Exercises 
1.3.1 The classical Langevin theory of paramagnetism leads to an expression for the magnetic
polarization,

$$P(x) = c\left(\frac{\cosh x}{\sinh x} - \frac{1}{x}\right).$$

Expand P(x) as a power series for small x (low fields, high temperature).


1.3.2 Given that

$$\int_0^1 \frac{dx}{1+x^2} = \tan^{-1} x\,\Big|_0^1 = \frac{\pi}{4},$$

expand the integrand into a series and integrate term by term obtaining⁵

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots + (-1)^n \frac{1}{2n+1} + \cdots,$$

which is Leibniz's formula for π. Compare the convergence of the integrand series and
the integrated series at x = 1. Leibniz's formula converges so slowly that it is quite
useless for numerical work.
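1.3.3 Expand the incomplete gamma function $\gamma(n+1, x) \equiv \int_0^x e^{-t} t^n\, dt$ in a series of powers
of x. What is the range of convergence of the resulting series?

ANS. $$\int_0^x e^{-t} t^n\, dt = x^{n+1}\left[\frac{1}{n+1} - \frac{x}{n+2} + \frac{x^2}{2!\,(n+3)} - \cdots + \frac{(-1)^p x^p}{p!\,(n+p+1)} + \cdots\right].$$
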
1.3.4 Develop a series expansion of $y = \sinh^{-1} x$ (that is, sinh y = x) in powers of x by
(a) inversion of the series for sinh y,
(b) a direct Maclaurin expansion.

1.3.5 Show that for integral n ≥ 0,
$$\frac{1}{(1-x)^{n+1}} = \sum_{m=n}^{\infty} \binom{m}{n} x^{m-n}.$$

1.3.6 Show that
$$(1+x)^{-m/2} = \sum_{n=0}^{\infty} (-1)^n\, \frac{(m+2n-2)!!}{2^n\, n!\,(m-2)!!}\, x^n, \quad \text{for } m = 1, 2, 3, \ldots.$$

1.3.7 Using binomial expansions, compare the three Doppler shift formulas:

(a) $\nu' = \nu\left(1 \mp \dfrac{v}{c}\right)^{-1}$, moving source;

(b) $\nu' = \nu\left(1 \pm \dfrac{v}{c}\right)$, moving observer;

(c) $\nu' = \nu\left(1 \pm \dfrac{v}{c}\right)\left(1 - \dfrac{v^2}{c^2}\right)^{-1/2}$, relativistic.

Note. The relativistic formula agrees with the classical formulas if terms of order $v^2/c^2$
can be neglected.


⁵ The series expansion of $\tan^{-1} x$ (upper limit 1 replaced by x) was discovered by James Gregory in 1671, 3 years before
Leibniz. See Peter Beckmann's entertaining book, A History of Pi, 2nd ed., Boulder, CO: Golem Press (1971), and L. Berggren,
J. Borwein, and P. Borwein, Pi: A Source Book, New York: Springer (1997).

1.3.8 In the theory of general relativity there are various ways of relating (defining) a velocity
of recession of a galaxy to its red shift, δ. Milne's model (kinematic relativity) gives

(a) $v_1 = c\delta\left(1 + \frac{1}{2}\delta\right)$,

(b) $v_2 = c\delta\left(1 + \frac{1}{2}\delta\right)(1+\delta)^{-2}$,

(c) $1 + \delta = \left[\dfrac{1 + v_3/c}{1 - v_3/c}\right]^{1/2}$.

1. Show that for δ ≪ 1 (and $v_3/c \ll 1$), all three formulas reduce to v = cδ.
2. Compare the three velocities through terms of order $\delta^2$.

Note. In special relativity (with δ replaced by z), the ratio of observed wavelength λ to
emitted wavelength $\lambda_0$ is given by

$$\frac{\lambda}{\lambda_0} = 1 + z = \left(\frac{c+v}{c-v}\right)^{1/2}.$$

1.3.9 The relativistic sum w of two velocities u and v in the same direction is given by

$$\frac{w}{c} = \frac{u/c + v/c}{1 + uv/c^2}.$$

If

$$\frac{v}{c} = \frac{u}{c} = 1 - \alpha,$$

where 0 ≤ α ≤ 1, find w/c in powers of α through terms in $\alpha^3$.

1.3.10 The displacement x of a particle of rest mass $m_0$, resulting from a constant force $m_0 g$
along the x-axis, is

$$x = \frac{c^2}{g}\left\{\left[1 + \left(g\,\frac{t}{c}\right)^2\right]^{1/2} - 1\right\},$$

including relativistic effects. Find the displacement x as a power series in time t.
Compare with the classical result,

$$x = \frac{1}{2} g t^2.$$

1.3.11 By use of Dirac's relativistic theory, the fine structure formula of atomic spectroscopy
is given by

$$E = mc^2\left[1 + \frac{\gamma^2}{(s + n - |k|)^2}\right]^{-1/2},$$

where

$$s = \left(|k|^2 - \gamma^2\right)^{1/2}, \quad k = \pm 1, \pm 2, \pm 3, \ldots.$$

Expand in powers of $\gamma^2$ through order $\gamma^4$ ($\gamma^2 = Ze^2/4\pi\varepsilon_0\hbar c$, with Z the atomic number).
This expansion is useful in comparing the predictions of the Dirac electron theory
with those of a relativistic Schrödinger electron theory. Experimental results support
the Dirac theory.





1.3.12 In a head-on proton-proton collision, the ratio of the kinetic energy in the center of mass
system to the incident kinetic energy is

$$R = \left[\sqrt{2mc^2\left(E_k + 2mc^2\right)} - 2mc^2\right] / E_k.$$

Find the value of this ratio of kinetic energies for

(a) $E_k \ll mc^2$ (nonrelativistic),
(b) $E_k \gg mc^2$ (extreme-relativistic).

ANS. (a) $\frac{1}{2}$, (b) 0. The latter answer is a sort of law of diminishing
returns for high-energy particle accelerators
(with stationary targets).

1.3.13 With binomial expansions

$$\frac{x}{1-x} = \sum_{n=1}^{\infty} x^n, \qquad \frac{x}{x-1} = \frac{1}{1 - x^{-1}} = \sum_{n=0}^{\infty} x^{-n}.$$

Adding these two series yields $\sum_{n=-\infty}^{\infty} x^n = 0$.
Hopefully, we can agree that this is nonsense, but what has gone wrong?

1.3.14 (a) Planck's theory of quantized oscillators leads to an average energy

$$\langle \varepsilon \rangle = \frac{\displaystyle\sum_{n=1}^{\infty} n\varepsilon_0 \exp(-n\varepsilon_0/kT)}{\displaystyle\sum_{n=0}^{\infty} \exp(-n\varepsilon_0/kT)},$$

where $\varepsilon_0$ is a fixed energy. Identify the numerator and denominator as binomial
expansions and show that the ratio is

$$\langle \varepsilon \rangle = \frac{\varepsilon_0}{\exp(\varepsilon_0/kT) - 1}.$$

(b) Show that the $\langle \varepsilon \rangle$ of part (a) reduces to kT, the classical result, for $kT \gg \varepsilon_0$.

1.3.15 Expand by the binomial theorem and integrate term by term to obtain the Gregory series
for $y = \tan^{-1} x$ (note tan y = x):

$$\tan^{-1} x = \int_0^x \frac{dt}{1+t^2} = \int_0^x \left\{1 - t^2 + t^4 - t^6 + \cdots\right\} dt
= \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1}, \quad -1 \le x \le 1.$$

1.3.16 The Klein-Nishina formula for the scattering of photons by electrons contains a term of
the form

$$f(\varepsilon) = \frac{(1+\varepsilon)}{\varepsilon^2}\left[\frac{2 + 2\varepsilon}{1 + 2\varepsilon} - \frac{\ln(1 + 2\varepsilon)}{\varepsilon}\right].$$

Here $\varepsilon = h\nu/mc^2$, the ratio of the photon energy to the electron rest mass energy. Find
$\lim_{\varepsilon \to 0} f(\varepsilon)$.

ANS. $\frac{4}{3}$.


1.3.17 The behavior of a neutron losing energy by colliding elastically with nuclei of mass A
is described by a parameter $\xi_1$,

$$\xi_1 = 1 + \frac{(A-1)^2}{2A} \ln\frac{A-1}{A+1}.$$

An approximation, good for large A, is

$$\xi_2 = \frac{2}{A + \frac{2}{3}}.$$

Expand $\xi_1$ and $\xi_2$ in powers of $A^{-1}$. Show that $\xi_2$ agrees with $\xi_1$ through $(A^{-1})^2$. Find
the difference in the coefficients of the $(A^{-1})^3$ term.


1.3.18 Show that each of these two integrals equals Catalan's constant:

(a) $\displaystyle\int_0^1 \arctan t\, \frac{dt}{t}$, (b) $\displaystyle -\int_0^1 \ln x\, \frac{dx}{1+x^2}$.

Note. The definition and numerical computation of Catalan's constant was addressed
in Exercise 1.1.12.





1.4 MATHEMATICAL INDUCTION 


We are occasionally faced with the need to establish a relation which is valid for a set of 
integer values, in situations where it may not initially be obvious how to proceed. However, 
it may be possible to show that if the relation is valid for an arbitrary value of some index n, 
then it is also valid if n is replaced by n + 1. If we can also show that the relation is 
unconditionally satisfied for some initial value $n_0$, we may then conclude (unconditionally)
that the relation is also satisfied for $n_0 + 1$, $n_0 + 2$, .... This method of proof is known
as mathematical induction. It is ordinarily most useful when we know (or suspect) the 
validity of a relation, but lack a more direct method of proof. 


Example 1.4.1 SUM OF INTEGERS

The sum of the integers from 1 through n, here denoted S(n), is given by the formula
S(n) = n(n + 1)/2. An inductive proof of this formula proceeds as follows:

1. Given the formula for S(n), we calculate

$$S(n+1) = S(n) + (n+1) = \frac{n(n+1)}{2} + (n+1) = \left[\frac{n}{2} + 1\right](n+1) = \frac{(n+1)(n+2)}{2}.$$

Thus, given S(n), we can establish the validity of S(n + 1).

2. It is obvious that S(1) = 1(2)/2 = 1, so our formula for S(n) is valid for n = 1.

3. The formula for S(n) is therefore valid for all integers n ≥ 1.

Exercises 


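1.4.1 Show that
$$\sum_{j=1}^{n} j^4 = \frac{n}{30}(2n+1)(n+1)(3n^2 + 3n - 1).$$

1.4.2 Prove the Leibniz formula for the repeated differentiation of a product:
$$\left(\frac{d}{dx}\right)^{n} \bigl[f(x)\,g(x)\bigr] = \sum_{j=0}^{n} \binom{n}{j} \left[\left(\frac{d}{dx}\right)^{j} f(x)\right]\left[\left(\frac{d}{dx}\right)^{n-j} g(x)\right].$$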


1.5 OPERATIONS ON SERIES EXPANSIONS OF FUNCTIONS


There are a number of manipulations (tricks) that can be used to obtain series that represent
a function or to manipulate such series to improve convergence. In addition to the
procedures introduced in Section 1.1, there are others that to varying degrees make use of the
fact that the expansion depends on a variable. A simple example of this is the expansion
of f(x) = ln(1 + x), which we obtained in 1.2.4 by direct use of the Maclaurin expansion
and evaluation of the derivatives of f(x). An even easier way to obtain this series would
have been to integrate the power series for 1/(1 + x) term by term from 0 to x:

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots \quad \Longrightarrow \quad \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$


A problem requiring somewhat more deviousness is given by the following example, in 
which we use the binomial theorem on a series that represents the derivative of the function 
whose expansion is sought. 


Example 1.5.1 APPLICATION OF BINOMIAL EXPANSION

Sometimes the binomial expansion provides a convenient indirect route to the Maclaurin
series when direct methods are difficult. We consider here the power series expansion

$$\sin^{-1} x = \sum_{n=0}^{\infty} \frac{(2n-1)!!}{(2n)!!}\, \frac{x^{2n+1}}{2n+1} = x + \frac{x^3}{6} + \frac{3x^5}{40} + \cdots. \tag{1.82}$$

Starting from sin y = x, we find $dy/dx = 1/\sqrt{1 - x^2}$, and write the integral

$$\sin^{-1} x = y = \int_0^x \frac{dt}{(1 - t^2)^{1/2}}.$$

We now introduce the binomial expansion of $(1 - t^2)^{-1/2}$ and integrate term by term. The
result is Eq. (1.82).
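
The series of Eq. (1.82) is easy to evaluate numerically, because the ratio
(2n − 1)!!/(2n)!! can be updated from one term to the next. The following is a minimal
sketch; `arcsin_series` is a name chosen here for illustration.

```python
import math

def arcsin_series(x, terms=20):
    """Partial sum of Eq. (1.82) for arcsin(x), intended for |x| < 1."""
    total = 0.0
    ratio = 1.0                               # (2n-1)!!/(2n)!! at n = 0
    for n in range(terms):
        total += ratio * x ** (2 * n + 1) / (2 * n + 1)
        ratio *= (2 * n + 1) / (2 * n + 2)    # advance to (2n+1)!!/(2n+2)!!
    return total

if __name__ == "__main__":
    for x in (0.1, 0.5, 0.9):
        print(x, arcsin_series(x), math.asin(x))
```

Convergence slows markedly as x approaches 1, as one would expect from the factor
$x^{2n+1}$ in each term.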


Another way of improving the convergence of a series is to multiply it by a polynomial in 
the variable, choosing the polynomial’s coefficients to remove the least rapidly convergent 
part of the resulting series. Here is a simple example of this. 


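Example 1.5.2 MULTIPLY SERIES BY POLYNOMIAL

Returning to the series for ln(1 + x), we form

$$\begin{aligned}
(1 + a_1 x)\ln(1+x) &= \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n} + a_1 \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^{n+1}}{n} \\
&= x + \sum_{n=2}^{\infty} (-1)^{n-1} \left(\frac{1}{n} - \frac{a_1}{n-1}\right) x^n \\
&= x + \sum_{n=2}^{\infty} (-1)^{n-1}\, \frac{n(1 - a_1) - 1}{n(n-1)}\, x^n.
\end{aligned}$$

If we take $a_1 = 1$, the n in the numerator disappears and our combined series converges as
$n^{-2}$; the resulting series for ln(1 + x) is

$$\ln(1+x) = \left(\frac{x}{1+x}\right)\left(1 - \sum_{n=1}^{\infty} \frac{(-1)^n}{n(n+1)}\, x^n\right).$$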
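
The gain in convergence can be seen by comparing partial sums at x = 1, where the
original Maclaurin series converges only as 1/n. The sketch below uses our own function
names and is meant only as an illustration of the result above.

```python
import math

def ln_maclaurin(x, terms):
    """Partial sum of x - x^2/2 + x^3/3 - ... for ln(1 + x)."""
    return sum((-1) ** (n - 1) * x**n / n for n in range(1, terms + 1))

def ln_improved(x, terms):
    """Partial sum of the series obtained after multiplying by (1 + x)."""
    s = sum((-1) ** n * x**n / (n * (n + 1)) for n in range(1, terms + 1))
    return x / (1 + x) * (1 - s)

if __name__ == "__main__":
    exact = math.log(2.0)
    for terms in (10, 100, 1000):
        print(terms,
              abs(ln_maclaurin(1.0, terms) - exact),
              abs(ln_improved(1.0, terms) - exact))
```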





Another useful trick is to employ partial fraction expansions, which may convert a 
seemingly difficult series into others about which more may be known. 

If g(x) and h(x) are polynomials in x, with g(x) of lower degree than h(x), and h(x)
has the factorization $h(x) = (x - a_1)(x - a_2)\cdots(x - a_n)$, in the case that the factors of
h(x) are distinct (i.e., h has no multiple roots), then g(x)/h(x) can be written in the form

$$\frac{g(x)}{h(x)} = \frac{c_1}{x - a_1} + \frac{c_2}{x - a_2} + \cdots + \frac{c_n}{x - a_n}. \tag{1.83}$$

If we wish to leave one or more quadratic factors in h(x), perhaps to avoid the introduction
of imaginary quantities, the corresponding partial-fraction term will be of the form

$$\frac{ax + b}{x^2 + px + q}.$$

If h(x) has repeated linear factors, such as $(x - a_1)^m$, the partial fraction expansion for this
power of $x - a_1$ takes the form

$$\frac{c_{1,m}}{(x - a_1)^m} + \frac{c_{1,m-1}}{(x - a_1)^{m-1}} + \cdots + \frac{c_{1,1}}{x - a_1}.$$

The coefficients in partial fraction expansions are usually found easily; sometimes it is
useful to express them as limits, such as

$$c_i = \lim_{x \to a_i} (x - a_i)\, g(x)/h(x). \tag{1.84}$$


Example 1.5.3 PARTIAL FRACTION EXPANSION

Let

$$f(x) = \frac{k^2}{x(x^2 + k^2)} = \frac{c}{x} + \frac{ax + b}{x^2 + k^2}.$$

We have written the form of the partial fraction expansion, but have not yet determined the
values of a, b, and c. Putting the right side of the equation over a common denominator,
we have

$$\frac{k^2}{x(x^2 + k^2)} = \frac{c(x^2 + k^2) + x(ax + b)}{x(x^2 + k^2)}.$$

Expanding the right-side numerator and equating it to the left-side numerator, we get

$$0(x^2) + 0(x) + k^2 = (c + a)x^2 + bx + ck^2,$$

which we solve by requiring the coefficient of each power of x to have the same value
on both sides of this equation. We get b = 0, c = 1, and then a = −1. The final result is
therefore

$$f(x) = \frac{1}{x} - \frac{x}{x^2 + k^2}. \tag{1.85}$$
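
A quick exact-arithmetic check of Eq. (1.85), together with the limit formula of Eq. (1.84)
for the coefficient c, might look as follows. The function names are hypothetical, and the
limit is only approximated by evaluating at a small nonzero x.

```python
from fractions import Fraction

def f_original(x, k):
    """f(x) = k^2 / (x (x^2 + k^2)), evaluated exactly."""
    return Fraction(k**2) / (x * (x**2 + k**2))

def f_partial(x, k):
    """Partial fraction form of Eq. (1.85): 1/x - x/(x^2 + k^2)."""
    return Fraction(1) / x - Fraction(x) / (x**2 + k**2)

if __name__ == "__main__":
    k = 3
    for p in range(1, 6):
        x = Fraction(p, 7)
        print(x, f_original(x, k) == f_partial(x, k))
    # Eq. (1.84): c = lim_{x -> 0} x f(x); approximate it with a small x.
    x_small = Fraction(1, 10**6)
    print(float(x_small * f_original(x_small, k)))
```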


Still more cleverness is illustrated by the following procedure, due to Euler, for changing 
the expansion variable so as to improve the range over which an expansion converges. 
Euler’s transformation, the proof of which (with hints) is deferred to Exercise 1.5.4, makes 
the conversion: 





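$$f(x) = \sum_{n=0}^{\infty} (-1)^n c_n x^n \tag{1.86}$$

$$\phantom{f(x)} = \frac{1}{1+x} \sum_{n=0}^{\infty} (-1)^n a_n \left(\frac{x}{1+x}\right)^n. \tag{1.87}$$

The coefficients $a_n$ are repeated differences of the $c_n$:

$$a_0 = c_0, \quad a_1 = c_1 - c_0, \quad a_2 = c_2 - 2c_1 + c_0, \quad a_3 = c_3 - 3c_2 + 3c_1 - c_0, \ \ldots;$$

their general formula is

$$a_n = \sum_{j=0}^{n} (-1)^j \binom{n}{j} c_{n-j}. \tag{1.88}$$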


The series to which the Euler transformation is applied need not be alternating. The
coefficients $c_n$ can have a sign factor which cancels that in the definition.
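Example 1.5.4 EULER TRANSFORMATION

The Maclaurin series for ln(1 + x) converges extremely slowly, with convergence only for
|x| < 1. We consider the Euler transformation on the related series

$$\frac{\ln(1+x)}{x} = 1 - \frac{x}{2} + \frac{x^2}{3} - \cdots, \tag{1.89}$$

so, in Eq. (1.86), $c_n = 1/(n + 1)$. The first few $a_n$ are: $a_0 = 1$, $a_1 = \frac{1}{2} - 1 = -\frac{1}{2}$,
$a_2 = \frac{1}{3} - 2\left(\frac{1}{2}\right) + 1 = \frac{1}{3}$, $a_3 = \frac{1}{4} - 3\left(\frac{1}{3}\right) + 3\left(\frac{1}{2}\right) - 1 = -\frac{1}{4}$, or in general

$$a_n = \frac{(-1)^n}{n+1}.$$

The converted series is then

$$\frac{\ln(1+x)}{x} = \frac{1}{1+x}\left[1 + \frac{1}{2}\left(\frac{x}{1+x}\right) + \frac{1}{3}\left(\frac{x}{1+x}\right)^2 + \cdots\right],$$

which rearranges to

$$\ln(1+x) = \left(\frac{x}{1+x}\right) + \frac{1}{2}\left(\frac{x}{1+x}\right)^2 + \frac{1}{3}\left(\frac{x}{1+x}\right)^3 + \cdots. \tag{1.90}$$

This new series converges nicely at x = 1, and in fact is convergent for all x ≥ −1/2
(where |x/(1 + x)| ≤ 1 with x/(1 + x) ≠ 1), in particular for all positive x < ∞.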
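
The transformation of Eqs. (1.86) to (1.88) can be sketched in a few lines. The code below
is an illustration under our own naming (`euler_coefficients`, `euler_sum`); it uses exact
fractions for the coefficients, because the repeated differences of Eq. (1.88) involve heavy
cancellation in floating-point arithmetic.

```python
import math
from fractions import Fraction
from math import comb

def euler_coefficients(c):
    """Repeated differences a_n = sum_j (-1)^j C(n, j) c_{n-j}, Eq. (1.88)."""
    return [sum((-1) ** j * comb(n, j) * c[n - j] for j in range(n + 1))
            for n in range(len(c))]

def euler_sum(c, x):
    """Right side of Eq. (1.87), truncated to len(c) terms."""
    a = euler_coefficients(c)
    y = x / (1 + x)
    return sum((-1) ** n * a_n * y**n for n, a_n in enumerate(a)) / (1 + x)

if __name__ == "__main__":
    N = 40
    c = [Fraction(1, n + 1) for n in range(N)]       # c_n for ln(1+x)/x, Eq. (1.89)
    x = Fraction(1)
    direct = sum((-1) ** n * c[n] * x**n for n in range(N))
    print(float(direct), float(euler_sum(c, x)), math.log(2.0))
```

With the same 40 coefficients, the direct partial sum at x = 1 is still off in the second
decimal place, while the transformed sum agrees with ln 2 to roughly machine precision.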














Exercises 
1.5.1 Using a partial fraction expansion, show that for 0 < x < 1,

$$\int_{-x}^{x} \frac{dt}{1 - t^2} = \ln\left(\frac{1+x}{1-x}\right).$$

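1.5.2 Prove the partial fraction expansion

$$\frac{1}{n(n+1)\cdots(n+p)} = \frac{1}{p!}\left[\binom{p}{0}\frac{1}{n} - \binom{p}{1}\frac{1}{n+1} + \binom{p}{2}\frac{1}{n+2} - \cdots + (-1)^p \binom{p}{p}\frac{1}{n+p}\right],$$

where p is a positive integer.

Hint. Use mathematical induction. Two binomial coefficient formulas of use here are

$$\frac{p+1}{p+1-j}\binom{p}{j} = \binom{p+1}{j}, \qquad \sum_{j=1}^{p+1} (-1)^{j-1} \binom{p+1}{j} = 1.$$
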
1.5.3 The formula for $\alpha_p$, Eq. (1.26), is a summation of the form $\sum_{n=1}^{\infty} u_n(p)$, with

$$u_n(p) = \frac{1}{n(n+1)\cdots(n+p)}.$$

Applying a partial fraction decomposition to the first and last factors of the denominator,
i.e.,

$$\frac{1}{n(n+p)} = \frac{1}{p}\left[\frac{1}{n} - \frac{1}{n+p}\right],$$

show that $u_n(p) = \dfrac{u_n(p-1) - u_{n+1}(p-1)}{p}$ and that $\sum_{n=1}^{\infty} u_n(p) = \dfrac{1}{p\, p!}$.

Hint. It is useful to note that $u_1(p - 1) = 1/p!$.


1.5.4 Proof of Euler transformation: By substituting Eq. (1.88) into Eq. (1.87), verify that 
Eq. (1.86) is recovered. 


Hint. It may help to rearrange the resultant double series so that both indices are summed
on the range (0, ∞). Then the summation not containing the coefficients $c_j$ can be
recognized as a binomial expansion.


1.5.5 Carry out the Euler transformation on the series for arctan(x):

$$\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots.$$

Check your work by computing arctan(1) = π/4 and $\arctan(3^{-1/2}) = \pi/6$.


1.6 SOME IMPORTANT SERIES 


There are a few series that arise so often that all physicists should recognize them. Here is 
a short list that is worth committing to memory. 











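$$\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots, \quad -\infty < x < \infty, \tag{1.91}$$

$$\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, \quad -\infty < x < \infty, \tag{1.92}$$

$$\cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots, \quad -\infty < x < \infty, \tag{1.93}$$

$$\sinh(x) = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!} = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \cdots, \quad -\infty < x < \infty, \tag{1.94}$$

$$\cosh(x) = \sum_{n=0}^{\infty} \frac{x^{2n}}{(2n)!} = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots, \quad -\infty < x < \infty, \tag{1.95}$$

$$\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + x^4 + \cdots, \quad -1 < x < 1, \tag{1.96}$$

$$\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1} x^n}{n} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \quad -1 < x \le 1, \tag{1.97}$$

$$(1+x)^p = \sum_{n=0}^{\infty} \binom{p}{n} x^n = \sum_{n=0}^{\infty} \frac{(p-n+1)_n}{n!}\, x^n, \quad -1 < x < 1. \tag{1.98}$$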


Reminder. The notation $(a)_n$ is the Pochhammer symbol: $(a)_0 = 1$, $(a)_1 = a$, and for
integers n > 1, $(a)_n = a(a+1)\cdots(a+n-1)$. It is not required that a, or p in Eq. (1.98), be
positive or integral.


Exercises 





1.6.1 Show that
$$\ln\left(\frac{1+x}{1-x}\right) = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots\right), \quad -1 < x < 1.$$


1.7 VECTORS 


In science and engineering we frequently encounter quantities that have algebraic magni- 
tude only (i.e., magnitude and possibly a sign): mass, time, and temperature. These we label 
scalar quantities, which remain the same no matter what coordinates we may use. In con- 
trast, many interesting physical quantities have magnitude and, in addition, an associated 
direction. This second group includes displacement, velocity, acceleration, force, momen- 
tum, and angular momentum. Quantities with magnitude and direction are labeled vector 
quantities. To distinguish vectors from scalars, we usually identify vector quantities with 
boldface type, as in V or x. 

This section deals only with properties of vectors that are not specific to three- 
dimensional (3-D) space (thereby excluding the notion of the vector cross product and 
the use of vectors to describe rotational motion). We also restrict the present discussion to 
vectors that describe a physical quantity at a single point, in contrast to the situation where 
a vector is defined over an extended region, with its magnitude and/or direction a function 
of the position with which it is associated. Vectors defined over a region are called vector 
fields; a familiar example is the electric field, which describes the direction and magnitude 
of the electrical force on a test charge throughout a region of space. We return to these 
important topics in a later chapter. 

The key items of the present discussion are (1) geometric and algebraic descriptions of 
vectors; (2) linear combinations of vectors; and (3) the dot product of two vectors and its 
use in determining the angle between their directions and the decomposition of a vector 
into contributions in the coordinate directions. 
Basic Properties


We define a vector in a way that makes it correspond to an arrow from a starting point to 
another point in two-dimensional (2-D) or 3-D space, with vector addition identified as 
the result of placing the tail (starting point) of a second vector at the head (endpoint) of the 
first vector, as shown in Fig. 1.7. As seen in the figure, the result of addition is the same if 
the vectors are added in either order; vector addition is a commutative operation. Vector 
addition is also associative; if we add three vectors, the result is independent of the order 
in which the additions take place. Formally, this means 


$$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}).$$


It is also useful to define an operation in which a vector A is multiplied by an ordinary 
number k (a scalar). The result will be a vector that is still in the original direction, but 
with its length multiplied by k. If k is negative, the vector’s length is multiplied by |k| but 
its direction is reversed. This means we can interpret subtraction as illustrated here: 


$$\mathbf{A} - \mathbf{B} = \mathbf{A} + (-1)\mathbf{B},$$


and we can form polynomials such as A + 2B — 3C. 

Up to this point we are describing our vectors as quantities that do not depend on any 
coordinate system that we may wish to use, and we are focusing on their geometric prop- 
erties. For example, consider the principle of mechanics that an object will remain in static 
equilibrium if the vector sum of the forces on it is zero. The net force at the point O of 
Fig. 1.8 will be the vector sum of the forces labeled $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$. The sum of the forces
at static equilibrium is illustrated in the right-hand panel of the figure. 

It is also important to develop an algebraic description for vectors. We can do so by 
placing a vector A so that its tail is at the origin of a Cartesian coordinate system and by 
noting the coordinates of its head. Giving these coordinates (in 3-D space) the names $A_x$,
$A_y$, $A_z$, we have a component description of A. From these components we can use the
Pythagorean theorem to compute the length or magnitude of A, denoted A or |A|, as

$$A = \left(A_x^2 + A_y^2 + A_z^2\right)^{1/2}. \tag{1.99}$$

The components $A_x$, ... are also useful for computing the result when vectors are added
or multiplied by scalars. From the geometry in Cartesian coordinates, it is obvious that if
$\mathbf{C} = k\mathbf{A} + k'\mathbf{B}$, then C will have components

$$C_x = kA_x + k'B_x, \quad C_y = kA_y + k'B_y, \quad C_z = kA_z + k'B_z.$$


FIGURE 1.7 Addition of two vectors.

FIGURE 1.8 Equilibrium of forces at the point O.

At this stage it is convenient to introduce vectors of unit length (called unit vectors) in
the directions of the coordinate axes. Letting $\hat{\mathbf{e}}_x$ be a unit vector in the x direction, we can
now identify $A_x\hat{\mathbf{e}}_x$ as a vector of signed magnitude $A_x$ in the x direction, and we see that
A can be represented as the vector sum

$$\mathbf{A} = A_x\hat{\mathbf{e}}_x + A_y\hat{\mathbf{e}}_y + A_z\hat{\mathbf{e}}_z. \tag{1.100}$$


If A is itself the displacement from the origin to the point (x, y, z), we denote it by the
special symbol r (sometimes called the radius vector), and Eq. (1.100) becomes

$$\mathbf{r} = x\hat{\mathbf{e}}_x + y\hat{\mathbf{e}}_y + z\hat{\mathbf{e}}_z. \tag{1.101}$$


The unit vectors are said to span the space in which our vectors reside, or to form a 
basis for the space. Either of these statements means that any vector in the space can be 
constructed as a linear combination of the basis vectors. Since a vector A has specific
values of $A_x$, $A_y$, and $A_z$, this linear combination will be unique.

Sometimes a vector will be specified by its magnitude A and by the angles it makes with
the Cartesian coordinate axes. Letting α, β, γ be the respective angles our vector makes
with the x, y, and z axes, the components of A are given by

$$A_x = A\cos\alpha, \quad A_y = A\cos\beta, \quad A_z = A\cos\gamma. \tag{1.102}$$

The quantities cos α, cos β, cos γ (see Fig. 1.9) are known as the direction cosines of A.
Since we already know that $A_x^2 + A_y^2 + A_z^2 = A^2$, we see that the direction cosines are not
entirely independent, but must satisfy the relation

$$\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1. \tag{1.103}$$

While the formalism of Eq. (1.100) could be developed with complex values for the
components $A_x$, $A_y$, $A_z$, the geometric situation being described makes it natural to restrict
these coefficients to real values; the space with all possible real values of two coordinates
is denoted by mathematicians (and occasionally by us) $\mathbb{R}^2$; the complete 3-D space is
named $\mathbb{R}^3$.

FIGURE 1.10 Projections of A on the x and y axes.


Dot (Scalar) Product 


When we write a vector in terms of its component vectors in the coordinate directions,
as in

$$\mathbf{A} = A_x\hat{\mathbf{e}}_x + A_y\hat{\mathbf{e}}_y + A_z\hat{\mathbf{e}}_z,$$

we can think of $A_x\hat{\mathbf{e}}_x$ as its projection in the x direction. Stated another way, it is the
portion of A that is in the subspace spanned by $\hat{\mathbf{e}}_x$ alone. The term projection corresponds
to the idea that it is the result of collapsing (projecting) a vector onto one of the coordinate
axes. See Fig. 1.10.

It is useful to define a quantity known as the dot product, with the property that it
produces the coefficients, e.g., $A_x$, in projections onto the coordinate axes according to

$$\mathbf{A}\cdot\hat{\mathbf{e}}_x = A_x = A\cos\alpha, \quad \mathbf{A}\cdot\hat{\mathbf{e}}_y = A_y = A\cos\beta, \quad \mathbf{A}\cdot\hat{\mathbf{e}}_z = A_z = A\cos\gamma, \tag{1.104}$$

where cos α, cos β, cos γ are the direction cosines of A.
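We want to generalize the notion of the dot product so that it will apply to arbitrary
vectors A and B, requiring that it, like projections, be linear and obey the distributive and
associative laws

$$\mathbf{A}\cdot(\mathbf{B} + \mathbf{C}) = \mathbf{A}\cdot\mathbf{B} + \mathbf{A}\cdot\mathbf{C}, \tag{1.105}$$

$$\mathbf{A}\cdot(k\mathbf{B}) = (k\mathbf{A})\cdot\mathbf{B} = k\,\mathbf{A}\cdot\mathbf{B}, \tag{1.106}$$

with k a scalar. Now we can use the decomposition of B into Cartesian components as
in Eq. (1.100), $\mathbf{B} = B_x\hat{\mathbf{e}}_x + B_y\hat{\mathbf{e}}_y + B_z\hat{\mathbf{e}}_z$, to construct the dot product of the vectors A and
B as

$$\begin{aligned}
\mathbf{A}\cdot\mathbf{B} &= \mathbf{A}\cdot(B_x\hat{\mathbf{e}}_x + B_y\hat{\mathbf{e}}_y + B_z\hat{\mathbf{e}}_z) \\
&= B_x\,\mathbf{A}\cdot\hat{\mathbf{e}}_x + B_y\,\mathbf{A}\cdot\hat{\mathbf{e}}_y + B_z\,\mathbf{A}\cdot\hat{\mathbf{e}}_z \\
&= B_x A_x + B_y A_y + B_z A_z.
\end{aligned} \tag{1.107}$$

This leads to the general formula

$$\mathbf{A}\cdot\mathbf{B} = \sum_i B_i A_i = \sum_i A_i B_i = \mathbf{B}\cdot\mathbf{A}, \tag{1.108}$$

which is also applicable when the number of dimensions in the space is other than three.
Note that the dot product is commutative, with A · B = B · A.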


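An important property of the dot product is that A · A is the square of the magnitude
of A:

$$\mathbf{A}\cdot\mathbf{A} = A_x^2 + A_y^2 + \cdots = |\mathbf{A}|^2. \tag{1.109}$$

Applying this observation to C = A + B, we have

$$|\mathbf{C}|^2 = \mathbf{C}\cdot\mathbf{C} = (\mathbf{A} + \mathbf{B})\cdot(\mathbf{A} + \mathbf{B}) = \mathbf{A}\cdot\mathbf{A} + \mathbf{B}\cdot\mathbf{B} + 2\,\mathbf{A}\cdot\mathbf{B},$$

which can be rearranged to

$$\mathbf{A}\cdot\mathbf{B} = \frac{1}{2}\left[|\mathbf{C}|^2 - |\mathbf{A}|^2 - |\mathbf{B}|^2\right]. \tag{1.110}$$

FIGURE 1.11 Vector sum, C = A + B.

From the geometry of the vector sum C = A + B, as shown in Fig. 1.11, and recalling
the law of cosines and its similarity to Eq. (1.110), we obtain the well-known formula

$$\mathbf{A}\cdot\mathbf{B} = |\mathbf{A}|\,|\mathbf{B}|\cos\theta, \tag{1.111}$$
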
where θ is the angle between the directions of A and B. In contrast with the algebraic
formula Eq. (1.108), Eq. (1.111) is a geometric formula for the dot product, and shows
clearly that it depends only on the relative directions of A and B and is therefore
independent of the coordinate system. For that reason the dot product is sometimes also identified
as a scalar product.

Equation (1.111) also permits an interpretation in terms of the projection of a vector A
in the direction of B or the reverse. If $\hat{\mathbf{b}}$ is a unit vector in the direction of B, the projection
of A in that direction is given by

$$A_b\,\hat{\mathbf{b}} = (\hat{\mathbf{b}}\cdot\mathbf{A})\,\hat{\mathbf{b}} = (A\cos\theta)\,\hat{\mathbf{b}}, \tag{1.112}$$

where θ is the angle between A and B. Moreover, the dot product A · B can then be
identified as |B| times the magnitude of the projection of A in the B direction, so $\mathbf{A}\cdot\mathbf{B} = A_b B$.
Equivalently, A · B is equal to |A| times the magnitude of the projection of B in the A
direction, so we also have $\mathbf{A}\cdot\mathbf{B} = B_a A$.

Finally, we observe that since |cos θ| ≤ 1, Eq. (1.111) leads to the inequality

$$|\mathbf{A}\cdot\mathbf{B}| \le |\mathbf{A}|\,|\mathbf{B}|. \tag{1.113}$$


The equality in Eq. (1.113) holds only if A and B are collinear (in either the same or 
opposite directions). This is the specialization to physical space of the Schwarz inequality, 
which we will later develop in a more general context. 
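
The algebraic formula, Eq. (1.108), and the geometric formula, Eq. (1.111), can be tied
together numerically. The helpers below (`dot`, `norm`, `angle`, `project`) are our own
illustrative names and work for vectors of any dimension given as lists of components.

```python
import math

def dot(a, b):
    """Dot product as the sum of products of components, Eq. (1.108)."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def norm(a):
    """Magnitude |A| = sqrt(A . A), from Eq. (1.109)."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle between a and b, obtained by solving Eq. (1.111) for theta."""
    return math.acos(dot(a, b) / (norm(a) * norm(b)))

def project(a, b):
    """Vector projection of a onto the direction of b, as in Eq. (1.112)."""
    scale = dot(a, b) / dot(b, b)      # equals (A cos(theta)) / |B|
    return [scale * b_i for b_i in b]

if __name__ == "__main__":
    A, B = [1.0, 2.0, 3.0], [4.0, -5.0, 6.0]
    print(dot(A, B), math.degrees(angle(A, B)), project(A, B))
    # Schwarz inequality, Eq. (1.113)
    print(abs(dot(A, B)) <= norm(A) * norm(B))
```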


Orthogonality 


Equation (1.111) shows that A · B becomes zero when cos θ = 0, which occurs at θ = ±π/2
(i.e., at θ = ±90°). These values of θ correspond to A and B being perpendicular, the
technical term for which is orthogonal. Thus,

A and B are orthogonal if and only if A · B = 0.

Checking this result for two dimensions, we note that A and B are perpendicular if the
slope of B, $B_y/B_x$, is the negative of the reciprocal of $A_y/A_x$, or

$$\frac{B_y}{B_x} = -\frac{A_x}{A_y}.$$

This result expands to $A_x B_x + A_y B_y = 0$, the condition that A and B be orthogonal.

In terms of projections, A - B = 0 means that the projection of A in the B direction 
vanishes (and vice versa). That is of course just another way of saying that A and B are 
orthogonal. 

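The fact that the Cartesian unit vectors are mutually orthogonal makes it possible to
simplify many dot product computations. Because

$$\hat{\mathbf{e}}_x\cdot\hat{\mathbf{e}}_y = \hat{\mathbf{e}}_x\cdot\hat{\mathbf{e}}_z = \hat{\mathbf{e}}_y\cdot\hat{\mathbf{e}}_z = 0, \quad \hat{\mathbf{e}}_x\cdot\hat{\mathbf{e}}_x = \hat{\mathbf{e}}_y\cdot\hat{\mathbf{e}}_y = \hat{\mathbf{e}}_z\cdot\hat{\mathbf{e}}_z = 1, \tag{1.114}$$

we can evaluate A · B as

$$\begin{aligned}
(A_x\hat{\mathbf{e}}_x + A_y\hat{\mathbf{e}}_y + A_z\hat{\mathbf{e}}_z)\cdot(B_x\hat{\mathbf{e}}_x + B_y\hat{\mathbf{e}}_y + B_z\hat{\mathbf{e}}_z)
&= A_x B_x\,\hat{\mathbf{e}}_x\cdot\hat{\mathbf{e}}_x + A_y B_y\,\hat{\mathbf{e}}_y\cdot\hat{\mathbf{e}}_y + A_z B_z\,\hat{\mathbf{e}}_z\cdot\hat{\mathbf{e}}_z \\
&\quad + (A_x B_y + A_y B_x)\,\hat{\mathbf{e}}_x\cdot\hat{\mathbf{e}}_y + (A_x B_z + A_z B_x)\,\hat{\mathbf{e}}_x\cdot\hat{\mathbf{e}}_z + (A_y B_z + A_z B_y)\,\hat{\mathbf{e}}_y\cdot\hat{\mathbf{e}}_z \\
&= A_x B_x + A_y B_y + A_z B_z.
\end{aligned}$$

See Chapter 3: Vector Analysis, Section 3.2: Vectors in 3-D Space for an introduction
of the cross product of vectors, needed early in Chapter 2.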


Exercises 


1.7.1 The vector A whose magnitude is 1.732 units makes equal angles with the coordinate
axes. Find $A_x$, $A_y$, and $A_z$.

1.7.2 A triangle is defined by the vertices of three vectors A, B and C that extend from the
origin. In terms of A, B, and C show that the vector sum of the successive sides of the
triangle (AB + BC + CA) is zero, where the side AB is from A to B, etc.

1.7.3 A sphere of radius a is centered at a point $\mathbf{r}_1$.

(a) Write out the algebraic equation for the sphere.
(b) Write out a vector equation for the sphere.

ANS. (a) $(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2 = a^2$.
(b) $\mathbf{r} = \mathbf{r}_1 + \mathbf{a}$, where a takes on all directions
but has a fixed magnitude a.

1.7.4 Hubble's law. Hubble found that distant galaxies are receding with a velocity
proportional to their distance from where we are on Earth. For the ith galaxy,

$$\mathbf{v}_i = H_0\,\mathbf{r}_i$$

with us at the origin. Show that this recession of the galaxies from us does not imply
that we are at the center of the universe. Specifically, take the galaxy at $\mathbf{r}_1$ as a new
origin and show that Hubble's law is still obeyed.

1.7.5 Find the diagonal vectors of a unit cube with one corner at the origin and its three sides
lying along Cartesian coordinates axes. Show that there are four diagonals with length
$\sqrt{3}$. Representing these as vectors, what are their components? Show that the diagonals
of the cube's faces have length $\sqrt{2}$ and determine their components.

1.7.6 The vector r, starting at the origin, terminates at and specifies the point in space (x, y, z).
Find the surface swept out by the tip of r if
(a) (r − a) · a = 0. Characterize a geometrically.
(b) (r − a) · r = 0. Describe the geometric role of a.
The vector a is constant (in magnitude and direction).





1.7.7 A pipe comes diagonally down the south wall of a building, making an angle of 45° with
the horizontal. Coming into a corner, the pipe turns and continues diagonally down a
west-facing wall, still making an angle of 45° with the horizontal. What is the angle
between the south-wall and west-wall sections of the pipe?

ANS. 120°.

1.7.8 Find the shortest distance of an observer at the point (2, 1, 3) from a rocket in free
flight with velocity (1, 2, 3) km/s. The rocket was launched at time t = 0 from (1, 1, 1).
Lengths are in kilometers.

1.7.9 Show that the medians of a triangle intersect in the center which is 2/3 of the median's
length from each vertex. Construct a numerical example and plot it.

1.7.10 Prove the law of cosines starting from $A^2 = (\mathbf{B} - \mathbf{C})^2$.

1.7.11 Given the three vectors,

$$\begin{aligned}
\mathbf{P} &= 3\hat{\mathbf{e}}_x + 2\hat{\mathbf{e}}_y - \hat{\mathbf{e}}_z, \\
\mathbf{Q} &= -6\hat{\mathbf{e}}_x - 4\hat{\mathbf{e}}_y + 2\hat{\mathbf{e}}_z, \\
\mathbf{R} &= \hat{\mathbf{e}}_x - 2\hat{\mathbf{e}}_y - \hat{\mathbf{e}}_z,
\end{aligned}$$

find two that are perpendicular and two that are parallel or antiparallel.


1.8 COMPLEX NUMBERS AND FUNCTIONS 


Complex numbers and analysis based on complex variable theory have become extremely 
important and valuable tools for the mathematical analysis of physical theory. Though 
the results of the measurement of physical quantities must, we firmly believe, ultimately 
be described by real numbers, there is ample evidence that successful theories predicting 
the results of those measurements require the use of complex numbers and analysis. In a 
later chapter we explore the fundamentals of complex variable theory. Here we introduce 
complex numbers and identify some of their more elementary properties. 


Basic Properties 


A complex number is nothing more than an ordered pair of two real numbers, (a, b). 
Similarly, a complex variable is an ordered pair of two real variables, 

$$ z \equiv (x, y). \tag{1.115} $$


The ordering is significant. In general (a, b) is not equal to (b, a) and (x, y) is not equal 
to (y,x). As usual, we continue writing a real number (x,0) simply as x, and we call 
i = (0, 1) the imaginary unit. All of complex analysis can be developed in terms of ordered 
pairs of numbers, variables, and functions (u(x, y), v(x, y)). 

We now define addition of complex numbers in terms of their Cartesian components as 


$$ z_1 + z_2 = (x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2). \tag{1.116} $$

Multiplication of complex numbers is defined as 

$$ z_1 z_2 = (x_1, y_1) \cdot (x_2, y_2) = (x_1 x_2 - y_1 y_2, \; x_1 y_2 + x_2 y_1). \tag{1.117} $$

It is obvious that multiplication is not just the multiplication of corresponding components. 
Using Eq. (1.117) we verify that $i^2 = (0, 1) \cdot (0, 1) = (-1, 0) = -1$, so we can also identify 
$i = \sqrt{-1}$ as usual, and further rewrite Eq. (1.115) as 

$$ z = (x, y) = (x, 0) + (0, y) = x + (0, 1) \cdot (y, 0) = x + iy. \tag{1.118} $$


Clearly, introduction of the symbol i is not necessary here, but it is convenient, in large 
part because the addition and multiplication rules for complex numbers are consistent with 
those for ordinary arithmetic with the additional property that $i^2 = -1$: 

$$ (x_1 + iy_1)(x_2 + iy_2) = x_1 x_2 + i^2 y_1 y_2 + i(x_1 y_2 + y_1 x_2) = (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + y_1 x_2), $$


in agreement with Eq. (1.117). For historical reasons, i and its multiples are known as 
imaginary numbers. 
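
The ordered-pair rules of Eqs. (1.116) and (1.117) can be tried out directly. The following is a minimal sketch, not part of the original text: the names `pair_add`, `pair_mul`, and `I_UNIT` are hypothetical, and Python's built-in `complex` type is used only as an independent check.

```python
from typing import Tuple

Pair = Tuple[float, float]  # (real part, imaginary part)


def pair_add(z1: Pair, z2: Pair) -> Pair:
    """Eq. (1.116): add corresponding components."""
    return (z1[0] + z2[0], z1[1] + z2[1])


def pair_mul(z1: Pair, z2: Pair) -> Pair:
    """Eq. (1.117): (x1*x2 - y1*y2, x1*y2 + x2*y1)."""
    x1, y1 = z1
    x2, y2 = z2
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


I_UNIT: Pair = (0.0, 1.0)  # the imaginary unit i = (0, 1)

# i * i should be the pair that represents the real number -1.
i_squared = pair_mul(I_UNIT, I_UNIT)

# The pair product should agree with ordinary complex arithmetic.
product = pair_mul((1.0, 2.0), (3.0, -4.0))
builtin = complex(1.0, 2.0) * complex(3.0, -4.0)
```

Comparing `product` with `(builtin.real, builtin.imag)` illustrates the remark above that the symbol i is a convenience rather than a necessity.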

The space of complex numbers, sometimes denoted $\mathbb{C}$ by mathematicians, has the 
following formal properties: 

• It is closed under addition and multiplication, meaning that if two complex numbers 
are added or multiplied, the result is also a complex number. 

• It has a unique zero number, which when added to any complex number leaves it 
unchanged and which, when multiplied with any complex number, yields zero. 

• It has a unique unit number, 1, which when multiplied with any complex number leaves 
it unchanged. 

• Every complex number z has an inverse under addition (known as $-z$), and every 
nonzero z has an inverse under multiplication, denoted $z^{-1}$ or $1/z$. 

• It is closed under exponentiation: if u and v are complex numbers, $u^v$ is also a complex 
number. 


From a rigorous mathematical viewpoint, the last statement above is somewhat loose, as it 
does not really define exponentiation, but we will find it adequate for our purposes. 
Some additional definitions and properties include the following: 


Complex conjugation: Like all complex numbers, i has an inverse under addition, 
denoted $-i$, in two-component form, $(0, -1)$. Given a complex number $z = x + iy$, it 
is useful to define another complex number, $z^* = x - iy$, which we call the complex 
conjugate of z.⁶ Forming 

$$ zz^* = (x + iy)(x - iy) = x^2 + y^2, \tag{1.119} $$

we see that $zz^*$ is real; we define the absolute value of z, denoted $|z|$, as $\sqrt{zz^*}$. 





⁶The complex conjugate of z is often denoted $\bar{z}$ in the mathematical literature. 

Division: Consider now the division of two complex numbers: $z'/z$. We need to manipulate 
this quantity to bring it to the complex number form $u + iv$ (with u and v real). We may 
do so as follows: 

$$ \frac{z'}{z} = \frac{z' z^*}{z z^*} = \frac{(x' + iy')(x - iy)}{x^2 + y^2}, $$

or 

$$ \frac{x' + iy'}{x + iy} = \frac{xx' + yy'}{x^2 + y^2} + i\,\frac{xy' - x'y}{x^2 + y^2}. \tag{1.120} $$
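
As a quick numerical check of Eq. (1.120), the sketch below divides two ordered pairs by multiplying numerator and denominator by the conjugate of the denominator. The function name `pair_div` is hypothetical, and the comparison against Python's built-in complex division is only an illustration.

```python
def pair_div(zp, z):
    """Return z'/z as a pair (u, v), following Eq. (1.120)."""
    xp, yp = zp
    x, y = z
    denom = x * x + y * y  # z z* = x^2 + y^2, Eq. (1.119)
    if denom == 0.0:
        raise ZeroDivisionError("division by the complex zero (0, 0)")
    return ((x * xp + y * yp) / denom, (x * yp - xp * y) / denom)


# (1 + 2i) / (3 + 4i), computed both ways.
u, v = pair_div((1.0, 2.0), (3.0, 4.0))
reference = complex(1.0, 2.0) / complex(3.0, 4.0)
```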


Functions in the Complex Domain 


Since the fundamental operations in the complex domain obey the same rules as those for 
arithmetic in the space of real numbers, it is natural to define functions so that their real and 
complex incarnations are similar, and specifically so that the complex and real definitions 
agree when both are applicable. This means, among other things, that if a function is repre- 
sented by a power series, we should, within the region of convergence of the power series, 
be able to use such series with complex values of the expansion variable. This notion is 
called permanence of the algebraic form. 
Applying this concept to the exponential, we define 

$$ e^z = 1 + \frac{1}{1!} z + \frac{1}{2!} z^2 + \frac{1}{3!} z^3 + \frac{1}{4!} z^4 + \cdots. \tag{1.121} $$

Now, replacing z by iz, we have 

$$ e^{iz} = 1 + iz + \frac{1}{2!}(iz)^2 + \frac{1}{3!}(iz)^3 + \frac{1}{4!}(iz)^4 + \cdots 
= \left[ 1 - \frac{1}{2!} z^2 + \frac{1}{4!} z^4 - \cdots \right] + i \left[ z - \frac{1}{3!} z^3 + \frac{1}{5!} z^5 - \cdots \right]. \tag{1.122} $$

It was permissible to regroup the terms in the series of Eq. (1.122) because that series is 
absolutely convergent for all z; the d'Alembert ratio test succeeds for all z, real or complex. 
If we now identify the bracketed expansions in the last line of Eq. (1.122) as cos z and sin z, 
we have the extremely valuable result 

$$ e^{iz} = \cos z + i \sin z. \tag{1.123} $$

This result is valid for all z, real, imaginary, or complex, but is particularly useful when z 
is real. 
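
Equation (1.123) can be checked numerically by summing the power series of Eq. (1.121) with a complex argument. The sketch below is illustrative only: `exp_series` and `euler_rhs` are hypothetical names, and the 40-term cutoff is an arbitrary choice that is ample for arguments of modest size.

```python
import cmath
import math


def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum of Eq. (1.121): sum of z**n / n! for n = 0 .. terms-1."""
    total = 0j
    term = 1 + 0j  # current term z**n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total


def euler_rhs(z: complex) -> complex:
    """Right-hand side of Eq. (1.123): cos z + i sin z."""
    return cmath.cos(z) + 1j * cmath.sin(z)


# Differences between the series for exp(iz) and cos z + i sin z
# for a real, a complex, and a pure imaginary argument.
samples = [0.7, 1 + 2j, -3j]
differences = [abs(exp_series(1j * z) - euler_rhs(z)) for z in samples]
```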

Any function w(z) of a complex variable $z = x + iy$ can in principle be divided into its 
real and imaginary parts, just as we did when we added, multiplied, or divided complex 
numbers. That is, we can write 

$$ w(z) = u(x, y) + iv(x, y), \tag{1.124} $$

in which the separate functions u(x, y) and v(x, y) are pure real. For example, if $f(z) = z^2$, 
we have 

$$ f(z) = (x + iy)^2 = (x^2 - y^2) + i(2xy). $$

The real part of a function f(z) will be labeled $\operatorname{Re} f(z)$, whereas the imaginary part 
will be labeled $\operatorname{Im} f(z)$. In Eq. (1.124), 

$$ \operatorname{Re} w(z) = u(x, y), \quad \operatorname{Im} w(z) = v(x, y). $$


The complex conjugate of our function w(z) is u(x, y) —iv(x, y), and depending on w, 
may or may not be equal to w(z*). 


Polar Representation 


We may visualize complex numbers by assigning them locations on a planar graph, called 
an Argand diagram or, more colloquially, the complex plane. Traditionally the real 
component is plotted horizontally, on what is called the real axis, with the imaginary axis in 
the vertical direction. See Fig. 1.12. An alternative to identifying points by their Cartesian 
coordinates (x, y) is to use polar coordinates $(r, \theta)$, with 

$$ x = r\cos\theta, \quad y = r\sin\theta, \quad \text{or} \quad r = \sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}(y/x). \tag{1.125} $$

The arctan function $\tan^{-1}(y/x)$ is multiple valued; the correct location on an Argand 
diagram needs to be consistent with the individual values of x and y. 
The Cartesian and polar representations of a complex number can also be related by 
writing 

$$ x + iy = r(\cos\theta + i\sin\theta) = re^{i\theta}, \tag{1.126} $$

where we have used Eq. (1.123) to introduce the complex exponential. Note that r is 
also $|z|$, so the magnitude of z is given by its distance from the origin in an Argand 
diagram. In complex variable theory, r is also called the modulus of z and $\theta$ is termed the 
argument or the phase of z. 
If we have two complex numbers, z and z', in polar form, their product zz' can be written 

$$ zz' = (re^{i\theta})(r'e^{i\theta'}) = (rr')e^{i(\theta + \theta')}, \tag{1.127} $$


showing that the location of the product in an Argand diagram will have argument (polar 
angle) at the sum of the polar angles of the factors, and with a magnitude that is the product 
of their magnitudes. Conversely, the quotient $z/z'$ will have magnitude $r/r'$ and argument 
$\theta - \theta'$. These relationships should aid in getting a qualitative understanding of complex 
multiplication and division. This discussion also shows that multiplication and division 
are easier in the polar representation, whereas addition and subtraction have simpler forms 
in Cartesian coordinates. 

FIGURE 1.12 Argand diagram, showing location of $z = x + iy = re^{i\theta}$. 

FIGURE 1.13 Left: Relation of $z$ and $z^*$. Right: $z + z^*$ and $z - z^*$. 
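
The quadrant issue for $\tan^{-1}(y/x)$ and the product and quotient rules in polar form can be illustrated with the standard library. This is a minimal sketch: `to_polar` is a hypothetical helper, and `math.atan2` is used here as one common way of choosing the angle consistently with the signs of x and y (it returns values in $(-\pi, \pi]$).

```python
import cmath
import math


def to_polar(x: float, y: float):
    """Return (r, theta) with theta chosen from the individual signs of x and y."""
    return math.hypot(x, y), math.atan2(y, x)


# A point in the third quadrant: atan(y/x) alone cannot tell it from the first.
r, theta = to_polar(-1.0, -1.0)
naive = math.atan(-1.0 / -1.0)  # same tangent, but the wrong quadrant

# Product and quotient in polar form: moduli multiply/divide, arguments add/subtract.
z1 = cmath.rect(2.0, 0.5)  # r = 2,  theta  = 0.5
z2 = cmath.rect(3.0, 1.0)  # r' = 3, theta' = 1.0
prod_r, prod_theta = cmath.polar(z1 * z2)
quot_r, quot_theta = cmath.polar(z1 / z2)
```

Note that `cmath.polar` reports the argument on its principal range, so sums of angles that leave $(-\pi, \pi]$ come back shifted by a multiple of $2\pi$.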

The plotting of complex numbers on an Argand diagram makes obvious some other 
properties. Since addition on an Argand diagram is analogous to 2-D vector addition, it 
can be seen that 

$$ \bigl|\, |z| - |z'| \,\bigr| \le |z \pm z'| \le |z| + |z'|. \tag{1.128} $$

Also, since $z^* = re^{-i\theta}$ has the same magnitude as z but an argument that differs only in 
sign, $z + z^*$ will be real and equal to $2 \operatorname{Re} z$, while $z - z^*$ will be pure imaginary and equal 
to $2i \operatorname{Im} z$. See Fig. 1.13 for an illustration of this discussion. 

We can use an Argand diagram to plot values of a function w(z) as well as just z itself, 
in which case we could label the axes u and v, referring to the real and imaginary parts of 
w. In that case, we can think of the function w(z) as providing a mapping from the xy 
plane to the uv plane, with the effect that any curve in the xy (sometimes called z) plane 
is mapped into a corresponding curve in the uv (= w) plane. In addition, the statements of 
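the preceding paragraph can be extended to functions: 

$$ \bigl|\, |w(z)| - |w'(z)| \,\bigr| \le |w(z) \pm w'(z)| \le |w(z)| + |w'(z)|, $$

$$ \operatorname{Re} w(z) = \frac{w(z) + [w(z)]^*}{2}, \quad \operatorname{Im} w(z) = \frac{w(z) - [w(z)]^*}{2i}. \tag{1.129} $$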


Complex Numbers of Unit Magnitude 


Complex numbers of the form 

$$ e^{i\theta} = \cos\theta + i\sin\theta, \tag{1.130} $$

where we have given the variable the name $\theta$ to emphasize the fact that we plan to restrict 
it to real values, correspond on an Argand diagram to points for which $x = \cos\theta$, $y = \sin\theta$, 
and whose magnitude is therefore $\cos^2\theta + \sin^2\theta = 1$. The points $\exp(i\theta)$ therefore lie on 
the unit circle, at polar angle $\theta$. This observation makes obvious a number of relations 
that could in principle also be deduced from Eq. (1.130). For example, if $\theta$ has the special 
values $\pi/2$, $\pi$, or $3\pi/2$, we have the interesting relationships 

$$ e^{i\pi/2} = i, \quad e^{i\pi} = -1, \quad e^{3i\pi/2} = -i. \tag{1.131} $$

We also see that $\exp(i\theta)$ is periodic, with period $2\pi$, so 

$$ e^{2i\pi} = e^{4i\pi} = \cdots = 1, \quad e^{3i\pi/2} = e^{-i\pi/2} = -i, \quad \text{etc.} \tag{1.132} $$

A few relevant values of z on the unit circle are illustrated in Fig. 1.14. These 
relationships cause the real part of $\exp(i\omega t)$ to describe oscillation at angular frequency $\omega$, with 
$\exp(i[\omega t + \delta])$ describing an oscillation displaced from that first mentioned by a phase 
difference $\delta$. 

FIGURE 1.14 Some values of z on the unit circle. 


Circular and Hyperbolic Functions 


The relationship encapsulated in Eq. (1.130) enables us to obtain convenient formulas for 
the sine and cosine. Taking the sum and difference of $\exp(+i\theta)$ and $\exp(-i\theta)$, we have 

$$ \cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \quad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}. \tag{1.133} $$

These formulas place the definitions of the hyperbolic functions in perspective: 

$$ \cosh\theta = \frac{e^{\theta} + e^{-\theta}}{2}, \quad \sinh\theta = \frac{e^{\theta} - e^{-\theta}}{2}. \tag{1.134} $$

Comparing these two sets of equations, it is possible to establish the formulas 

$$ \cosh iz = \cos z, \quad \sinh iz = i\sin z. \tag{1.135} $$

Proof is left to Exercise 1.8.5. 
The fact that $\exp(in\theta)$ can be written in the two equivalent forms 

$$ \cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n \tag{1.136} $$

establishes a relationship known as de Moivre's Theorem. By expanding the right 
member of Eq. (1.136), we easily obtain trigonometric multiple-angle formulas, of which the 
simplest examples are the well-known results 

$$ \sin(2\theta) = 2\sin\theta\cos\theta, \quad \cos(2\theta) = \cos^2\theta - \sin^2\theta. $$


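If we solve the $\sin\theta$ formula of Eq. (1.133) for $\exp(i\theta)$, we get (choosing the plus sign 
for the radical) 

$$ e^{i\theta} = i\sin\theta + \sqrt{1 - \sin^2\theta}. $$

Setting $\sin\theta = z$ and $\theta = \sin^{-1}(z)$, and taking the logarithm of both sides of the above 
equation, we express the inverse trigonometric function in terms of logarithms: 

$$ \sin^{-1}(z) = -i\ln\left[ iz + \sqrt{1 - z^2} \right]. $$

The set of formulas that can be generated in this way includes: 

$$ \sin^{-1}(z) = -i\ln\left[ iz + \sqrt{1 - z^2} \right], \quad \tan^{-1}(z) = \frac{i}{2}\left[ \ln(1 - iz) - \ln(1 + iz) \right], $$

$$ \sinh^{-1}(z) = \ln\left[ z + \sqrt{1 + z^2} \right], \quad \tanh^{-1}(z) = \frac{1}{2}\left[ \ln(1 + z) - \ln(1 - z) \right]. \tag{1.137} $$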
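
The formulas of Eq. (1.137) can be compared with library inverse functions. The sketch below uses hypothetical function names and the principal branches supplied by Python's `cmath.log` and `cmath.sqrt`; agreement is only checked here for a real argument in $(-1, 1)$, where no branch ambiguity arises.

```python
import cmath
import math


def asin_log(z):
    """sin^-1(z) = -i ln[iz + sqrt(1 - z^2)]"""
    return -1j * cmath.log(1j * z + cmath.sqrt(1 - z * z))


def atan_log(z):
    """tan^-1(z) = (i/2)[ln(1 - iz) - ln(1 + iz)]"""
    return 0.5j * (cmath.log(1 - 1j * z) - cmath.log(1 + 1j * z))


def asinh_log(z):
    """sinh^-1(z) = ln[z + sqrt(1 + z^2)]"""
    return cmath.log(z + cmath.sqrt(1 + z * z))


def atanh_log(z):
    """tanh^-1(z) = (1/2)[ln(1 + z) - ln(1 - z)]"""
    return 0.5 * (cmath.log(1 + z) - cmath.log(1 - z))


x = 0.3
errors = [
    abs(asin_log(x) - math.asin(x)),
    abs(atan_log(x) - math.atan(x)),
    abs(asinh_log(x) - math.asinh(x)),
    abs(atanh_log(x) - math.atanh(x)),
]
```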


Powers and Roots 


The polar form is very convenient for expressing powers and roots of complex numbers. 
For integer powers, the result is obvious and unique: 


$$ z = re^{i\varphi}, \quad z^n = r^n e^{in\varphi}. $$

For roots (fractional powers), we also have 

$$ z = re^{i\varphi}, \quad z^{1/n} = r^{1/n} e^{i\varphi/n}, $$

but the result is not unique. If we write z in the alternate but equivalent form 

$$ z = re^{i(\varphi + 2m\pi)}, $$

where m is an integer, we now get additional values for the root: 

$$ z^{1/n} = r^{1/n} e^{i(\varphi + 2m\pi)/n} \quad (\text{any integer } m). $$

If n = 2 (corresponding to the square root), different choices of m will lead to two distinct 
values of $z^{1/2}$, both of the same modulus but differing in argument by $\pi$. This corresponds 
to the well-known result that the square root is double-valued and can be written with 
either sign. 

In general, $z^{1/n}$ is n-valued, with successive values having arguments that differ by 
$2\pi/n$. Figure 1.15 illustrates the multiple values of $1^{1/3}$, $i^{1/3}$, and $(-1)^{1/3}$. 

FIGURE 1.15 Cube roots: (a) $1^{1/3}$; (b) $i^{1/3}$; (c) $(-1)^{1/3}$. 
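
The n distinct roots described above can be generated by stepping the integer m through 0, 1, ..., n − 1. The following is a minimal sketch with a hypothetical function name `nth_roots`; it takes the starting angle from `cmath.polar`, which is one convenient choice among the equivalent ones.

```python
import cmath
import math


def nth_roots(z: complex, n: int):
    """Return the n values r**(1/n) * exp(i(phi + 2 m pi)/n), m = 0 .. n-1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    r, phi = cmath.polar(z)
    return [
        cmath.rect(r ** (1.0 / n), (phi + 2.0 * math.pi * m) / n)
        for m in range(n)
    ]


cube_roots_of_minus_eight = nth_roots(-8, 3)
square_roots_of_i = nth_roots(1j, 2)
```

Raising each returned value to the power n recovers z to rounding accuracy, and successive values differ in argument by $2\pi/n$.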


Logarithm 


Another multivalued complex function is the logarithm, which in the polar representation 
takes the form 

$$ \ln z = \ln(re^{i\theta}) = \ln r + i\theta. $$

However, it is also true that 

$$ \ln z = \ln\left( re^{i(\theta + 2n\pi)} \right) = \ln r + i(\theta + 2n\pi), \tag{1.138} $$

for any positive or negative integer n. Thus, $\ln z$ has, for a given z, the infinite number of 
values corresponding to all possible choices of n in Eq. (1.138). 
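
A few of the values of $\ln z$ given by Eq. (1.138) can be listed explicitly. In this sketch the name `log_branches` is hypothetical; the n = 0 entry coincides with the principal value returned by `cmath.log`, and exponentiating any entry returns the original z.

```python
import cmath
import math


def log_branches(z: complex, ns):
    """Values ln r + i(theta + 2 n pi) of ln z for the integers n in ns."""
    r, theta = cmath.polar(z)
    if r == 0.0:
        raise ValueError("ln z is undefined at z = 0")
    return [complex(math.log(r), theta + 2.0 * math.pi * n) for n in ns]


z = complex(-1.0, 1.0)
branches = log_branches(z, range(-1, 2))  # n = -1, 0, 1
recovered = [cmath.exp(w) for w in branches]
```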


Exercises 


1.8.1 Find the reciprocal of $x + iy$, working in polar form but expressing the final result in 
Cartesian form. 

1.8.2 Show that complex numbers have square roots and that the square roots are contained 
in the complex plane. What are the square roots of i? 

1.8.3 Show that 
(a) $\cos n\theta = \cos^n\theta - \binom{n}{2}\cos^{n-2}\theta\,\sin^2\theta + \binom{n}{4}\cos^{n-4}\theta\,\sin^4\theta - \cdots$, 
(b) $\sin n\theta = \binom{n}{1}\cos^{n-1}\theta\,\sin\theta - \binom{n}{3}\cos^{n-3}\theta\,\sin^3\theta + \cdots$. 

1.8.4 Prove that 
(a) $\sum_{n=0}^{N-1} \cos nx = \dfrac{\sin(Nx/2)}{\sin(x/2)} \cos\left[ (N-1)\dfrac{x}{2} \right]$, 
(b) $\sum_{n=0}^{N-1} \sin nx = \dfrac{\sin(Nx/2)}{\sin(x/2)} \sin\left[ (N-1)\dfrac{x}{2} \right]$. 
These series occur in the analysis of the multiple-slit diffraction pattern. 

1.8.5 Assume that the trigonometric functions and the hyperbolic functions are defined for 
complex argument by the appropriate power series. Show that 
$i\sin z = \sinh iz$, $\sin iz = i\sinh z$, 
$\cos z = \cosh iz$, $\cos iz = \cosh z$. 

1.8.6 Using the identities 
$$ \cos z = \frac{e^{iz} + e^{-iz}}{2}, \quad \sin z = \frac{e^{iz} - e^{-iz}}{2i}, $$
established from comparison of power series, show that 
(a) $\sin(x + iy) = \sin x \cosh y + i\cos x \sinh y$, 
$\cos(x + iy) = \cos x \cosh y - i\sin x \sinh y$, 
(b) $|\sin z|^2 = \sin^2 x + \sinh^2 y$, $|\cos z|^2 = \cos^2 x + \sinh^2 y$. 
This demonstrates that we may have $|\sin z|, |\cos z| > 1$ in the complex plane. 

1.8.7 From the identities in Exercises 1.8.5 and 1.8.6 show that 
(a) $\sinh(x + iy) = \sinh x \cos y + i\cosh x \sin y$, 
$\cosh(x + iy) = \cosh x \cos y + i\sinh x \sin y$, 
(b) $|\sinh z|^2 = \sinh^2 x + \sin^2 y$, $|\cosh z|^2 = \sinh^2 x + \cos^2 y$. 

1.8.8 Show that 
(a) $\tanh\dfrac{z}{2} = \dfrac{\sinh x + i\sin y}{\cosh x + \cos y}$, (b) $\coth\dfrac{z}{2} = \dfrac{\sinh x - i\sin y}{\cosh x - \cos y}$. 

1.8.9 By comparing series expansions, show that $\tan^{-1} x = \dfrac{i}{2} \ln\left( \dfrac{1 - ix}{1 + ix} \right)$. 

1.8.10 Find the Cartesian form for all values of 
(a) $(-8)^{1/3}$, 
(b) $i^{1/4}$, 
(c) $e^{i\pi/4}$. 

1.8.11 Find the polar form for all values of 
(a) $(1 + i)^3$, 
(b) $(-1)^{1/5}$. 


1.9 DERIVATIVES AND EXTREMA 


We recall the familiar limit identified as the derivative, $df(x)/dx$, of a function f(x) at a 
point x: 

$$ \frac{df(x)}{dx} = \lim_{\varepsilon \to 0} \frac{f(x + \varepsilon) - f(x)}{\varepsilon}; \tag{1.139} $$

the derivative is only defined if the limit exists and is independent of the direction from 
which $\varepsilon$ approaches zero. The variation or differential of f(x) associated with a change 
dx in its independent variable from the reference value x assumes the form 

$$ df = f(x + dx) - f(x) = \frac{df}{dx}\,dx, \tag{1.140} $$

in the limit that dx is small enough that terms dependent on $dx^2$ and higher powers of dx 
become negligible. The mean value theorem (based on the continuity of f) tells us that 
here, $df/dx$ is evaluated at some point $\xi$ between x and x + dx, but as $dx \to 0$, $\xi \to x$. 

When a quantity of interest is a function of two or more independent variables, the 
generalization of Eq. (1.140) is (illustrating for the physically important three-variable 
case): 

$$ df = \left[ f(x + dx, y + dy, z + dz) - f(x, y + dy, z + dz) \right] 
+ \left[ f(x, y + dy, z + dz) - f(x, y, z + dz) \right] 
+ \left[ f(x, y, z + dz) - f(x, y, z) \right] 
= \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz, \tag{1.141} $$

where the partial derivatives indicate differentiation in which the independent variables 
not being differentiated are kept fixed. The fact that $\partial f/\partial x$ is evaluated at y + dy and 
z + dz instead of at y and z alters the derivative by amounts that are of order dy and 
dz, and therefore the change becomes negligible in the limit of small variations. It is thus 
consistent to interpret Eq. (1.141) as involving partial derivatives that are all evaluated at 
the reference point x, y, z. 
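
The claim that the error in Eq. (1.141) is of second order in the displacements can be observed numerically. The sketch below uses an arbitrary test function chosen for illustration (it does not come from the text), with its partial derivatives written out by hand; the names `f`, `grad_f`, and `compare` are hypothetical.

```python
import math


def f(x, y, z):
    """An arbitrary smooth test function of three variables."""
    return x * x * y + y * math.sin(z)


def grad_f(x, y, z):
    """Its partial derivatives (df/dx, df/dy, df/dz), evaluated at (x, y, z)."""
    return (2 * x * y, x * x + math.sin(z), y * math.cos(z))


def compare(x, y, z, dx, dy, dz):
    """Return (actual change in f, first-order estimate from Eq. (1.141))."""
    actual = f(x + dx, y + dy, z + dz) - f(x, y, z)
    fx, fy, fz = grad_f(x, y, z)
    linear = fx * dx + fy * dy + fz * dz
    return actual, linear


def linearization_error(h):
    actual, linear = compare(1.0, 2.0, 0.5, h, h, h)
    return abs(actual - linear)


# Shrinking the step by a factor of 10 should shrink the error by about 100.
ratio = linearization_error(1e-2) / linearization_error(1e-3)
```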

Further analysis of the same sort as led to Eq. (1.141) can be used to define higher 
derivatives and to establish the useful result that cross derivatives (e.g., $\partial^2/\partial x\,\partial y$) are 
independent of the order in which the differentiations are performed: 

$$ \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) \equiv \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2 f}{\partial x\,\partial y}. $$

Sometimes it is not clear from the context which variables other than that being 
differentiated are independent, and it is then advisable to attach subscripts to the derivative 
notation to avoid ambiguity. For example, if x, y, and z have been defined in a problem, 
but only two of them are independent, one might write 

$$ \left( \frac{\partial f}{\partial x} \right)_y \quad \text{or} \quad \left( \frac{\partial f}{\partial x} \right)_z, \tag{1.142} $$

whichever is actually meant. 

For working with functions of several variables, we note two useful formulas that follow 
from Eq. (1.141): 

1. The chain rule, 

$$ \frac{df}{ds} = \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds} + \frac{\partial f}{\partial z}\frac{dz}{ds}, \tag{1.143} $$

which applies when x, y, and z are functions of another variable, s, 

2. A formula obtained by setting df = 0 (here shown for the case where there are only 
two independent variables and the dz term of Eq. (1.141) is absent): 

$$ \left( \frac{\partial y}{\partial x} \right)_f = -\frac{\left( \dfrac{\partial f}{\partial x} \right)_y}{\left( \dfrac{\partial f}{\partial y} \right)_x}. \tag{1.144} $$


In Lagrangian mechanics, one occasionally encounters expressions such as⁷ 

$$ \frac{d}{dt} L(x, \dot{x}, t) = \frac{\partial L}{\partial x}\,\dot{x} + \frac{\partial L}{\partial \dot{x}}\,\ddot{x} + \frac{\partial L}{\partial t}, $$

an example of use of the chain rule. Here it is necessary to distinguish between the formal 
dependence of L on its three arguments and the overall dependence of L on time. Note the 
use of the ordinary ($d/dt$) and partial ($\partial/\partial t$) derivative notation. 


Stationary Points 


Whether or not a set of independent variables (e.g., x, y, z of our previous discussion) 
represents directions in space, one can ask how a function f changes if we move in various 
directions in the space of the independent variables; the answer is provided by Eq. (1.143), 
where the “direction” is defined by the values of dx/ds, dy/ds, etc. 

It is often desired to find the minimum of a function f of n variables $x_i$, $i = 1, \ldots, n$, 
and a necessary but not sufficient condition on its position is that 

$$ \frac{df}{ds} = 0 \quad \text{for all directions of } ds. $$

This is equivalent to requiring 

$$ \frac{\partial f}{\partial x_i} = 0, \quad i = 1, \ldots, n. \tag{1.145} $$

All points in the $\{x_i\}$ space that satisfy Eq. (1.145) are termed stationary; for a stationary 
point of f to be a minimum, it is also necessary that the second derivatives $d^2 f/ds^2$ be 
positive for all directions of s. Conversely, if the second derivatives in all directions are 
negative, the stationary point is a maximum. If neither of these conditions is satisfied, the 
stationary point is neither a maximum nor a minimum, and is often called a saddle point 
because of the appearance of the surface of f when there are two independent variables 
(see Fig. 1.16). It is often obvious whether a stationary point is a minimum or maximum, 
but a complete discussion of the issue is nontrivial. 

⁷Here dots indicate time derivatives. 

FIGURE 1.16 A stationary point that is neither a maximum nor minimum 
(a saddle point). 
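
For two independent variables the sign test on $d^2 f/ds^2$ can be carried out in closed form. The sketch below is an illustration under a stated assumption that goes beyond the text: along a unit direction $(\cos t, \sin t)$ the second derivative is $f_{xx}\cos^2 t + 2 f_{xy}\cos t \sin t + f_{yy}\sin^2 t$, whose extreme values over all directions are the two eigenvalues of the symmetric matrix of second derivatives. The function name and the tolerance are hypothetical.

```python
import math


def classify_stationary_point(fxx, fxy, fyy, tol=1e-12):
    """Classify a stationary point of f(x, y) from its second derivatives there."""
    mean = 0.5 * (fxx + fyy)
    spread = math.hypot(0.5 * (fxx - fyy), fxy)
    lo, hi = mean - spread, mean + spread  # extreme values of d^2f/ds^2
    if lo > tol:
        return "minimum"       # positive in every direction
    if hi < -tol:
        return "maximum"       # negative in every direction
    if lo < -tol and hi > tol:
        return "saddle"        # positive in some directions, negative in others
    return "inconclusive"      # a vanishing second derivative; higher orders needed


kinds = [
    classify_stationary_point(2.0, 0.0, 2.0),    # f = x^2 + y^2 at the origin
    classify_stationary_point(2.0, 0.0, -2.0),   # f = x^2 - y^2 at the origin
    classify_stationary_point(-1.0, 0.5, -3.0),
    classify_stationary_point(0.0, 0.0, 1.0),
]
```

The "inconclusive" branch reflects the remark above that a complete discussion is nontrivial: when a second derivative vanishes in some direction, this test alone cannot decide.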
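Exercises 


1.9.1 Derive the following formula for the Maclaurin expansion of a function of two 
variables: 

$$ f(x, y) = f(0, 0) + x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} 
+ \frac{1}{2!}\left[ \binom{2}{0} x^2 \frac{\partial^2 f}{\partial x^2} + \binom{2}{1} xy \frac{\partial^2 f}{\partial x\,\partial y} + \binom{2}{2} y^2 \frac{\partial^2 f}{\partial y^2} \right] 
+ \frac{1}{3!}\left[ \binom{3}{0} x^3 \frac{\partial^3 f}{\partial x^3} + \binom{3}{1} x^2 y \frac{\partial^3 f}{\partial x^2\,\partial y} + \binom{3}{2} x y^2 \frac{\partial^3 f}{\partial x\,\partial y^2} + \binom{3}{3} y^3 \frac{\partial^3 f}{\partial y^3} \right] + \cdots, $$

where all the partial derivatives are to be evaluated at the point (0, 0). 

1.9.2 The result in Exercise 1.9.1 can be generalized to larger numbers of independent 
variables. Prove that for an m-variable system, the Maclaurin expansion can be written in 
the symbolic form 

$$ f(x_1, \ldots, x_m) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \left( \sum_{i=1}^{m} \alpha_i \frac{\partial}{\partial x_i} \right)^n f(0, \ldots, 0), $$

where in the right-hand side we have made the substitutions $x_j = \alpha_j t$. 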


1.10 EVALUATION OF INTEGRALS 


Proficiency in the evaluation of integrals involves a mixture of experience, skill in pat- 
tern recognition, and a few tricks. The most familiar include the technique of integration 
by parts, and the strategy of changing the variable of integration. We review here some 
methods for integrals in one and multiple dimensions. 


Integration by Parts 


The technique of integration by parts is part of every elementary calculus course, but its 
use is so frequent and ubiquitous that it bears inclusion here. It is based on the obvious 
relation, for u and v arbitrary functions of x, 


d(uv) =udv+vdu. 


Integrating both sides of this equation over an interval (a, b), we reach 

$$ uv \Big|_a^b = \int_a^b u\,dv + \int_a^b v\,du, $$

which is usually rearranged to the well-known form 

$$ \int_a^b u\,dv = uv \Big|_a^b - \int_a^b v\,du. \tag{1.146} $$


Example 1.10.1 INTEGRATION BY PARTS 


Consider the integral $\int_a^b x \sin x\,dx$. We identify u = x and dv = sin x dx. Differentiating 
and integrating, we find du = dx and v = −cos x, so Eq. (1.146) becomes 

$$ \int_a^b x \sin x\,dx = (x)(-\cos x) \Big|_a^b - \int_a^b (-\cos x)\,dx = a\cos a - b\cos b + \sin b - \sin a. $$

The key to the effective use of this technique is to see how to partition an integrand into 
u and dv in a way that makes it easy to form du and v and also to integrate $\int v\,du$. 
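
The result of Example 1.10.1 can be checked against a direct numerical quadrature. This is a minimal sketch, assuming a composite Simpson rule as the numerical method (the text does not prescribe one); the helper names and the sample limits a = 0.5, b = 2 are arbitrary.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def by_parts_result(a, b):
    """Closed form from Example 1.10.1."""
    return a * math.cos(a) - b * math.cos(b) + math.sin(b) - math.sin(a)


a, b = 0.5, 2.0
numeric = simpson(lambda x: x * math.sin(x), a, b)
closed = by_parts_result(a, b)
```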
Special Functions 


A number of special functions have become important in physics because they arise in 
frequently encountered situations. Identifying a one-dimensional (1-D) integral as one 
yielding a special function is almost as good as a straight-out evaluation, in part because it 
prevents the waste of time that otherwise might be spent trying to carry out the integration. 
But of perhaps more importance, it connects the integral to the full body of knowledge 
regarding its properties and evaluation. It is not necessary for every physicist to know 
everything about all known special functions, but it is desirable to have an overview 
permitting the recognition of special functions which can then be studied in more detail if 
necessary. 

It is common for a special function to be defined in terms of an integral over the range 
for which that integral converges, but to have its definition extended to a larger domain 
by analytic continuation in the complex plane (cf. Chapter 11) or by the establishment of 
suitable functional relations. We present in Table 1.2 only the most useful integral 
representations of a few functions of frequent occurrence. More detail is provided by a variety of 
on-line sources and in material listed under Additional Readings at the end of this chapter, 
particularly the compilations by Abramowitz and Stegun and by Gradshteyn and Ryzhik. 


Table 1.2 Special Functions of Importance in Physics 

Function | Integral representation | Remarks 
Gamma function | $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$ | See Chap. 13. 
Factorial (n integral) | $n! = \int_0^\infty t^n e^{-t}\,dt$ | $n! = \Gamma(n + 1)$ 
Riemann zeta function | $\zeta(x) = \dfrac{1}{\Gamma(x)} \int_0^\infty \dfrac{t^{x-1}}{e^t - 1}\,dt$ | See Chaps. 1 and 12. 
Exponential integrals | $E_n(x) = \int_1^\infty t^{-n} e^{-xt}\,dt$ | $E_1(x) \equiv -\mathrm{Ei}(-x)$ 
Sine integral | $\mathrm{si}(x) = -\int_x^\infty \dfrac{\sin t}{t}\,dt$ | 
Cosine integral | $\mathrm{Ci}(x) = -\int_x^\infty \dfrac{\cos t}{t}\,dt$ | 
Error functions | $\mathrm{erf}(x) = \dfrac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$ | $\mathrm{erf}(\infty) = 1$ 
 | $\mathrm{erfc}(x) = \dfrac{2}{\sqrt{\pi}} \int_x^\infty e^{-t^2}\,dt$ | $\mathrm{erfc}(x) = 1 - \mathrm{erf}(x)$ 
Dilogarithm | $\mathrm{Li}_2(x) = -\int_0^x \dfrac{\ln(1 - t)}{t}\,dt$ | 

A conspicuous omission from the list in Table 1.2 is the extensive family of Bessel func- 
tions. A short table cannot suffice to summarize their numerous integral representations; a 
survey of this topic is in Chapter 14. Other important functions in more than one variable 
or with indices in addition to arguments have also been omitted from the table. 


Other Methods 


An extremely powerful method for the evaluation of definite integrals is that of contour 
integration in the complex plane. This method is presented in Chapter 11 and will not be 
discussed here. 

Integrals can often be evaluated by methods that involve integration or differentiation 
with respect to parameters, thereby obtaining relations between known integrals and those 
whose values are being sought. 


Example 1.10.2 DIFFERENTIATE PARAMETER 


We wish to evaluate the integral 

$$ I = \int_0^\infty \frac{e^{-x^2}}{x^2 + a^2}\,dx. $$

We introduce a parameter, t, to facilitate further manipulations, and consider the related 
integral 

$$ J(t) = \int_0^\infty \frac{e^{-t(x^2 + a^2)}}{x^2 + a^2}\,dx; $$

we note that $I = e^{a^2} J(1)$. 
We now differentiate J(t) with respect to t and evaluate the resulting integral, which is 
a scaled version of Eq. (1.148): 

$$ \frac{dJ(t)}{dt} = -\int_0^\infty e^{-t(x^2 + a^2)}\,dx = -e^{-ta^2} \int_0^\infty e^{-tx^2}\,dx = -\frac{1}{2}\sqrt{\frac{\pi}{t}}\, e^{-ta^2}. \tag{1.147} $$

To recover J(t) we integrate Eq. (1.147) between t and $\infty$, making use of the fact that 
$J(\infty) = 0$. To carry out the integration it is convenient to make the substitution $u^2 = a^2 t$, 
so we get 

$$ J(t) = \frac{\sqrt{\pi}}{2} \int_t^\infty \frac{e^{-ta^2}}{t^{1/2}}\,dt = \frac{\sqrt{\pi}}{a} \int_{at^{1/2}}^\infty e^{-u^2}\,du, $$

which we now recognize as $J(t) = (\pi/2a)\,\mathrm{erfc}(at^{1/2})$. Thus, our final result is 

$$ I = \frac{\pi}{2a}\, e^{a^2}\, \mathrm{erfc}(a). $$
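
The closed form obtained in Example 1.10.2 can be compared with a brute-force quadrature. The sketch below makes assumptions not found in the text: it truncates the infinite range at x = 10 (where the Gaussian factor is far below rounding level) and uses a composite Simpson rule; the helper names are hypothetical, while `math.erfc` is the standard-library complementary error function.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def closed_form(a):
    """I = (pi / 2a) exp(a^2) erfc(a), the result of Example 1.10.2."""
    return math.pi / (2.0 * a) * math.exp(a * a) * math.erfc(a)


def numeric(a, upper=10.0, n=20000):
    """Direct quadrature of exp(-x^2) / (x^2 + a^2) on [0, upper]."""
    return simpson(lambda x: math.exp(-x * x) / (x * x + a * a), 0.0, upper, n)


differences = [abs(numeric(a) - closed_form(a)) for a in (0.5, 1.0, 2.0)]
```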
2a 


Many integrals can be evaluated by first converting them into infinite series, then 
manipulating the resulting series, and finally either evaluating the series or recognizing 
it as a special function. 


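Example 1.10.3 EXPAND, THEN INTEGRATE 


Consider $I = \int_0^1 \dfrac{dx}{x} \ln\left( \dfrac{1 + x}{1 - x} \right)$. Using Eq. (1.120) for the logarithm, 

$$ I = \int_0^1 dx\; 2\left[ 1 + \frac{x^2}{3} + \frac{x^4}{5} + \cdots \right] = 2\left[ 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots \right]. $$

Noting that 

$$ \frac{1}{2^2}\,\zeta(2) = \frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots, $$

we see that 

$$ \zeta(2) - \frac{1}{4}\,\zeta(2) = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots, $$

so $I = \frac{3}{2}\,\zeta(2)$. 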


Simply using complex numbers aids in the evaluation of some integrals. Take, for 
example, the elementary integral 

$$ I = \int \frac{dx}{1 + x^2}. $$

Making a partial fraction decomposition of $(1 + x^2)^{-1}$ and integrating, we easily get 

$$ I = \int \frac{1}{2}\left[ \frac{1}{1 + ix} + \frac{1}{1 - ix} \right] dx = \frac{i}{2}\left[ \ln(1 - ix) - \ln(1 + ix) \right]. $$

From Eq. (1.137), we recognize this as $\tan^{-1}(x)$. 
The complex exponential forms of the trigonometric functions provide interesting 
approaches to the evaluation of certain integrals. Here is an example. 


Example 1.10.4 A TRIGONOMETRIC INTEGRAL 


Consider 

$$ I = \int_0^\infty e^{-at} \cos bt\,dt, $$

where a and b are real and positive. Because $\cos bt = \operatorname{Re} e^{ibt}$, we note that 

$$ I = \operatorname{Re} \int_0^\infty e^{(-a + ib)t}\,dt. $$

The integral is now just that of an exponential, and is easily evaluated, leading to 

$$ I = \operatorname{Re} \frac{1}{a - ib} = \operatorname{Re} \frac{a + ib}{a^2 + b^2}, $$

which yields $I = a/(a^2 + b^2)$. As a bonus, the imaginary part of the same integral gives us 

$$ \int_0^\infty e^{-at} \sin bt\,dt = \frac{b}{a^2 + b^2}. $$

Recursive methods are often useful in obtaining formulas for a set of related integrals. 


Example 1.10.5 RECURSION 


Consider 

$$ I_n = \int_0^1 t^n \sin \pi t\,dt $$

for positive integer n. 
Integrating $I_n$ by parts twice, taking $u = t^n$ and $dv = \sin \pi t\,dt$, we have 

$$ I_n = \frac{1}{\pi} - \frac{n(n - 1)}{\pi^2}\, I_{n-2}, $$

with starting values $I_0 = 2/\pi$ and $I_1 = 1/\pi$. 
There is often no practical need to obtain a general, nonrecursive formula, as repeated 
application of the recursion is frequently more efficient than a closed formula, even when 
one can be found. 
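
The recursion of Example 1.10.5 translates directly into a short loop. The sketch below uses hypothetical names and checks the recursion against a Simpson-rule quadrature for small n only; for large n the factor $n(n-1)/\pi^2$ amplifies rounding errors at each step, so the forward recursion should not be assumed accurate there without further analysis.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def I_recursive(n: int) -> float:
    """I_n from I_n = 1/pi - n(n-1)/pi^2 * I_{n-2}, with I_0 = 2/pi, I_1 = 1/pi."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    value = 2.0 / math.pi if n % 2 == 0 else 1.0 / math.pi
    start = 2 if n % 2 == 0 else 3
    for k in range(start, n + 1, 2):
        value = 1.0 / math.pi - k * (k - 1) / math.pi ** 2 * value
    return value


def I_numeric(n: int) -> float:
    return simpson(lambda t: t ** n * math.sin(math.pi * t), 0.0, 1.0)


differences = [abs(I_recursive(n) - I_numeric(n)) for n in range(7)]
```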
Multiple Integrals 


An expression that corresponds to integration in two variables, say x and y, may be written 
with two integral signs, as in 

$$ \iint f(x, y)\,dx\,dy \quad \text{or} \quad \int_{x_1}^{x_2} dx \int_{y_1(x)}^{y_2(x)} dy\, f(x, y), $$

where the right-hand form can be more specific as to the integration limits, and also gives 
an explicit indication that the y integration is to be performed first, or with a single integral 
sign, as in 

$$ \int_S f(x, y)\,dA, $$

where S (if explicitly shown) is a 2-D integration region and dA is an element of "area" 
(in Cartesian coordinates, equal to dx dy). In this form we are leaving open both the choice 
of coordinate system to be used for evaluating the integral, and the order in which the 
variables are to be integrated. In three dimensions, we may either use three integral signs 
or a single integral with a symbol $d\tau$ indicating a 3-D "volume" element in an unspecified 
coordinate system. 

In addition to the techniques available for integration in a single variable, multiple in- 
tegrals provide further opportunities for evaluation based on changes in the order of inte- 
gration and in the coordinate system used in the integral. Sometimes simply reversing the 
order of integration may be helpful. If, before the reversal, the range of the inner integral 
depends on the outer integration variable, care must be exercised in determining the inte- 
gration ranges after reversal. It may be helpful to draw a diagram identifying the range of 
integration. 


Example 1.10.6 REVERSING INTEGRATION ORDER 


Consider 

$$ \int_0^\infty e^{-r}\,dr \int_r^\infty \frac{e^{-s}}{s}\,ds, $$

in which the inner integral can be identified as an exponential integral, suggesting 
difficulty if the integration is approached in a straightforward manner. Suppose we proceed by 
reversing the order of integration. To identify the proper coordinate ranges, we draw on 
a (r, s) plane, as in Fig. 1.17, the region $s \ge r \ge 0$, which is covered in the original 
integration order as a succession of vertical strips, for each r extending from s = r to $s = \infty$. 
See the left-hand panel of the figure. If the outer integration is changed from r to s, this 
same region is covered by taking, for each s, a horizontal range of r that runs from r = 0 to 
r = s. See the right-hand panel of the figure. The transformed double integral then assumes 
the form 

$$ \int_0^\infty \frac{e^{-s}}{s}\,ds \int_0^s e^{-r}\,dr, $$

where the inner integral over r is now elementary, evaluating to $1 - e^{-s}$. This leaves us 
with a 1-D integral, 

$$ \int_0^\infty \frac{e^{-s}}{s}\,(1 - e^{-s})\,ds. $$

Introducing a power series expansion for $1 - e^{-s}$, this integral becomes 

$$ \int_0^\infty \frac{e^{-s}}{s} \sum_{n=1}^\infty \frac{(-1)^{n-1} s^n}{n!}\,ds = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n!} \int_0^\infty s^{n-1} e^{-s}\,ds = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n!}\,(n - 1)!, $$

where in the last step we have identified the s integral (cf. Table 1.2) as (n − 1)!. We 
complete the evaluation by noting that (n − 1)!/n! = 1/n, so that the summation can be 
recognized as ln 2, thereby giving the final result 

$$ \int_0^\infty e^{-r}\,dr \int_r^\infty \frac{e^{-s}}{s}\,ds = \ln 2. $$

FIGURE 1.17 2-D integration region for Example 1.10.6. Left panel: inner integration 
over s; right panel: inner integration over r. 
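
The 1-D integral reached after reversing the order of integration in Example 1.10.6, and the alternating series it leads to, can both be evaluated numerically and compared with ln 2. This is a sketch under assumptions made here: the infinite range is cut at s = 50, the integrand is assigned its limiting value 1 at s = 0, and `math.expm1` is used to avoid cancellation in $1 - e^{-s}$ for small s.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def integrand(s: float) -> float:
    """exp(-s) (1 - exp(-s)) / s, with its limit 1 at s = 0."""
    if s == 0.0:
        return 1.0
    return math.exp(-s) * (-math.expm1(-s)) / s


numeric = simpson(integrand, 0.0, 50.0, n=100000)

# Partial sum of the alternating series sum_{n>=1} (-1)^(n-1) / n.
series = sum((-1) ** (n - 1) / n for n in range(1, 200001))
```

The quadrature converges far faster than the partial sums, which approach ln 2 only as 1/N; this is one reason recognizing the series in closed form is useful.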





A significant change in the form of 2-D or 3-D integrals can sometimes be accomplished 
by changing between Cartesian and polar coordinate systems. 


Example 1.10.7 EVALUATION IN POLAR COORDINATES 


In many calculus texts, the evaluation of $\int_0^\infty \exp(-x^2)\,dx$ is carried out by first converting 
it into a 2-D integral by taking its square, which is then written and evaluated in polar 
coordinates. Using the fact that $dx\,dy = r\,dr\,d\varphi$, we have 

$$ \int_0^\infty dx\,e^{-x^2} \int_0^\infty dy\,e^{-y^2} = \int_0^{\pi/2} d\varphi \int_0^\infty r\,dr\,e^{-r^2} = \frac{\pi}{2} \int_0^\infty \frac{1}{2}\,du\,e^{-u} = \frac{\pi}{4}. $$

This yields the well-known result 

$$ \int_0^\infty e^{-x^2}\,dx = \frac{1}{2}\sqrt{\pi}. \tag{1.148} $$


Example 1.10.8 ATOMIC INTERACTION INTEGRAL 


For study of the interaction of a small atom with an electromagnetic field, one of the 
integrals that arises in a simple approximate treatment using Gaussian-type orbitals is (in 
dimensionless Cartesian coordinates) 

$$ I = \int d\tau\, \frac{z^2}{(x^2 + y^2 + z^2)^{3/2}}\, e^{-(x^2 + y^2 + z^2)}, $$

where the range of the integration is the entire 3-D physical space ($\mathbb{R}^3$). Of course, this is 
a problem better addressed in spherical polar coordinates $(r, \theta, \varphi)$, where r is the distance 
from the origin of the coordinate system, $\theta$ is the polar angle (for the Earth, known as 
colatitude), and $\varphi$ is the azimuthal angle (longitude). The relevant conversion formulas 
are: $x^2 + y^2 + z^2 = r^2$ and $z/r = \cos\theta$. The volume element is $d\tau = r^2 \sin\theta\,dr\,d\theta\,d\varphi$, 
and the ranges of the new coordinates are $0 \le r < \infty$, $0 \le \theta \le \pi$, and $0 \le \varphi < 2\pi$. In the 
spherical coordinates, our integral becomes 

$$ I = \int d\tau\, \frac{\cos^2\theta}{r}\, e^{-r^2} = \int_0^\infty dr\, r e^{-r^2} \int_0^\pi d\theta\, \cos^2\theta \sin\theta \int_0^{2\pi} d\varphi 
= \left( \frac{1}{2} \right) \left( \frac{2}{3} \right) (2\pi) = \frac{2\pi}{3}. $$


Remarks: Changes of Integration Variables 


In a 1-D integration, a change in the integration variable from, say, $x$ to $y = y(x)$ involves
two adjustments: (1) the differential $dx$ must be replaced by $(dx/dy)\,dy$, and (2) the
integration limits must be changed from $x_1, x_2$ to $y(x_1), y(x_2)$. If $y(x)$ is not single-valued
over the entire range $(x_1, x_2)$, the process becomes more complicated and we do not
consider it further at this point.

For multiple integrals, the situation is considerably more complicated and demands
some discussion. Illustrating for a double integral, initially in variables $x, y$, but
transformed to an integration in variables $u, v$, the differential $dx\,dy$ must be transformed to
$J\,du\,dv$, where $J$, called the Jacobian of the transformation and sometimes symbolically
represented as

\[
J = \frac{\partial(x, y)}{\partial(u, v)},
\]

may depend on the variables. For example, the conversion from 2-D Cartesian coordinates
$x, y$ to plane polar coordinates $r, \theta$ involves the Jacobian

\[
J = \frac{\partial(x, y)}{\partial(r, \theta)} = r, \quad \text{so} \quad dx\,dy = r\,dr\,d\theta.
\]

For some coordinate transformations the Jacobian is simple and of a well-known form, as
in the foregoing example. We can confirm the value assigned to $J$ by noticing that the
area (in $xy$ space) enclosed by boundaries at $r$, $r + dr$, $\theta$, and $\theta + d\theta$ is an infinitesimally
distorted rectangle with two sides of length $dr$ and two of length $r\,d\theta$. See Fig. 1.18. For
other transformations we may need general methods for obtaining Jacobians. Computation
of Jacobians will be treated in detail in Section 4.4.
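Pending the general treatment, the value $J = r$ for plane polar coordinates can be checked with a finite-difference estimate of the four partial derivatives. This is a hedged sketch: the function names `polar_to_cartesian` and `jacobian_det` and the step size `h` are hypothetical choices made for the illustration.

```python
import math

def polar_to_cartesian(r, theta):
    """Map plane polar coordinates (r, theta) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(transform, u, v, h=1e-6):
    """Central-difference estimate of d(x, y)/d(u, v) for (x, y) = transform(u, v)."""
    x_up, y_up = transform(u + h, v)
    x_um, y_um = transform(u - h, v)
    x_vp, y_vp = transform(u, v + h)
    x_vm, y_vm = transform(u, v - h)
    dx_du, dy_du = (x_up - x_um) / (2 * h), (y_up - y_um) / (2 * h)
    dx_dv, dy_dv = (x_vp - x_vm) / (2 * h), (y_vp - y_vm) / (2 * h)
    # 2 x 2 determinant of the matrix of partial derivatives.
    return dx_du * dy_dv - dx_dv * dy_du

for r, theta in [(0.5, 0.3), (2.0, 1.2), (3.0, 4.0)]:
    print(r, jacobian_det(polar_to_cartesian, r, theta))   # second column should be close to r
```

The same `jacobian_det` helper accepts any smooth two-variable transformation, so it can serve as a quick sanity check before a Jacobian is derived analytically.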

Of interest here is the determination of the transformed region of integration. In principle
this issue is straightforward, but all too frequently one encounters situations (both
in other texts and in research articles) where misleading and potentially incorrect arguments
are presented. The confusion normally arises in cases for which at least a part of
the boundary is at infinity. We illustrate with the conversion from 2-D Cartesian to plane
polar coordinates. Figure 1.19 shows that if one integrates for $0 \le \theta < 2\pi$ and $0 \le r \le a$,
there are regions in the corners of a square (of side $2a$) that are not included. If the integral
is to be evaluated in the limit $a \to \infty$, it is both incorrect and meaningless to advance arguments
about the “neglect” of contributions from these corner regions, as every point in
these corners is ultimately included as $a$ is increased.

FIGURE 1.18 Element of area in plane polar coordinates.

FIGURE 1.19 2-D integration, Cartesian and plane polar coordinates.

FIGURE 1.20 Integral in transformed coordinates.

A similar, but slightly less obvious situation arises if we transform an integration over
Cartesian coordinates $0 \le x < \infty$, $0 \le y < \infty$, into one involving coordinates $u = x + y$,
$v = y$, with integration limits $0 \le u < \infty$, $0 \le v \le u$. See Fig. 1.20. Again it is incorrect
and meaningless to make arguments justifying the “neglect” of the outer triangle (labeled
B in the figure). The relevant observation here is that ultimately, as the value of $u$ is
increased, any arbitrary point in the quarter-plane becomes included in the region being
integrated.





Exercises

1.10.1 Use a recursive method to show that, for all positive integers $n$, $\Gamma(n) = (n-1)!$.

Evaluate the integrals in Exercises 1.10.2 through 1.10.9.

1.10.2 $\displaystyle \int_0^\infty \frac{\sin x}{x}\,dx$.

Hint. Multiply integrand by $e^{-ax}$ and take the limit $a \to 0$.

1.10.3 $\displaystyle \int_0^\infty \frac{dx}{\cosh x}$.

Hint. Expand the denominator in a way that converges for all relevant $x$.

1.10.4 $\displaystyle \int_0^\infty \frac{dx}{e^{ax} + 1}$, for $a > 0$.

1.10.5 $\displaystyle \int_\pi^\infty \frac{\sin x}{x^2}\,dx$.

1.10.6 $\displaystyle \int_0^\infty \frac{e^{-x} \sin x}{x}\,dx$.

1.10.7 $\displaystyle \int_0^x \operatorname{erf}(t)\,dt$.

The result can be expressed in terms of special functions in Table 1.2.

1.10.8 $\displaystyle \int_1^x E_1(t)\,dt$.

Obtain a result in which the only special function is $E_1$.

1.10.9 $\displaystyle \int_0^\infty \frac{e^{-x}}{x + 1}\,dx$.

1.10.10 Show that $\displaystyle \int_0^\infty \left(\frac{\tan^{-1} x}{x}\right)^2 dx = \pi \ln 2$.

Hint. Integrate by parts, to linearize in $\tan^{-1}$. Then replace $\tan^{-1} x$ by $\tan^{-1} ax$ and
evaluate for $a = 1$.

1.10.11 By direct integration in Cartesian coordinates, find the area of the ellipse defined by

\[
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.
\]

1.10.12 A unit circle is divided into two pieces by a straight line whose distance of closest
approach to the center is 1/2 unit. By evaluating a suitable integral, find the area of
the smaller piece thereby produced. Then use simple geometric considerations to verify
your answer.


1.11 DIRAC DELTA FUNCTION


Frequently we are faced with the problem of describing a quantity that is zero everywhere
except at a single point, while at that point it is infinite in such a way that its integral over
any interval containing that point has a finite value. For this purpose it is useful to introduce
the Dirac delta function, which is defined to have the properties

\[
\delta(x) = 0, \quad x \neq 0, \tag{1.149}
\]
\[
f(0) = \int_a^b f(x)\,\delta(x)\,dx, \tag{1.150}
\]

where $f(x)$ is any well-behaved function and the integration includes the origin. As a
special case of Eq. (1.150),

\[
\int_{-\infty}^{\infty} \delta(x)\,dx = 1. \tag{1.151}
\]

From Eq. (1.150), $\delta(x)$ must be an infinitely high, thin spike at $x = 0$, as in the description
of an impulsive force or the charge density for a point charge. The problem is that no such
function exists, in the usual sense of function. However, the crucial property in Eq. (1.150)
can be developed rigorously as the limit of a sequence of functions, a distribution. For
example, the delta function may be approximated by any of the sequences of functions,
Eqs. (1.152) to (1.155) and Figs. 1.21 and 1.22:

\[
\delta_n(x) = \begin{cases} 0, & x < -\dfrac{1}{2n}, \\[4pt] n, & -\dfrac{1}{2n} < x < \dfrac{1}{2n}, \\[4pt] 0, & x > \dfrac{1}{2n}, \end{cases} \tag{1.152}
\]
\[
\delta_n(x) = \frac{n}{\sqrt{\pi}} \exp(-n^2 x^2), \tag{1.153}
\]
\[
\delta_n(x) = \frac{n}{\pi}\,\frac{1}{1 + n^2 x^2}, \tag{1.154}
\]
\[
\delta_n(x) = \frac{\sin nx}{\pi x} = \frac{1}{2\pi} \int_{-n}^{n} e^{ixt}\,dt. \tag{1.155}
\]

FIGURE 1.21 δ-Sequence function: left, Eq. (1.152); right, Eq. (1.153).

FIGURE 1.22 δ-Sequence function: left, Eq. (1.154); right, Eq. (1.155).

While all these sequences (and others) cause $\delta(x)$ to have the same properties, they differ
somewhat in ease of use for various purposes. Equation (1.152) is useful in providing
a simple derivation of the integral property, Eq. (1.150). Equation (1.153) is convenient to
differentiate. Its derivatives lead to the Hermite polynomials. Equation (1.155) is particularly
useful in Fourier analysis and in applications to quantum mechanics. In the theory of
Fourier series, Eq. (1.155) often appears (modified) as the Dirichlet kernel:

\[
\delta_n(x) = \frac{1}{2\pi}\,\frac{\sin\left[(n + \frac{1}{2})x\right]}{\sin(\frac{1}{2}x)}. \tag{1.156}
\]

In using these approximations in Eq. (1.150) and elsewhere, we assume that $f(x)$ is well
behaved, that is, that it offers no problems at large $x$.

The forms for $\delta_n(x)$ given in Eqs. (1.152) to (1.155) all obviously peak strongly for
large $n$ at $x = 0$. They must also be scaled in agreement with Eq. (1.151). For the forms
in Eqs. (1.152) and (1.154), verification of the scale is the topic of Exercises 1.11.1 and
1.11.2. To check the scales of Eqs. (1.153) and (1.155), we need values of the integrals

\[
\int_{-\infty}^{\infty} e^{-n^2 x^2}\,dx = \frac{\sqrt{\pi}}{n}
\quad \text{and} \quad
\int_{-\infty}^{\infty} \frac{\sin nx}{x}\,dx = \pi.
\]

These results are respectively trivial extensions of Eqs. (1.148) and (11.107) (the latter of
which we derive later).

For most physical purposes the forms describing delta functions are quite adequate.
However, from a mathematical point of view the situation is still unsatisfactory. The limits

\[
\lim_{n \to \infty} \delta_n(x)
\]

do not exist.

A way out of this difficulty is provided by the theory of distributions. Recognizing that
Eq. (1.150) is the fundamental property, we focus our attention on it rather than on $\delta(x)$
itself. Equations (1.152) to (1.155) with $n = 1, 2, 3, \ldots$ may be interpreted as sequences of
normalized functions, and we may consistently write

\[
\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = \lim_{n \to \infty} \int_{-\infty}^{\infty} f(x)\,\delta_n(x)\,dx. \tag{1.157}
\]

Thus, $\delta(x)$ is labeled a distribution (not a function) and is regarded as defined by
Eq. (1.157). We might emphasize that the integral on the left-hand side of Eq. (1.157)
is not a Riemann integral.⁸
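The limit in Eq. (1.157) can be watched numerically. The following sketch applies the sequences of Eqs. (1.153) and (1.154) to the test function $f(x) = \cos x$; the choice of $f$, the helper `midpoint`, the cutoff $|x| \le 20$, and the grid size are all assumptions made for illustration. For the Gaussian sequence the full-line integral is known in closed form, $\exp[-1/(4n^2)]$, which tends to $f(0) = 1$.

```python
import math

def midpoint(f, a, b, n=100000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def delta_gauss(n, x):
    """Eq. (1.153): (n / sqrt(pi)) * exp(-n^2 x^2)."""
    return n / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

def delta_lorentz(n, x):
    """Eq. (1.154): (n / pi) / (1 + n^2 x^2)."""
    return n / math.pi / (1.0 + (n * x) ** 2)

f = math.cos  # test function with f(0) = 1 (an arbitrary choice)

for n in (1, 5, 25):
    # Assumption: the cutoff |x| <= 20 is adequate; the Lorentzian tails
    # beyond it are small but not exactly zero.
    gauss = midpoint(lambda x: f(x) * delta_gauss(n, x), -20.0, 20.0)
    lorentz = midpoint(lambda x: f(x) * delta_lorentz(n, x), -20.0, 20.0)
    print(n, gauss, lorentz)
```

Both columns move toward 1 as $n$ increases, although the pointwise limit of $\delta_n(x)$ itself does not exist.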


Properties of δ(x)


• From any of Eqs. (1.152) through (1.155) we see that Dirac’s delta function must be
even in $x$, $\delta(-x) = \delta(x)$.

• If $a > 0$,

\[
\delta(ax) = \frac{1}{a}\,\delta(x), \quad a > 0. \tag{1.158}
\]

Equation (1.158) can be proved by making the substitution $x = y/a$:

\[
\int_{-\infty}^{\infty} f(x)\,\delta(ax)\,dx = \frac{1}{a} \int_{-\infty}^{\infty} f(y/a)\,\delta(y)\,dy = \frac{1}{a}\,f(0).
\]

If $a < 0$, Eq. (1.158) becomes $\delta(ax) = \delta(x)/|a|$.

• Shift of origin:

\[
\int_{-\infty}^{\infty} \delta(x - x_0)\,f(x)\,dx = f(x_0), \tag{1.159}
\]

which can be proved by making the substitution $y = x - x_0$ and noting that when $y = 0$,
$x = x_0$.

⁸ It can be treated as a Stieltjes integral if desired; $\delta(x)\,dx$ is replaced by $du(x)$, where $u(x)$ is the Heaviside step function
(compare Exercise 1.11.9).

• If the argument of $\delta(x)$ is a function $g(x)$ with simple zeros at points $a_i$ on the real axis
(and therefore $g'(a_i) \neq 0$),

\[
\delta(g(x)) = \sum_i \frac{\delta(x - a_i)}{|g'(a_i)|}. \tag{1.160}
\]

To prove Eq. (1.160), we write

\[
\int_{-\infty}^{\infty} f(x)\,\delta(g(x))\,dx = \sum_i \int_{a_i - \varepsilon}^{a_i + \varepsilon} f(x)\,\delta\big((x - a_i)\,g'(a_i)\big)\,dx,
\]

where we have decomposed the original integral into a sum of integrals over small
intervals containing the zeros of $g(x)$. In these intervals, we replaced $g(x)$ by the leading
term in its Taylor series. Applying Eqs. (1.158) and (1.159) to each term of the sum,
we confirm Eq. (1.160).

• Derivative of delta function:

\[
\int_{-\infty}^{\infty} f(x)\,\delta'(x - x_0)\,dx = -\int_{-\infty}^{\infty} f'(x)\,\delta(x - x_0)\,dx = -f'(x_0). \tag{1.161}
\]

Equation (1.161) can be taken as defining the derivative $\delta'(x)$; it is evaluated by
performing an integration by parts on any of the sequences defining the delta function.

• In three dimensions, the delta function $\delta(\mathbf{r})$ is interpreted as $\delta(x)\delta(y)\delta(z)$, so it
describes a function localized at the origin and with unit integrated weight, irrespective
of the coordinate system in use. Thus, in spherical polar coordinates,

\[
\iiint f(\mathbf{r}_2)\,\delta(\mathbf{r}_2 - \mathbf{r}_1)\,r_2^2\,dr_2 \sin\theta_2\,d\theta_2\,d\varphi_2 = f(\mathbf{r}_1). \tag{1.162}
\]

• Equation (1.155) corresponds in the limit to

\[
\delta(t - x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\big(i\omega(t - x)\big)\,d\omega, \tag{1.163}
\]

with the understanding that this has meaning only when under an integral sign. In that
context it is extremely useful for the simplification of Fourier integrals (Chapter 20).

• Expansions of $\delta(x)$ are addressed in Chapter 5. See Example 5.1.7.
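Two of the properties listed above, the scaling rule of Eq. (1.158) and the composite-argument rule of Eq. (1.160), can be illustrated numerically by substituting a narrow member of the Gaussian sequence, Eq. (1.153), for $\delta$. This is a sketch under stated assumptions: the value $n = 200$, the test function $f = e^x$, the example $g(x) = x^2 - 1$, and the helper names are all illustrative choices.

```python
import math

def midpoint(f, a, b, n=200000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def delta_n(x, n=200):
    """Gaussian delta sequence of Eq. (1.153); n = 200 is an arbitrary 'large' value."""
    return n / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

f = math.exp  # smooth test function (assumption)

# Eq. (1.158): delta(a x) = delta(x) / a for a > 0, here with a = 3.
scaled = midpoint(lambda x: f(x) * delta_n(3.0 * x), -2.0, 2.0)
print(scaled, f(0.0) / 3.0)

# Eq. (1.160) with g(x) = x^2 - 1: simple zeros at x = +1 and x = -1, |g'| = 2 at both.
composite = midpoint(lambda x: f(x) * delta_n(x * x - 1.0), -3.0, 3.0)
print(composite, (f(1.0) + f(-1.0)) / 2.0)
```

In each printed pair the first number is the numerical integral and the second is the prediction of the corresponding property; they agree up to corrections that shrink as $n$ grows.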


Kronecker Delta 


It is sometimes useful to have a symbol that is the discrete analog of the Dirac delta function,
with the property that it is unity when the discrete variable has a certain value, and
zero otherwise. A quantity with these properties is known as the Kronecker delta, defined
for indices $i$ and $j$ as

\[
\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \tag{1.164}
\]

Frequent uses of this symbol are to select a special term from a summation, or to have one
functional form for all nonzero values of an index, but a different form when the index is
zero. Examples:

\[
\sum_{ij} f_{ij}\,\delta_{ij} = \sum_i f_{ii}, \qquad C_n = \frac{1}{1 + \delta_{n0}}\,\frac{2\pi}{L}.
\]


Exercises

1.11.1 Let

\[
\delta_n(x) = \begin{cases} 0, & x < -\dfrac{1}{2n}, \\[4pt] n, & -\dfrac{1}{2n} < x < \dfrac{1}{2n}, \\[4pt] 0, & \dfrac{1}{2n} < x. \end{cases}
\]

Show that

\[
\lim_{n \to \infty} \int_{-\infty}^{\infty} f(x)\,\delta_n(x)\,dx = f(0),
\]

assuming that $f(x)$ is continuous at $x = 0$.

1.11.2 For

\[
\delta_n(x) = \frac{n}{\pi}\,\frac{1}{1 + n^2 x^2},
\]

show that

\[
\int_{-\infty}^{\infty} \delta_n(x)\,dx = 1.
\]

1.11.3 Fejér’s method of summing series is associated with the function

\[
\delta_n(t) = \frac{1}{2\pi n} \left[\frac{\sin(nt/2)}{\sin(t/2)}\right]^2.
\]

Show that $\delta_n(t)$ is a delta distribution, in the sense that

\[
\lim_{n \to \infty} \frac{1}{2\pi n} \int_{-\infty}^{\infty} f(t) \left[\frac{\sin(nt/2)}{\sin(t/2)}\right]^2 dt = f(0).
\]

1.11.4 Prove that

\[
\delta[a(x - x_1)] = \frac{1}{a}\,\delta(x - x_1).
\]

Note. If $\delta[a(x - x_1)]$ is considered even, relative to $x_1$, the relation holds for negative
$a$ and $1/a$ may be replaced by $1/|a|$.

1.11.5 Show that

\[
\delta[(x - x_1)(x - x_2)] = [\delta(x - x_1) + \delta(x - x_2)]/|x_1 - x_2|.
\]

Hint. Try using Exercise 1.11.4.

1.11.6 Using the Gauss error curve delta sequence $\delta_n = \dfrac{n}{\sqrt{\pi}}\,e^{-n^2 x^2}$, show that

\[
x\,\frac{d}{dx}\,\delta(x) = -\delta(x),
\]

treating $\delta(x)$ and its derivative as in Eq. (1.157).

1.11.7 Show that

\[
\int_{-\infty}^{\infty} \delta'(x)\,f(x)\,dx = -f'(0).
\]

Here we assume that $f'(x)$ is continuous at $x = 0$.

1.11.8 Prove that

\[
\delta(f(x)) = \left|\frac{df(x)}{dx}\right|^{-1}_{x = x_0} \delta(x - x_0),
\]

where $x_0$ is chosen so that $f(x_0) = 0$.
Hint. Note that $\delta(f)\,df = \delta(x)\,dx$.

1.11.9 (a) If we define a sequence $\delta_n(x) = n/(2\cosh^2 nx)$, show that

\[
\int_{-\infty}^{\infty} \delta_n(x)\,dx = 1, \quad \text{independent of } n.
\]

(b) Continuing this analysis, show that⁹

\[
\int_{-\infty}^{x} \delta_n(x)\,dx = \frac{1}{2}\,[1 + \tanh nx] \equiv u_n(x)
\]

and

\[
\lim_{n \to \infty} u_n(x) = \begin{cases} 0, & x < 0, \\ 1, & x > 0. \end{cases}
\]

This is the Heaviside unit step function (Fig. 1.23).

FIGURE 1.23 Heaviside unit step function (curves labeled "n large" and "n small").

⁹ Many other symbols are used for this function. This is the AMS-55 notation (in Additional Readings, see Abramowitz and
Stegun): u for unit.


Additional Readings


Abramowitz, M., and I. A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and 
Mathematical Tables (AMS-55). Washington, DC: National Bureau of Standards (1972), reprinted, Dover 
(1974). Contains a wealth of information about a large number of special functions. 

Bender, C. M., and S. Orszag, Advanced Mathematical Methods for Scientists and Engineers. New York: 
McGraw-Hill (1978). Particularly recommended for methods of accelerating convergence. 

Byron, F. W., Jr., and R. W. Fuller, Mathematics of Classical and Quantum Physics. Reading, MA: Addison- 
Wesley (1969), reprinted, Dover (1992). This is an advanced text that presupposes moderate knowledge of 
mathematical physics. 

Courant, R., and D. Hilbert, Methods of Mathematical Physics, Vol. 1 (1st English ed.). New York: Wiley
(Interscience) (1953). As a reference book for mathematical physics, it is particularly valuable for existence 
theorems and discussion of areas such as eigenvalue problems, integral equations, and calculus of variations. 

Galambos, J., Representations of Real Numbers by Infinite Series. Berlin: Springer (1976). 


Gradshteyn, I. S., and I. M. Ryzhik, Table of Integrals, Series, and Products. Corrected and enlarged 7th ed., 
edited by A. Jeffrey and D. Zwillinger. New York: Academic Press (2007). 

Hansen, E., A Table of Series and Products. Englewood Cliffs, NJ: Prentice-Hall (1975). A tremendous compi- 
lation of series and products. 

Hardy, G. H., Divergent Series. Oxford: Clarendon Press (1956), 2nd ed., Chelsea (1992). The standard, com- 
prehensive work on methods of treating divergent series. Hardy includes instructive accounts of the gradual 
development of the concepts of convergence and divergence. 

Jeffrey, A., Handbook of Mathematical Formulas and Integrals. San Diego: Academic Press (1995). 

Jeffreys, H. S., and B. S. Jeffreys, Methods of Mathematical Physics, 3rd ed. Cambridge, UK: Cambridge Univer- 
sity Press (1972). This is a scholarly treatment of a wide range of mathematical analysis, in which considerable 
attention is paid to mathematical rigor. Applications are to classical physics and geophysics. 

Knopp, K., Theory and Application of Infinite Series. London: Blackie and Son, 2nd ed. New York: Hafner 
(1971), reprinted A. K. Peters Classics (1997). This is a thorough, comprehensive, and authoritative work that 
covers infinite series and products. Proofs of almost all the statements about series not proved in this chapter 
will be found in this book. 

Mangulis, V., Handbook of Series for Scientists and Engineers. New York: Academic Press (1965). A most 
convenient and useful collection of series. Includes algebraic functions, Fourier series, and series of the special 
functions: Bessel, Legendre, and others. 

Morse, P. M., and H. Feshbach, Methods of Theoretical Physics, 2 vols. New York: McGraw-Hill (1953). This 
work presents the mathematics of much of theoretical physics in detail but at a rather advanced level. It is 
recommended as the outstanding source of information for supplementary reading and advanced study. 

Rainville, E. D., Infinite Series. New York: Macmillan (1967). A readable and useful account of series constants 
and functions. 

Sokolnikoff, I. S., and R. M. Redheffer, Mathematics of Physics and Modern Engineering, 2nd ed. New York: 
McGraw-Hill (1966). A long chapter 2 (101 pages) presents infinite series in a thorough but very read- 
able form. Extensions to the solutions of differential equations, to complex series, and to Fourier series are 
included. 

Spiegel, M. R., Complex Variables, in Schaum’s Outline Series. New York: McGraw-Hill (1964, reprinted 1995). 
Clear, to the point, and with very large numbers of examples, many solved step by step. Answers are provided 
for all others. Highly recommended. 

Whittaker, E. T., and G. N. Watson, A Course of Modern Analysis, 4th ed. Cambridge, UK: Cambridge University
Press (1962), paperback. Although this is the oldest of the general references (original edition 1902), it still is
the classic reference. It leans strongly towards pure mathematics, as of 1902, with full mathematical rigor.


CHAPTER 2 


DETERMINANTS AND 
MATRICES 


2.1 DETERMINANTS 


We begin the study of matrices by solving linear equations that will lead us to determinants
and matrices. The concept of determinant and the notation were introduced by the
renowned German mathematician and philosopher Gottfried Wilhelm von Leibniz. 


Homogeneous Linear Equations 


One of the major applications of determinants is in the establishment of a condition for 
the existence of a nontrivial solution for a set of linear homogeneous algebraic equations. 
Suppose we have three unknowns x1, x2, x3 (or m equations with n unknowns): 


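\[
\begin{aligned}
a_1 x_1 + a_2 x_2 + a_3 x_3 &= 0, \\
b_1 x_1 + b_2 x_2 + b_3 x_3 &= 0, \\
c_1 x_1 + c_2 x_2 + c_3 x_3 &= 0.
\end{aligned} \tag{2.1}
\]

The problem is to determine under what conditions there is any solution, apart from
the trivial one $x_1 = 0$, $x_2 = 0$, $x_3 = 0$. If we use vector notation $\mathbf{x} = (x_1, x_2, x_3)$ for the
solution and three rows $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$, $\mathbf{c} = (c_1, c_2, c_3)$ of coefficients,
then the three equations, Eqs. (2.1), become

\[
\mathbf{a} \cdot \mathbf{x} = 0, \quad \mathbf{b} \cdot \mathbf{x} = 0, \quad \mathbf{c} \cdot \mathbf{x} = 0. \tag{2.2}
\]

These three vector equations have the geometrical interpretation that $\mathbf{x}$ is orthogonal
to $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$. If the volume spanned by $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ given by the determinant (or triple scalar
product, see Eq. (3.12) of Section 3.2)

\[
D_3 = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} = \det(\mathbf{a}, \mathbf{b}, \mathbf{c}) =
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \tag{2.3}
\]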


is not zero, then there is only the trivial solution $\mathbf{x} = 0$. For an introduction to the cross
product of vectors, see Chapter 3: Vector Analysis, Section 3.2: Vectors in 3-D Space.

Conversely, if the aforementioned determinant of coefficients vanishes, then one of
the row vectors is a linear combination of the other two. Let us assume that $\mathbf{c}$ lies in the
plane spanned by $\mathbf{a}$ and $\mathbf{b}$, that is, that the third equation is a linear combination of the
first two and not independent. Then $\mathbf{x}$ is orthogonal to that plane so that $\mathbf{x} \sim \mathbf{a} \times \mathbf{b}$. Since
homogeneous equations can be multiplied by arbitrary numbers, only ratios of the $x_i$ are
relevant, for which we then obtain ratios of $2 \times 2$ determinants

\[
\frac{x_1}{x_3} = \frac{a_2 b_3 - a_3 b_2}{a_1 b_2 - a_2 b_1}, \quad
\frac{x_2}{x_3} = -\frac{a_1 b_3 - a_3 b_1}{a_1 b_2 - a_2 b_1} \tag{2.4}
\]

from the components of the cross product $\mathbf{a} \times \mathbf{b}$, provided $x_3 \sim a_1 b_2 - a_2 b_1 \neq 0$. This is
Cramer’s rule for three homogeneous linear equations.
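The construction above can be sketched in a few lines. The coefficient rows below are hypothetical values chosen so that $\mathbf{c} = \mathbf{a} + 2\mathbf{b}$ lies in the plane of $\mathbf{a}$ and $\mathbf{b}$; the helper names `cross` and `dot` are likewise assumptions of this illustration.

```python
from fractions import Fraction

def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two vectors."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical coefficients: c = a + 2b lies in the plane of a and b, so D3 = 0.
a = (1, 2, 3)
b = (2, -1, 1)
c = tuple(p + 2 * q for p, q in zip(a, b))

x = cross(a, b)                       # any multiple of a x b solves all three equations
print(x, dot(a, x), dot(b, x), dot(c, x))
print(dot(cross(a, b), c))            # triple scalar product D3 of Eq. (2.3)

# Ratios of Eq. (2.4); valid here because x3 = a1*b2 - a2*b1 is nonzero.
print(Fraction(x[0], x[2]), Fraction(x[1], x[2]))
```

For these values $\mathbf{a} \times \mathbf{b} = (5, 5, -5)$, all three scalar products vanish, and both ratios equal $-1$, in agreement with Eq. (2.4).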


Inhomogeneous Linear Equations 


The simplest case of two equations with two unknowns,

\[
a_1 x_1 + a_2 x_2 = a_3, \quad b_1 x_1 + b_2 x_2 = b_3, \tag{2.5}
\]

can be reduced to the previous case by imbedding it in three-dimensional (3-D) space with
a solution vector $\mathbf{x} = (x_1, x_2, -1)$ and row vectors $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$. As
before, Eqs. (2.5) in vector notation, $\mathbf{a} \cdot \mathbf{x} = 0$ and $\mathbf{b} \cdot \mathbf{x} = 0$, imply that $\mathbf{x} \sim \mathbf{a} \times \mathbf{b}$, so the
analog of Eq. (2.4) holds. For this to apply, though, the third component of $\mathbf{a} \times \mathbf{b}$ must not
be zero, that is, $a_1 b_2 - a_2 b_1 \neq 0$, because the third component of $\mathbf{x}$ is $-1 \neq 0$. This yields
the $x_i$ as

\[
x_1 = \frac{a_3 b_2 - b_3 a_2}{a_1 b_2 - a_2 b_1}
= \frac{\begin{vmatrix} a_3 & a_2 \\ b_3 & b_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \tag{2.6}
\]
\[
x_2 = \frac{a_1 b_3 - a_3 b_1}{a_1 b_2 - a_2 b_1}
= \frac{\begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}. \tag{2.7}
\]

The determinant in the numerator of $x_1$ ($x_2$) is obtained from the determinant of the
coefficients $\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}$ by replacing the first (second) column vector by the vector
$\begin{pmatrix} a_3 \\ b_3 \end{pmatrix}$ of the inhomogeneous side of Eq. (2.5). This is Cramer’s rule for a set of two
inhomogeneous linear equations with two unknowns.

A full understanding of the above exposition requires now that we introduce a formal
definition of the determinant and show how it relates to the foregoing.


Definitions
Before defining a determinant, we need to introduce some related concepts and definitions. 


• When we write two-dimensional (2-D) arrays of items, we identify the item in the $n$th
horizontal row and the $m$th vertical column by the index set $n, m$; note that the row
index is conventionally written first.

• Starting from a set of $n$ objects in some reference order (e.g., the number sequence
$1, 2, 3, \ldots, n$), we can make a permutation of them to some other order; the total
number of distinct permutations that are possible is $n!$ (choose the first object $n$ ways,
then choose the second in $n - 1$ ways, etc.).

• Every permutation of $n$ objects can be reached from the reference order by a succession
of pairwise interchanges (e.g., $1234 \to 4132$ can be reached by the successive steps
$1234 \to 1432 \to 4132$). Although the number of pairwise interchanges needed for a
given permutation depends on the path (compare the above example with $1234 \to
1243 \to 1423 \to 4123 \to 4132$), for a given permutation the number of interchanges
will always either be even or odd. Thus a permutation can be identified as having either
even or odd parity.

• It is convenient to introduce the Levi-Civita symbol, which for an $n$-object system is
denoted by $\varepsilon_{ij\ldots}$, where $\varepsilon$ has $n$ subscripts, each of which identifies one of the objects.
This Levi-Civita symbol is defined to be $+1$ if $ij\ldots$ represents an even permutation
of the objects from a reference order; it is defined to be $-1$ if $ij\ldots$ represents an odd
permutation of the objects, and zero if $ij\ldots$ does not represent a permutation of the
objects (i.e., contains an entry duplication). Since this is an important definition, we set
it out in a display format:

\[
\varepsilon_{ij\ldots} = \begin{cases} +1, & ij\ldots \text{ an even permutation}, \\ -1, & ij\ldots \text{ an odd permutation}, \\ \phantom{+}0, & ij\ldots \text{ not a permutation}. \end{cases} \tag{2.8}
\]

We now define a determinant of order $n$ to be an $n \times n$ square array of numbers (or
functions), with the array conventionally written within vertical bars (not parentheses, braces,
or any other type of brackets), as follows:

\[
D_n = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
a_{31} & a_{32} & \cdots & a_{3n} \\
\cdots & \cdots & \cdots & \cdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}. \tag{2.9}
\]

The determinant D, has a value that is obtained by 


1. Forming all n! products that can be formed by choosing one entry from each row in 
such a way that one entry comes from each column, 

2. Assigning each product a sign that corresponds to the parity of the sequence in which 
the columns were used (assuming the rows were used in an ascending sequence), 

3. Adding (with the assigned signs) the products.

More formally, the determinant in Eq. (2.9) is defined to have the value

\[
D_n = \sum_{ij\ldots} \varepsilon_{ij\ldots}\,a_{1i}\,a_{2j} \cdots. \tag{2.10}
\]


The summations in Eq. (2.10) need not be restricted to permutations, but can be assumed 
to range independently from 1 through n; the presence of the Levi-Civita symbol will 
cause only the index combinations corresponding to permutations to actually contribute to 
the sum. 
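The definition in Eq. (2.10) translates almost literally into code. The sketch below restricts the sum to true permutations (all other index combinations carry a vanishing Levi-Civita symbol) and obtains the sign by counting pairwise interchanges. The function names `parity_sign` and `det_by_permutations`, the 0-based indexing, and the sample matrices are assumptions of this illustration; the cost grows as $n!$, so it is meant for small $n$ only.

```python
from itertools import permutations

def parity_sign(perm):
    """Return +1 for an even permutation of 0..n-1 and -1 for an odd one.

    The sign is found by counting the pairwise interchanges needed to sort perm.
    """
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]   # one pairwise interchange
            sign = -sign
    return sign

def det_by_permutations(a):
    """Eq. (2.10): sum over column permutations of sign * a[0][i] * a[1][j] * ..."""
    n = len(a)
    total = 0
    for cols in permutations(range(n)):
        term = parity_sign(cols)
        for row, col in enumerate(cols):
            term *= a[row][col]
        total += term
    return total

print(det_by_permutations([[1, 2], [3, 4]]))                         # -2
print(det_by_permutations([[8, 11, 7], [9, 11, 5], [8, 12, 9]]))      # 1
```

Rows are taken in ascending order and the column sequence supplies the parity, exactly as in the three-step recipe given before Eq. (2.10).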


Example 2.1.1 DETERMINANTS OF ORDERS 2 AND 3 


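To make the definition more concrete, we illustrate first with a determinant of order 2. The
Levi-Civita symbols needed for this determinant are $\varepsilon_{12} = +1$ and $\varepsilon_{21} = -1$ (note that
$\varepsilon_{11} = \varepsilon_{22} = 0$), leading to

\[
D_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
= \varepsilon_{12}\,a_{11} a_{22} + \varepsilon_{21}\,a_{12} a_{21} = a_{11} a_{22} - a_{12} a_{21}.
\]

We see that this determinant expands into $2! = 2$ terms. A specific example of a determinant
of order 2 is

\[
\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} = a_1 b_2 - b_1 a_2.
\]

Determinants of order 3 expand into $3! = 6$ terms. The relevant Levi-Civita symbols
are $\varepsilon_{123} = \varepsilon_{231} = \varepsilon_{312} = +1$, $\varepsilon_{213} = \varepsilon_{321} = \varepsilon_{132} = -1$; all other index combinations have
$\varepsilon_{ijk} = 0$, so

\[
\begin{aligned}
D_3 &= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= \sum_{ijk} \varepsilon_{ijk}\,a_{1i}\,a_{2j}\,a_{3k} \\
&= a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} - a_{13} a_{22} a_{31} - a_{12} a_{21} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32}.
\end{aligned}
\]

The expression in Eq. (2.3) is the determinant of order 3

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= a_1 b_2 c_3 - a_1 b_3 c_2 - a_2 b_1 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_3 b_2 c_1.
\]

Note that half of the terms in the expansion of a determinant bear negative signs. It is
quite possible that a determinant of large elements will have a very small value. Here is
one example:

\[
\begin{vmatrix} 8 & 11 & 7 \\ 9 & 11 & 5 \\ 8 & 12 & 9 \end{vmatrix} = 1.
\]


Properties of Determinants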


The symmetry properties of the Levi-Civita symbol translate into a number of symmetries
exhibited by determinants. For simplicity, we illustrate with determinants of order 3.
The interchange of two columns of a determinant causes the Levi-Civita symbol multiplying
each term of the expansion to change sign; the same is true if two rows are interchanged.
Moreover, the roles of rows and columns may be interchanged; if a determinant
with elements $a_{ij}$ is replaced by one with elements $b_{ij} = a_{ji}$, we call the $b_{ij}$ determinant
the transpose of the $a_{ij}$ determinant. Both these determinants have the same value.
Summarizing:


Interchanging two rows (or two columns) changes the sign of the value of a determinant.
Transposition does not alter its value.


Thus,

\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= -\begin{vmatrix} a_{12} & a_{11} & a_{13} \\ a_{22} & a_{21} & a_{23} \\ a_{32} & a_{31} & a_{33} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{vmatrix}. \tag{2.11}
\]


Further consequences of the definition in Eq. (2.10) are:
(1) Multiplication of all members of a single column (or a single row) by a constant $k$
causes the value of the determinant to be multiplied by $k$,

(2) If the elements of a column (or row) are actually sums of two quantities, the determinant
can be decomposed into a sum of two determinants.


Thus,

\[
k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= \begin{vmatrix} k a_{11} & a_{12} & a_{13} \\ k a_{21} & a_{22} & a_{23} \\ k a_{31} & a_{32} & a_{33} \end{vmatrix}
= \begin{vmatrix} k a_{11} & k a_{12} & k a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \tag{2.12}
\]

\[
\begin{vmatrix} a_{11} + b_1 & a_{12} & a_{13} \\ a_{21} + b_2 & a_{22} & a_{23} \\ a_{31} + b_3 & a_{32} & a_{33} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
+ \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}. \tag{2.13}
\]

These basic properties and/or the basic definition mean that

• Any determinant with two rows equal, or two columns equal, has the value zero. To
prove this, interchange the two identical rows or columns; the determinant both remains
the same and changes sign, and therefore must have the value zero.

• An extension of the above is that if two rows (or columns) are proportional, the determinant
is zero.

• The value of a determinant is unchanged if a multiple of one row is added (column
by column) to another row or if a multiple of one column is added (row by row) to
another column. Applying Eq. (2.13), the addition does not contribute to the value of
the determinant.

• If each element in a row or each element in a column is zero, the determinant has the
value zero.


Laplacian Development by Minors


The fact that a determinant of order $n$ expands into $n!$ terms means that it is important to
identify efficient means for determinant evaluation. One approach is to expand in terms of
minors. The minor corresponding to $a_{ij}$, denoted $M_{ij}$, or $M_{ij}(a)$ if we need to identify $M$
as coming from the $a_{ij}$, is the determinant (of order $n - 1$) produced by striking out row $i$
and column $j$ of the original determinant. When we expand into minors, the quantities to
be used are the cofactors of the $(ij)$ elements, defined as $(-1)^{i+j} M_{ij}$. The expansion can
be made for any row or column of the original determinant. If, for example, we expand the
determinant of Eq. (2.9) using row $i$, we have

\[
D_n = \sum_{j=1}^{n} a_{ij}\,(-1)^{i+j} M_{ij}. \tag{2.14}
\]

This expansion reduces the work involved in evaluation if the row or column selected for 
the expansion contains zeros, as the corresponding minors need not be evaluated. 


Example 2.1.2 EXPANSION IN MINORS 


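Consider the determinant (arising in Dirac’s relativistic electron theory)

\[
D \equiv \begin{vmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{vmatrix}
= \begin{vmatrix}
0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & -1 & 0
\end{vmatrix}.
\]

Expanding across the top row, only one $3 \times 3$ minor survives:

\[
D = (-1)^{1+2}\,a_{12}\,M_{12}(a) = (-1) \cdot (1)
\begin{vmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{vmatrix}
\equiv (-1) \begin{vmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{vmatrix}.
\]

Expanding now across the second row, we get

\[
D = (-1)(-1)^{2+3}\,b_{23}\,M_{23}(b) = \begin{vmatrix} -1 & 0 \\ 0 & -1 \end{vmatrix} = 1.
\]

When we finally reached a $2 \times 2$ determinant, it was simple to evaluate it without further
expansion.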
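The expansion of Eq. (2.14) lends itself to a short recursive sketch. The version below always expands along the first row (whereas the example above switches to the second row at the $3 \times 3$ stage) and skips zero entries so that their minors are never evaluated. The function names `minor` and `det_laplace` and the 0-based indexing are assumptions of this illustration.

```python
def minor(a, i, j):
    """Array left after striking out row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det_laplace(a):
    """Expand along the first row, Eq. (2.14); zero entries are skipped."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        if entry == 0:
            continue                      # the corresponding minor is never evaluated
        # With 0-based j the cofactor sign (-1)^(i+j) for the first row is (-1)^j.
        total += (-1) ** j * entry * det_laplace(minor(a, 0, j))
    return total

dirac = [[0, 1, 0, 0],
         [-1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, -1, 0]]
print(det_laplace(dirac))   # 1
```

For this sparse array only one minor is visited at each level, which is the saving the text attributes to rows containing zeros.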


Linear Equation Systems 
We are now ready to apply our knowledge of determinants to the solution of systems of
linear equations. Suppose we have the simultaneous equations

\[
\begin{aligned}
a_1 x_1 + a_2 x_2 + a_3 x_3 &= h_1, \\
b_1 x_1 + b_2 x_2 + b_3 x_3 &= h_2, \\
c_1 x_1 + c_2 x_2 + c_3 x_3 &= h_3.
\end{aligned} \tag{2.15}
\]

To use determinants to help solve this equation system, we define

\[
D = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}. \tag{2.16}
\]

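Starting from $x_1 D$, we manipulate it by (1) moving $x_1$ to multiply the entries of the first
column of $D$, then (2) adding to the first column $x_2$ times the second column and $x_3$ times
the third column (neither of these operations changes the value). We then reach the second
line of Eq. (2.17) by substituting the right-hand sides of Eqs. (2.15). These operations are
illustrated here:

\[
\begin{aligned}
x_1 D &= \begin{vmatrix} a_1 x_1 & a_2 & a_3 \\ b_1 x_1 & b_2 & b_3 \\ c_1 x_1 & c_2 & c_3 \end{vmatrix}
= \begin{vmatrix} a_1 x_1 + a_2 x_2 + a_3 x_3 & a_2 & a_3 \\ b_1 x_1 + b_2 x_2 + b_3 x_3 & b_2 & b_3 \\ c_1 x_1 + c_2 x_2 + c_3 x_3 & c_2 & c_3 \end{vmatrix} \\
&= \begin{vmatrix} h_1 & a_2 & a_3 \\ h_2 & b_2 & b_3 \\ h_3 & c_2 & c_3 \end{vmatrix}.
\end{aligned} \tag{2.17}
\]

If $D \neq 0$, Eq. (2.17) may now be solved for $x_1$:

\[
x_1 = \frac{1}{D} \begin{vmatrix} h_1 & a_2 & a_3 \\ h_2 & b_2 & b_3 \\ h_3 & c_2 & c_3 \end{vmatrix}. \tag{2.18}
\]

Analogous procedures starting from $x_2 D$ and $x_3 D$ give the parallel results

\[
x_2 = \frac{1}{D} \begin{vmatrix} a_1 & h_1 & a_3 \\ b_1 & h_2 & b_3 \\ c_1 & h_3 & c_3 \end{vmatrix}, \quad
x_3 = \frac{1}{D} \begin{vmatrix} a_1 & a_2 & h_1 \\ b_1 & b_2 & h_2 \\ c_1 & c_2 & h_3 \end{vmatrix}.
\]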

We see that the solution for $x_i$ is $1/D$ times a numerator obtained by replacing the $i$th
column of $D$ by the right-hand-side coefficients, a result that can be generalized to an
arbitrary number $n$ of simultaneous equations. This scheme for the solution of linear equation
systems is known as Cramer’s rule.

If $D$ is nonzero, the above construction of the $x_i$ is definitive and unique, so that there
will be exactly one solution to the equation set. If $D \neq 0$ and the equations are homogeneous
(i.e., all the $h_i$ are zero), then the unique solution is that all the $x_i$ are zero.
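Cramer’s rule for three equations can be sketched directly from Eqs. (2.16) and (2.18). The code below uses exact rational arithmetic; the function names `det3` and `cramer3` and the sample system are hypothetical choices for illustration, and the routine simply refuses to proceed when $D = 0$.

```python
from fractions import Fraction

def det3(m):
    """Order-3 determinant written out as the six signed products."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = m
    return (a1 * b2 * c3 - a1 * b3 * c2 - a2 * b1 * c3
            + a2 * b3 * c1 + a3 * b1 * c2 - a3 * b2 * c1)

def cramer3(coeffs, h):
    """Solve a 3 x 3 system by Cramer's rule; raises ValueError if D = 0."""
    D = det3(coeffs)
    if D == 0:
        raise ValueError("D = 0: Cramer's rule does not apply")
    solution = []
    for i in range(3):
        # Replace column i of the coefficient array by the right-hand sides.
        replaced = [row[:i] + [h[k]] + row[i + 1:] for k, row in enumerate(coeffs)]
        solution.append(Fraction(det3(replaced), D))
    return solution

# Hypothetical system chosen for illustration.
coeffs = [[2, 1, -1],
          [-3, -1, 2],
          [-2, 1, 2]]
h = [8, -11, -3]
print(cramer3(coeffs, h))   # x1 = 2, x2 = 3, x3 = -1
```

For larger systems elimination methods are far cheaper than evaluating $n + 1$ determinants, so this form is mainly of conceptual value.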


Determinants and Linear Dependence 


The preceding subsections go a long way toward identifying the role of the determinant
with respect to linear dependence. If $n$ linear equations in $n$ variables, written as
in Eq. (2.15), have coefficients that form a nonzero determinant, the variables are uniquely
determined, meaning that the forms constituting the left-hand sides of the equations must
in fact be linearly independent. However, we would still like to prove the property illustrated
in the introduction to this chapter, namely that if a set of forms is linearly dependent,
the determinant of their coefficients will be zero. But this result is nearly immediate.
The existence of linear dependence means that there exists one equation whose coefficients
are linear combinations of the coefficients of the other equations, and we may use that fact
to reduce to zero the row of the determinant corresponding to that equation.
In summary, we have therefore established the following important result:


If the coefficients of $n$ linear forms in $n$ variables form a nonzero determinant, the forms
are linearly independent; if the determinant of the coefficients is zero, the forms exhibit
linear dependence.


Linearly Dependent Equations 


If a set of linear forms is linearly dependent, we can distinguish three distinct situations 
when we consider equation systems based on these forms. First, and of most importance 
for physics, is the case in which all the equations are homogeneous, meaning that the 
right-hand side quantities $h_i$ in equations of the type Eq. (2.15) are all zero. Then, one or
more of the equations in the set will be equivalent to linear combinations of others, and
we will have fewer than n equations in our n variables. We can then assign one (or in some
cases, more than one) variable an arbitrary value, obtaining the others as functions of the 
assigned variables. We thus have a manifold (i.e., a parameterized set) of solutions to our 
equation system. 

Combining the above analysis with our earlier observation that if a set of homogeneous
linear equations has a nonvanishing determinant it has the unique solution that all the $x_i$
are zero, we have the following important result:


A system of n homogeneous linear equations in n unknowns has solutions that are not 
identically zero only if the determinant of its coefficients vanishes. If that determinant 
vanishes, there will be one or more solutions that are not identically zero and are 
arbitrary as to scale. 


A second case is where we have (or combine equations so that we have) the same linear
form in two equations, but with different values of the right-hand quantities $h_i$. In that case
the equations are mutually inconsistent, and the equation system has no solution.

A third, related case is where we have a duplicated linear form, but with a common
value of $h_i$. This also leads to a solution manifold.


Example 2.1.3 LINEARLY DEPENDENT HOMOGENEOUS EQUATIONS 


Consider the equation set

$$\begin{aligned} x_1 + x_2 + x_3 &= 0, \\ x_1 + 3x_2 + 5x_3 &= 0, \\ x_1 + 2x_2 + 3x_3 &= 0. \end{aligned}$$

Here

$$D = \begin{vmatrix} 1 & 1 & 1 \\ 1 & 3 & 5 \\ 1 & 2 & 3 \end{vmatrix} = 1(3)(3) - 1(5)(2) - 1(3)(1) - 1(1)(3) + 1(5)(1) + 1(1)(2) = 0.$$

The third equation is half the sum of the other two, so we drop it. Then,

second equation minus first: $2x_2 + 4x_3 = 0 \;\rightarrow\; x_2 = -2x_3$,

(3 × first equation) minus second: $2x_1 - 2x_3 = 0 \;\rightarrow\; x_1 = x_3$.

Since $x_3$ can have any value, there is an infinite number of solutions, all of the form

$$(x_1, x_2, x_3) = \text{constant} \times (1, -2, 1).$$

Our solution illustrates an important property of homogeneous linear equations, namely
that any multiple of a solution is also a solution. The solution only becomes less arbitrary
if we impose a scale condition. For example, in the present case we could require the
squares of the $x_i$ to add to unity. Even then, the solution would still be arbitrary as to
overall sign.


Numerical Evaluation 


There is extensive literature on determinant evaluation. Computer codes and many references
are given, for example, by Press et al.¹ We present here a straightforward method
due to Gauss that illustrates the principles involved in all the modern evaluation methods.
Gauss elimination is a versatile procedure that can be used for evaluating determinants,
for solving linear equation systems, and (as we will see later) even for matrix inversion.


Example 2.1.4 GAUSS ELIMINATION


Our example, a 3 × 3 linear equation system, can easily be done in other ways, but it is used
here to provide an understanding of the Gauss elimination procedure. We wish to solve

$$\begin{aligned} 3x + 2y + z &= 11 \\ 2x + 3y + z &= 13 \\ x + y + 4z &= 12. \end{aligned} \tag{2.19}$$


For convenience and for the optimum numerical accuracy, the equations are rearranged so 
that, to the extent possible, the largest coefficients run along the main diagonal (upper left 
to lower right). 

The Gauss technique is to use the first equation to eliminate the first unknown, $x$, from
the remaining equations. Then the (new) second equation is used to eliminate $y$ from the
last equation. In general, we work down through the set of equations, and then, with one
unknown determined, we work back up to solve for each of the other unknowns in succession.





¹W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, 2nd ed. Cambridge, UK: Cambridge
University Press (1992), Chapter 2.

It is convenient to start by dividing each row by its initial coefficient, converting
Eq. (2.19) to

$$\begin{aligned} x + \tfrac{2}{3}y + \tfrac{1}{3}z &= \tfrac{11}{3} \\ x + \tfrac{3}{2}y + \tfrac{1}{2}z &= \tfrac{13}{2} \\ x + y + 4z &= 12. \end{aligned} \tag{2.20}$$


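Now, using the first equation, we eliminate $x$ from the second and third equations by
subtracting the first equation from each of the others:

$$\begin{aligned} x + \tfrac{2}{3}y + \tfrac{1}{3}z &= \tfrac{11}{3} \\ \tfrac{5}{6}y + \tfrac{1}{6}z &= \tfrac{17}{6} \\ \tfrac{1}{3}y + \tfrac{11}{3}z &= \tfrac{25}{3}. \end{aligned} \tag{2.21}$$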

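Then we divide the second and third rows by their initial coefficients:

$$\begin{aligned} x + \tfrac{2}{3}y + \tfrac{1}{3}z &= \tfrac{11}{3} \\ y + \tfrac{1}{5}z &= \tfrac{17}{5} \\ y + 11z &= 25. \end{aligned} \tag{2.22}$$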


Repeating the technique, we use the new second equation to eliminate $y$ from the third
equation, which can then be solved for $z$:

$$\begin{aligned} x + \tfrac{2}{3}y + \tfrac{1}{3}z &= \tfrac{11}{3} \\ y + \tfrac{1}{5}z &= \tfrac{17}{5} \\ \tfrac{54}{5}z &= \tfrac{108}{5} \;\rightarrow\; z = 2. \end{aligned} \tag{2.23}$$

Now that $z$ has been determined, we can return to the second equation, finding

$$y + \tfrac{1}{5} \times 2 = \tfrac{17}{5} \;\rightarrow\; y = 3,$$

and finally, continuing to the first equation,

$$x + \tfrac{2}{3} \times 3 + \tfrac{1}{3} \times 2 = \tfrac{11}{3} \;\rightarrow\; x = 1.$$

The technique may not seem as elegant as the use of Cramer's rule, but it is well adapted
to computers and is far faster than evaluating the determinants that Cramer's rule requires.

If we had not kept the right-hand sides of the equation system, the Gauss elimination
process would have simply brought the original determinant into triangular form (but note
that our processes for making the leading coefficients unity cause corresponding changes
in the value of the determinant). In the present problem, the original determinant


$$D = \begin{vmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 4 \end{vmatrix}$$


was divided by 3 and by 2 going from Eq. (2.19) to (2.20), and multiplied by 6/5 and 
by 3 going from Eq. (2.21) to (2.22), so that D and the determinant represented by the 
left-hand side of Eq. (2.23) are related by 


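$$D = (3)(2)\left(\frac{5}{6}\right)\left(\frac{1}{3}\right)\begin{vmatrix} 1 & \frac{2}{3} & \frac{1}{3} \\ 0 & 1 & \frac{1}{5} \\ 0 & 0 & \frac{54}{5} \end{vmatrix} = \left(\frac{5}{3}\right)\left(\frac{54}{5}\right) = 18. \tag{2.24}$$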








Because all the entries in the lower triangle of the determinant explicitly shown in 
Eq. (2.24) are zero, the only term that contributes to it is the product of the diagonal 
elements: To get a nonzero term, we must use the first element of the first row, then the 
second element of the second row, etc. It is easy to verify that the final result obtained in
Eq. (2.24) agrees with the result of evaluating the original form of $D$.
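
The elimination steps of this example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a production routine: the name `gauss_solve` is hypothetical, no row exchanges are performed (so a zero pivot raises an error), and only the pivot row is normalized at each stage, a slight variation on the row-by-row normalization used above. The determinant is accumulated as the product of the pivots that were divided out.

```python
from fractions import Fraction

def gauss_solve(a, h):
    """Solve a x = h by Gauss elimination; also return det(a).

    Assumes every pivot encountered is nonzero (no row exchanges)."""
    n = len(a)
    # Augmented rows: coefficients followed by the right-hand side.
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)]
            for row, rhs in zip(a, h)]
    det = Fraction(1)
    for k in range(n):
        pivot = rows[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: a row exchange would be needed")
        det *= pivot                      # dividing a row by `pivot` divides det
        rows[k] = [v / pivot for v in rows[k]]
        for r in range(k + 1, n):         # eliminate unknown k from later rows
            factor = rows[r][k]
            rows[r] = [vr - factor * vk for vr, vk in zip(rows[r], rows[k])]
    # Back substitution, working up from the last equation.
    x = [Fraction(0)] * n
    for k in reversed(range(n)):
        x[k] = rows[k][n] - sum(rows[k][j] * x[j] for j in range(k + 1, n))
    return x, det

x, det = gauss_solve([[3, 2, 1], [2, 3, 1], [1, 1, 4]], [11, 13, 12])
print(x, det)
```

With exact fractions the result reproduces $x = 1$, $y = 3$, $z = 2$ and $D = 18$; in floating-point work one would normally add pivoting for numerical stability.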
Exercises 
2.1.1 Evaluate the following determinants: 
(a) |O 1 Of, () |3 1 2), () —~ : 
5/0 2 0 3 
1 0 0 03 #1 
0 0 V3 0 
2.1.2 Test the set of linear homogeneous equations 
$x + 3y + 3z = 0, \quad x - y + z = 0, \quad 2x + y + 3z = 0$
to see if it possesses a nontrivial solution. In any case, find a solution to this equation 
set. 
2.1.3 Given the pair of equations 
x+2y=3, 2x+4y=6, 
(a) Show that the determinant of the coefficients vanishes. 
(b) Show that the numerator determinants, see Eq. (2.18), also vanish. 
(c) Find at least two solutions. 
2.1.4 If $C_{ij}$ is the cofactor of element $a_{ij}$, formed by striking out the $i$th row and $j$th column
and including a sign $(-1)^{i+j}$, show that

(a) $\sum_i a_{ij}C_{ij} = \sum_i a_{ji}C_{ji} = |A|$, where $|A|$ is the determinant with the elements $a_{ij}$,
(b) $\sum_i a_{ij}C_{ik} = \sum_i a_{ji}C_{ki} = 0$, $j \neq k$.

2.1.5 A determinant with all elements of order unity may be surprisingly small. The Hilbert
determinant $H_{ij} = (i + j - 1)^{-1}$, $i, j = 1, 2, \ldots, n$ is notorious for its small values.

(a) Calculate the value of the Hilbert determinants of order $n$ for $n = 1$, 2, and 3.

(b) If an appropriate subroutine is available, find the Hilbert determinants of order $n$
for $n = 4$, 5, and 6.








ANS.  n   Det(H_n)
      1   1.
      2   8.33333 × 10^-2
      3   4.62963 × 10^-4
      4   1.65344 × 10^-7
      5   3.74930 × 10^-12
      6   5.36730 × 10^-18.
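
One possible "appropriate subroutine" is sketched below in Python. The helper names `det` and `hilbert` are assumptions introduced for illustration. Because Hilbert matrices are badly conditioned, the sketch works in exact rational arithmetic and converts to floating point only for display.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gauss elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in m]
    n, value = len(m), Fraction(1)
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        p = next((r for r in range(k, n) if m[r][k] != 0), None)
        if p is None:
            return Fraction(0)
        if p != k:
            m[k], m[p] = m[p], m[k]
            value = -value            # a row exchange flips the sign
        value *= m[k][k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            m[r] = [a - f * b for a, b in zip(m[r], m[k])]
    return value

def hilbert(n):
    """Hilbert matrix with elements H_ij = 1/(i + j - 1), i, j = 1..n."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

for n in range(1, 7):
    print(n, f"{float(det(hilbert(n))):.5e}")
```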





2.1.6 Prove that the determinant consisting of the coefficients from a set of linearly dependent
forms has the value zero.


2.1.7 Solve the following set of linear simultaneous equations. Give the results to five decimal
places.

$$\begin{aligned}
1.0x_1 + 0.9x_2 + 0.8x_3 + 0.4x_4 + 0.1x_5 &= 1.0 \\
0.9x_1 + 1.0x_2 + 0.8x_3 + 0.5x_4 + 0.2x_5 + 0.1x_6 &= 0.9 \\
0.8x_1 + 0.8x_2 + 1.0x_3 + 0.7x_4 + 0.4x_5 + 0.2x_6 &= 0.8 \\
0.4x_1 + 0.5x_2 + 0.7x_3 + 1.0x_4 + 0.6x_5 + 0.3x_6 &= 0.7 \\
0.1x_1 + 0.2x_2 + 0.4x_3 + 0.6x_4 + 1.0x_5 + 0.5x_6 &= 0.6 \\
0.1x_2 + 0.2x_3 + 0.3x_4 + 0.5x_5 + 1.0x_6 &= 0.5.
\end{aligned}$$


Note. These equations may also be solved by matrix inversion, as discussed in 
Section 2.2. 


2.1.8 Show that (in 3-D space)

(a) $\sum_i \delta_{ii} = 3$,
(b) $\sum_{ij} \delta_{ij}\varepsilon_{ijk} = 0$,
(c) $\sum_{pq} \varepsilon_{ipq}\varepsilon_{jpq} = 2\delta_{ij}$,
(d) $\sum_{ijk} \varepsilon_{ijk}\varepsilon_{ijk} = 6$.

Note. The symbol $\delta_{ij}$ is the Kronecker delta, defined in Eq. (1.164), and $\varepsilon_{ijk}$ is the
Levi-Civita symbol, Eq. (2.8).


2.1.9 Show that (in 3-D space)

$$\sum_k \varepsilon_{ijk}\varepsilon_{pqk} = \delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp}.$$

Note. See Exercise 2.1.8 for definitions of $\delta_{ij}$ and $\varepsilon_{ijk}$.


2.2 MATRICES 


Matrices are 2-D arrays of numbers or functions that obey the laws that define matrix 
algebra. The subject is important for physics because it facilitates the description of 
linear transformations such as changes of coordinate systems, provides a useful formu- 
lation of quantum mechanics, and facilitates a variety of analyses in classical and rel- 
ativistic mechanics, particle theory, and other areas. Note also that the development of 
a mathematics of two-dimensionally ordered arrays is a natural and logical extension of 
concepts involving ordered pairs of numbers (complex numbers) or ordinary vectors (one- 
dimensional arrays). 

The most distinctive feature of matrix algebra is the rule for the multiplication of
matrices. As we will see in more detail later, the algebra is defined so that a set of linear
equations such as

$$\begin{aligned} a_1x_1 + a_2x_2 &= h_1 \\ b_1x_1 + b_2x_2 &= h_2 \end{aligned}$$

can be written as a single matrix equation of the form

$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}.$$

In order for this equation to be valid, the multiplication indicated by writing the two
matrices next to each other on the left-hand side has to produce the result

$$\begin{pmatrix} a_1x_1 + a_2x_2 \\ b_1x_1 + b_2x_2 \end{pmatrix}$$

and the statement of equality in the equation has to mean element-by-element agreement of
its left-hand and right-hand sides. Let's move now to a more formal and precise description
of matrix algebra.


Basic Definitions 


A matrix is a set of numbers or functions in a 2-D square or rectangular array. There are
no inherent limitations on the number of rows or columns. A matrix with m (horizontal)
rows and n (vertical) columns is known as an m × n matrix, and the element of a matrix A
in row i and column j is known as its i, j element, often labeled $a_{ij}$. As already observed
when we introduced determinants, when row and column indices or dimensions are mentioned
together, it is customary to write the row indicator first. Note also that order matters:
in general the i, j and j, i elements of a matrix are different, and (if m ≠ n) n × m and
m × n matrices even have different shapes. A matrix for which n = m is termed square;
one consisting of a single column (an m × 1 matrix) is often called a column vector, while
a matrix with only one row (therefore 1 × n) is a row vector. We will find that identifying
these matrices as vectors is consistent with the properties identified for vectors in
Section 1.7.

[FIGURE 2.1 From left to right, matrices of dimension 4 × 1 (column vector),
3 × 2, 2 × 3, 2 × 2 (square), 1 × 2 (row vector).]

The arrays constituting matrices are conventionally enclosed in parentheses (not vertical 
lines, which indicate determinants, or square brackets). A few examples of matrices are 
shown in Fig. 2.1. We will usually write the symbols denoting matrices as upper-case 
letters in a sans-serif font (as we did when introducing A); when a matrix is known to be a 
column vector we often denote it by a lower-case boldface letter in a Roman font (e.g., x). 

Perhaps the most important fact to note is that the elements of a matrix are not combined 
with one another. A matrix is not a determinant. It is an ordered array of numbers, not a 
single number. To refer to the determinant whose elements are those of a square matrix A
(more simply, "the determinant of A"), we can write det(A).

Matrices, so far just arrays of numbers, have the properties we assign to them. These 
properties must be specified to complete the definition of matrix algebra. 


Equality 


If A and B are matrices, A = B only if $a_{ij} = b_{ij}$ for all values of i and j. A necessary but
not sufficient condition for equality is that both matrices have the same dimensions.


Addition, Subtraction 


Addition and subtraction are defined only for matrices A and B of the same dimensions, in
which case A + B = C, with $c_{ij} = a_{ij} + b_{ij}$ for all values of i and j, the elements combining
according to the law of ordinary algebra (or arithmetic if they are simple numbers). This
means that C will be a matrix of the same dimensions as A and B. Moreover, we see that
addition is commutative: A + B = B + A. It is also associative, meaning that (A + B) +
C = A + (B + C). A matrix with all elements zero, called a null matrix or zero matrix,
can either be written as O or as a simple zero, with its matrix character and dimensions
determined from the context. Thus, for all A,








$$A + O = O + A = A. \tag{2.25}$$


Multiplication (by a Scalar)


Here what we mean by a scalar is an ordinary number or function (not another matrix).
The multiplication of matrix A by the scalar quantity $\alpha$ produces $B = \alpha A$, with $b_{ij} = \alpha\, a_{ij}$
for all values of i and j. This operation is commutative, with $\alpha A = A\alpha$.

Note that the definition of multiplication by a scalar causes each element of matrix A to
be multiplied by the scalar factor. This is in striking contrast to the behavior of determinants,
in which $\alpha \det(A)$ is a determinant in which the factor $\alpha$ multiplies only one column or
one row of det(A) and not every element of the entire determinant. If A is an n × n square
matrix, then

$$\det(\alpha A) = \alpha^n \det(A).$$


Matrix Multiplication (Inner Product) 


Matrix multiplication is not an element-by-element operation like addition or multiplication
by a scalar. Instead, it is a more complicated operation in which each element of the
product is formed by combining elements of a row of the first operand with corresponding
elements of a column of the second operand. This mode of combination proves to be
that which is needed for many purposes, and gives matrix algebra its power for solving
important problems. This inner product of matrices A and B is defined as

$$AB = C, \quad \text{with} \quad c_{ij} = \sum_k a_{ik}b_{kj}. \tag{2.26}$$

This definition causes the ij element of C to be formed from the entire ith row of A and
the entire jth column of B. Obviously this definition requires that A have the same number
of columns (n) as B has rows. Note that the product will have the same number of rows
as A and the same number of columns as B. Matrix multiplication is defined only if these
conditions are met. The summation in Eq. (2.26) is over the range of k from 1 to n, and,
more explicitly, corresponds to

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

This combination rule is of a form similar to that of the dot product of the vectors
$(a_{i1}, a_{i2}, \ldots, a_{in})$ and $(b_{1j}, b_{2j}, \ldots, b_{nj})$. Because the roles of the two operands in a matrix
multiplication are different (the first is processed by rows, the second by columns), the
operation is in general not commutative, that is, $AB \neq BA$. In fact, AB may even have a
different shape than BA. If A and B are square, it is useful to define the commutator of
A and B,

$$[A, B] = AB - BA, \tag{2.27}$$

which, as stated above, will in many cases be nonzero.

Matrix multiplication is associative, meaning that (AB)C = A(BC). Proof of this statement
is the topic of Exercise 2.2.26.


Example 2.2.1 MULTIPLICATION, PAULI MATRICES


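These three 2 × 2 matrices, which occurred in early work in quantum mechanics by Pauli,
are encountered frequently in physics contexts, so a familiarity with them is highly advisable.
They are

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2.28}$$

Let's form $\sigma_1\sigma_2$. The 1, 1 element of the product involves the first row of $\sigma_1$ and the first
column of $\sigma_2$, which lead to the indicated computation:

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \;\rightarrow\; 0(0) + 1(i) = i.$$

Continuing, we have

$$\sigma_1\sigma_2 = \begin{pmatrix} 0(0) + 1(i) & 0(-i) + 1(0) \\ 1(0) + 0(i) & 1(-i) + 0(0) \end{pmatrix} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}. \tag{2.29}$$

In a similar fashion, we can compute

$$\sigma_2\sigma_1 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}. \tag{2.30}$$

It is clear that $\sigma_1$ and $\sigma_2$ do not commute. We can construct their commutator:

$$[\sigma_1, \sigma_2] = \sigma_1\sigma_2 - \sigma_2\sigma_1 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} - \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} = 2i\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = 2i\sigma_3. \tag{2.31}$$

Note that not only have we verified that $\sigma_1$ and $\sigma_2$ do not commute, we have even evaluated
and simplified their commutator.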
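
The multiplication rule of Eq. (2.26) and the commutator of Eq. (2.27) translate directly into code. The sketch below is illustrative only: matrices are represented as nested Python lists, the names `matmul` and `commutator` are assumptions, and Python's built-in complex type (with `1j` for $i$) stands in for complex elements.

```python
def matmul(a, b):
    """Matrix product per Eq. (2.26): c_ij = sum over k of a_ik * b_kj."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def commutator(a, b):
    """[a, b] = ab - ba for square matrices of equal size."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

# Pauli matrices of Eq. (2.28).
s1 = [[0, 1], [1, 0]]
s2 = [[0, -1j], [1j, 0]]
s3 = [[1, 0], [0, -1]]

print(matmul(s1, s2))      # compare with Eq. (2.29)
print(commutator(s1, s2))  # compare with 2i * s3, Eq. (2.31)
```

The same `matmul` applies to non-square operands, where the shape check makes explicit the requirement that the first factor have as many columns as the second has rows.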


Example 2.2.2 MULTIPLICATION, ROW AND COLUMN MATRICES


As a second example, consider

$$A = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad B = \begin{pmatrix} 4 & 5 & 6 \end{pmatrix}.$$

Let us form AB and BA:

$$AB = \begin{pmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{pmatrix}, \quad BA = (4 \times 1 + 5 \times 2 + 6 \times 3) = (32).$$

The results speak for themselves. Often when a matrix operation leads to a 1 × 1 matrix,
the parentheses are dropped and the result is treated as an ordinary number or
function.


Unit Matrix


By direct matrix multiplication, it is possible to show that a square matrix with elements
of value unity on its principal diagonal (the elements (i, j) with i = j), and zeros everywhere
else, will leave unchanged any matrix with which it can be multiplied. For example,
the 3 × 3 unit matrix has the form

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix};$$

note that it is not a matrix all of whose elements are unity. Giving such a matrix the name $\mathbf{1}$,

$$\mathbf{1}A = A\mathbf{1} = A. \tag{2.32}$$

In interpreting this equation, we must keep in mind that unit matrices, which are square
and therefore of dimensions n × n, exist for all n; the n values for use in Eq. (2.32) must
be those consistent with the applicable dimension of A. So if A is m × n, the unit matrix in
$\mathbf{1}A$ must be m × m, while that in $A\mathbf{1}$ must be n × n.

The previously introduced null matrices have only zero elements, so it is also obvious
that for all A,

$$OA = AO = O. \tag{2.33}$$


Diagonal Matrices 


If a matrix D has nonzero elements $d_{ij}$ only for i = j, it is said to be diagonal; a 3 × 3
example is

$$D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$


The rules of matrix multiplication cause all diagonal matrices (of the same size) to com- 
mute with each other. However, unless proportional to a unit matrix, diagonal matrices 
will not commute with nondiagonal matrices containing arbitrary elements. 


Matrix Inverse 


It will often be the case that given a square matrix A, there will be a square matrix B such
that $AB = BA = \mathbf{1}$. A matrix B with this property is called the inverse of A and is given
the name $A^{-1}$. If $A^{-1}$ exists, it must be unique. The proof of this statement is simple: If B
and C are both inverses of A, then

$$AB = BA = AC = CA = \mathbf{1}.$$

We now look at

$$CAB = (CA)B = B, \quad \text{but also} \quad CAB = C(AB) = C.$$

This shows that B = C.

Every nonzero real (or complex) number $\alpha$ has a nonzero multiplicative inverse, often
written $1/\alpha$. But the corresponding property does not hold for matrices; there exist nonzero
matrices that do not have inverses. To demonstrate this, consider the following:

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}, \quad \text{so} \quad AB = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

If A has an inverse, we can multiply the equation AB = O on the left by $A^{-1}$, thereby
obtaining

$$AB = O \;\rightarrow\; A^{-1}AB = A^{-1}O \;\rightarrow\; B = O.$$

Since we started with a matrix B that was nonzero, this is an inconsistency, and we are
forced to conclude that $A^{-1}$ does not exist. A matrix without an inverse is said to be singular,
so our conclusion is that A is singular. Note that in our derivation, we had to be careful
to multiply both members of AB = O from the left, because multiplication is noncommutative.
Alternatively, assuming $B^{-1}$ to exist, we could multiply this equation on the right
by $B^{-1}$, obtaining

$$AB = O \;\rightarrow\; ABB^{-1} = OB^{-1} \;\rightarrow\; A = O.$$

This is inconsistent with the nonzero A with which we started; we conclude that B is also
singular. Summarizing, there are nonzero matrices that do not have inverses and are
identified as singular.

The algebraic properties of real and complex numbers (including the existence of 
inverses for all nonzero numbers) define what mathematicians call a field. The properties 
we have identified for matrices are different; they form what is called a ring. 

The numerical inversion of matrices is another topic that has been given much attention,
and computer programs for matrix inversion are widely available. A closed but cumbersome
formula for the inverse of a matrix exists; it expresses the elements of $A^{-1}$ in terms of
the determinants that are the minors of det(A); recall that minors were defined in the
paragraph immediately before Eq. (2.14). That formula, the derivation of which is in several of
the Additional Readings, is

$$(A^{-1})_{ij} = \frac{(-1)^{i+j} M_{ji}}{\det(A)}. \tag{2.34}$$

We describe here a well-known method that is computationally more efficient than
Eq. (2.34), namely the Gauss-Jordan procedure.


Example 2.2.3 GAUSS-JORDAN MATRIX INVERSION


The Gauss-Jordan method is based on the fact that there exist matrices $M_L$ such that the
product $M_L A$ will leave an arbitrary matrix A unchanged, except with

(a) one row multiplied by a constant, or
(b) one row replaced by the original row minus a multiple of another row, or
(c) the interchange of two rows.

The actual matrices $M_L$ that carry out these transformations are the subject of Exercise 2.2.21.

By using these transformations, the rows of a matrix can be altered (by matrix multiplication)
in the same ways we were able to change the elements of determinants, so we can
proceed in ways similar to those employed for the reduction of determinants by Gauss
elimination. If A is nonsingular, the application of a succession of $M_L$, i.e., $M = (\cdots M''_L M'_L M_L)$,
can reduce A to a unit matrix:

$$MA = \mathbf{1}, \quad \text{or} \quad M = A^{-1}.$$


Thus, what we need to do is apply successive transformations to A until these transforma- 
tions have reduced A to 1, keeping track of the product of these transformations. The way 
in which we keep track is to successively apply the transformations to a unit matrix. 

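Here is a concrete example. We want to invert the matrix

$$A = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 4 \end{pmatrix}.$$

Our strategy will be to write, side by side, the matrix A and a unit matrix of the same size,
and to perform the same operations on each until A has been converted to a unit matrix,
which means that the unit matrix will have been changed to $A^{-1}$. We start with

$$\begin{pmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 4 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

We multiply the rows as necessary to set to unity all elements of the first column of the left
matrix:

$$\begin{pmatrix} 1 & \frac{2}{3} & \frac{1}{3} \\ 1 & \frac{3}{2} & \frac{1}{2} \\ 1 & 1 & 4 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Subtracting the first row from the second and third rows, we obtain

$$\begin{pmatrix} 1 & \frac{2}{3} & \frac{1}{3} \\ 0 & \frac{5}{6} & \frac{1}{6} \\ 0 & \frac{1}{3} & \frac{11}{3} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \frac{1}{3} & 0 & 0 \\ -\frac{1}{3} & \frac{1}{2} & 0 \\ -\frac{1}{3} & 0 & 1 \end{pmatrix}.$$

Then we divide the second row (of both matrices) by $\frac{5}{6}$ and subtract $\frac{2}{3}$ times it from the
first row and $\frac{1}{3}$ times it from the third row. The results for both matrices are

$$\begin{pmatrix} 1 & 0 & \frac{1}{5} \\ 0 & 1 & \frac{1}{5} \\ 0 & 0 & \frac{18}{5} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \frac{3}{5} & -\frac{2}{5} & 0 \\ -\frac{2}{5} & \frac{3}{5} & 0 \\ -\frac{1}{5} & -\frac{1}{5} & 1 \end{pmatrix}.$$

We divide the third row (of both matrices) by $\frac{18}{5}$. Then as the last step, $\frac{1}{5}$ times the third
row is subtracted from each of the first two rows (of both matrices). Our final pair is

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad A^{-1} = \begin{pmatrix} \frac{11}{18} & -\frac{7}{18} & -\frac{1}{18} \\ -\frac{7}{18} & \frac{11}{18} & -\frac{1}{18} \\ -\frac{1}{18} & -\frac{1}{18} & \frac{5}{18} \end{pmatrix}.$$

We can check our work by multiplying the original A by the calculated $A^{-1}$ to see if we
really do get the unit matrix $\mathbf{1}$.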
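
The side-by-side bookkeeping of this example is easy to mechanize. The following Python sketch (the name `gauss_jordan_inverse` is an assumption, and exact fractions are used so that results can be compared with the hand calculation) applies the three row operations (a), (b), and (c) to the augmented array formed from A and the unit matrix. It processes one column completely before moving to the next, so its intermediate arrays differ slightly from those shown above, but the final inverse is the same.

```python
from fractions import Fraction

def gauss_jordan_inverse(a):
    """Invert a square matrix by row operations applied to [A | 1]."""
    n = len(a)
    # Augment A with the unit matrix of the same size.
    aug = [[Fraction(v) for v in row] +
           [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for k in range(n):
        # (c) interchange rows if needed to obtain a nonzero pivot
        p = next((r for r in range(k, n) if aug[r][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        aug[k], aug[p] = aug[p], aug[k]
        # (a) scale the pivot row so that the pivot becomes unity
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        # (b) subtract multiples of the pivot row from every other row
        for r in range(n):
            if r != k:
                f = aug[r][k]
                aug[r] = [vr - f * vk for vr, vk in zip(aug[r], aug[k])]
    # The right-hand block now holds the inverse.
    return [row[n:] for row in aug]

inv = gauss_jordan_inverse([[3, 2, 1], [2, 3, 1], [1, 1, 4]])
for row in inv:
    print([str(v) for v in row])
```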


Derivatives of Determinants 


The formula giving the inverse of a matrix in terms of its minors enables us to write a
compact formula for the derivative of a determinant det(A) where the matrix A has elements
that depend on some variable $x$. To carry out the differentiation with respect to the
$x$ dependence of its element $a_{ij}$, we write det(A) as its expansion in minors $M_{ij}$ about the
elements of row $i$, as in Eq. (2.14), so, appealing also to Eq. (2.34), we have

$$\frac{\partial \det(A)}{\partial a_{ij}} = (-1)^{i+j} M_{ij} = (A^{-1})_{ji} \det(A).$$

Applying now the chain rule to allow for the $x$ dependence of all elements of A, we get

$$\frac{d \det(A)}{dx} = \det(A) \sum_{ij} (A^{-1})_{ji} \frac{da_{ij}}{dx}. \tag{2.35}$$
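
Equation (2.35) can be checked on a small case. In the sketch below, the 2 × 2 matrix with rows $(x, x^2)$ and $(1, 3x)$ is an arbitrary illustration chosen so that $\det(A) = 2x^2$ and hence $d\det(A)/dx = 4x$; the helper names `inverse2` and `ddet_dx` are assumptions.

```python
from fractions import Fraction

def inverse2(m):
    """Inverse and determinant of a 2x2 matrix from the closed formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    inv = [[Fraction(d, det), Fraction(-b, det)],
           [Fraction(-c, det), Fraction(a, det)]]
    return inv, det

def ddet_dx(A, dA):
    """Right-hand side of Eq. (2.35): det(A) * sum_ij (A^-1)_ji * da_ij/dx."""
    inv, det = inverse2(A)
    return det * sum(inv[j][i] * dA[i][j] for i in range(2) for j in range(2))

x = 2
A = [[x, x**2], [1, 3 * x]]     # det(A) = 2 x^2
dA = [[1, 2 * x], [0, 3]]       # element-by-element derivative of A
print(ddet_dx(A, dA), 4 * x)    # the two values should agree
```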


Systems of Linear Equations 


Using the matrix inverse, we can write down formal solutions to linear equation systems.
To start, we note that if A is an n × n square matrix, and $\mathbf{x}$ and $\mathbf{h}$ are n × 1 column vectors,
the matrix equation $A\mathbf{x} = \mathbf{h}$ is, by the rule for matrix multiplication,

$$A\mathbf{x} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n \end{pmatrix} = \mathbf{h} = \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{pmatrix},$$

which is entirely equivalent to a system of n linear equations with the elements of A as
coefficients. If A is nonsingular, we can multiply $A\mathbf{x} = \mathbf{h}$ on the left by $A^{-1}$, obtaining the
result $\mathbf{x} = A^{-1}\mathbf{h}$.

This result tells us two things: (1) that if we can evaluate $A^{-1}$, we can compute the
solution $\mathbf{x}$; and (2) that the existence of $A^{-1}$ means that this equation system has a
unique solution. In our study of determinants we found that a linear equation system had a
unique solution if and only if the determinant of its coefficients was nonzero. We therefore
see that the condition that $A^{-1}$ exists, i.e., that A is nonsingular, is the same as the condition
that the determinant of A, which we write det(A), be nonzero. This result is important
enough to be emphasized:


A square matrix A is singular if and only if det(A) = 0. (2.36)


Determinant Product Theorem 


The connection between matrices and their determinants can be made deeper by establishing
a product theorem which states that the determinant of a product of two n × n
matrices A and B is equal to the product of the determinants of the individual matrices:

$$\det(AB) = \det(A)\det(B). \tag{2.37}$$

As an initial step toward proving this theorem, let us look at det(AB) with the elements of
the matrix product written out. Showing the first two columns explicitly, we have

$$\det(AB) = \begin{vmatrix} a_{11}b_{11} + a_{12}b_{21} + \cdots + a_{1n}b_{n1} & a_{11}b_{12} + a_{12}b_{22} + \cdots + a_{1n}b_{n2} & \cdots \\ a_{21}b_{11} + a_{22}b_{21} + \cdots + a_{2n}b_{n1} & a_{21}b_{12} + a_{22}b_{22} + \cdots + a_{2n}b_{n2} & \cdots \\ \vdots & \vdots & \\ a_{n1}b_{11} + a_{n2}b_{21} + \cdots + a_{nn}b_{n1} & a_{n1}b_{12} + a_{n2}b_{22} + \cdots + a_{nn}b_{n2} & \cdots \end{vmatrix}.$$

Introducing the notation

$$\mathbf{a}_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix}, \quad \text{this becomes} \quad \det(AB) = \left| \sum_{j_1} \mathbf{a}_{j_1} b_{j_1 1} \quad \sum_{j_2} \mathbf{a}_{j_2} b_{j_2 2} \quad \cdots \right|,$$

where the summations over $j_1, j_2, \ldots, j_n$ run independently from 1 through n. Now, calling
upon Eqs. (2.12) and (2.13), we can move the summations and the factors $b$ outside the
determinant, reaching

$$\det(AB) = \sum_{j_1} \sum_{j_2} \cdots \sum_{j_n} b_{j_1 1} b_{j_2 2} \cdots b_{j_n n} \det(\mathbf{a}_{j_1} \mathbf{a}_{j_2} \cdots \mathbf{a}_{j_n}). \tag{2.38}$$

The determinant on the right-hand side of Eq. (2.38) will vanish if any of the indices $j_\mu$
are equal; if all are unequal, that determinant will be $\pm\det(A)$, with the sign corresponding
to the parity of the column permutation needed to put the $\mathbf{a}_j$ in numerical order. Both
of these conditions are met by writing $\det(\mathbf{a}_{j_1} \mathbf{a}_{j_2} \cdots \mathbf{a}_{j_n}) = \varepsilon_{j_1 \ldots j_n} \det(A)$, where $\varepsilon$ is the
Levi-Civita symbol defined in Eq. (2.8). The above manipulations bring us to

$$\det(AB) = \det(A) \sum_{j_1 \ldots j_n} \varepsilon_{j_1 \ldots j_n} b_{j_1 1} b_{j_2 2} \cdots b_{j_n n} = \det(A)\det(B),$$

where the final step was to invoke the definition of the determinant, Eq. (2.10). This result
proves the determinant product theorem.

From the determinant product theorem, we can gain additional insight regarding singular
matrices. Noting first that a special case of the theorem is that

$$\det(AA^{-1}) = \det(\mathbf{1}) = 1 = \det(A)\det(A^{-1}),$$

we see that

$$\det(A^{-1}) = \frac{1}{\det(A)}. \tag{2.39}$$

It is now obvious that if det(A) = 0, then $\det(A^{-1})$ cannot exist, meaning that $A^{-1}$ cannot
exist either. This is a direct proof that a matrix with a vanishing determinant is singular,
consistent with Eq. (2.36).


Rank of a Matrix 


The concept of matrix singularity can be refined by introducing the notion of the rank 
of a matrix. If the elements of a matrix are viewed as the coefficients of a set of linear 
forms, as in Eq. (2.1) and its generalization to n variables, a square matrix is assigned a 
rank equal to the number of linearly independent forms that its elements describe. Thus, a
nonsingular n × n matrix will have rank n, while an n × n singular matrix will have a rank
r less than n. The rank provides a measure of the extent of the singularity; if r = n − 1,
the matrix describes one linear form that is dependent on the others; r = n — 2 describes 
a situation in which there are two forms that are linearly dependent on the others, etc. We 
will in Chapter 6 take up methods for systematically determining the rank of a matrix. 


Transpose, Adjoint, Trace 


In addition to the operations we have already discussed, there are further operations that 
depend on the fact that matrices are arrays. One such operation is transposition. The 
transpose of a matrix is the matrix that results from interchanging its row and column 
indices. This operation corresponds to subjecting the array to reflection about its principal 
diagonal. If a matrix is not square, its transpose will not even have the same shape as the 
original matrix. The transpose of A, denoted $\tilde{A}$ or sometimes $A^T$, thus has elements

$$(\tilde{A})_{ij} = a_{ji}. \tag{2.40}$$

Note that transposition will convert a column vector into a row vector, so

$$\text{if} \quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \text{then} \quad \tilde{\mathbf{x}} = (x_1 \; x_2 \; \ldots \; x_n).$$

A matrix that is unchanged by transposition (i.e., $\tilde{A} = A$) is called symmetric.

For matrices that may have complex elements, the complex conjugate of a matrix is 
defined as the matrix resulting if all elements of the original matrix are complex conju- 
gated. Note that this does not change the shape or move any elements to new positions. 
The notation for the complex conjugate of A is A*. 

The adjoint of a matrix A, denoted $A^\dagger$, is obtained by both complex conjugating and
transposing it (the same result is obtained if these operations are performed in either order).
Thus,

$$(A^\dagger)_{ij} = a_{ji}^*. \tag{2.41}$$


The trace, a quantity defined for square matrices, is the sum of the elements on the
principal diagonal. Thus, for an n × n matrix A,

$$\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii}. \tag{2.42}$$

From the rule for matrix addition, it is obvious that

$$\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B). \tag{2.43}$$

Another property of the trace is that its value for a product of two matrices A and B is
independent of the order of multiplication:

$$\operatorname{trace}(AB) = \sum_i (AB)_{ii} = \sum_i \sum_j a_{ij}b_{ji} = \sum_j \sum_i b_{ji}a_{ij} = \sum_j (BA)_{jj} = \operatorname{trace}(BA). \tag{2.44}$$

This holds even if $AB \neq BA$. Equation (2.44) means that the trace of any commutator
[A, B] = AB − BA is zero. Considering now the trace of the matrix product ABC, if we
group the factors as A(BC), we easily see that

$$\operatorname{trace}(ABC) = \operatorname{trace}(BCA).$$

Repeating this process, we also find trace(ABC) = trace(CAB). Note, however, that we
cannot equate any of these quantities to trace(CBA) or to the trace of any other noncyclic
permutation of these matrices.
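
A short numerical check makes the distinction between cyclic and noncyclic permutations concrete. The sketch below uses three arbitrarily chosen 2 × 2 integer matrices (an illustration, not taken from the text); the helper names `matmul` and `trace` are assumptions.

```python
def matmul(a, b):
    """Matrix product: c_ij = sum over k of a_ik * b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(m):
    """Sum of the elements on the principal diagonal."""
    return sum(m[i][i] for i in range(len(m)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [2, 0]]
C = [[1, 0], [1, 1]]

# Eq. (2.44): the trace of a product of two matrices ignores their order.
print(trace(matmul(A, B)), trace(matmul(B, A)))

# Cyclic permutations of a triple product share one trace ...
cyclic = [trace(matmul(matmul(P, Q), R))
          for P, Q, R in ((A, B, C), (B, C, A), (C, A, B))]
print(cyclic)

# ... but a noncyclic permutation generally gives a different value.
print(trace(matmul(matmul(C, B), A)))
```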







Operations on Matrix Products 


We have already seen that the determinant and the trace satisfy the relations 
det(AB) = det(A) det(B) = det(BA), _trace(AB) = trace(BA), 


whether or not A and B commute. We also found that trace(A + B) = trace(A) + trace(B) 
and can easily show that trace(wA) = a trace(A), establishing that the trace is a linear 
operator (as defined in Chapter 5). Since similar relations do not exist for the determinant, 
it is not a linear operator. 

We consider now the effect of other operations on matrix products. The transpose of a 
product, $(AB)^T$, can be shown to satisfy 

$$(AB)^T = B^T A^T, \tag{2.45}$$

showing that a product is transposed by taking, in reverse order, the transposes of its 
factors. Note that if the respective dimensions of A and B are such as to make AB defined, it 
will also be true that $B^T A^T$ is defined. 

Since complex conjugation of a product simply amounts to conjugation of its individual 
factors, the formula for the adjoint of a matrix product follows a rule similar to Eq. (2.45): 


$$(AB)^\dagger = B^\dagger A^\dagger. \tag{2.46}$$

Finally, consider $(AB)^{-1}$. In order for AB to be nonsingular, neither A nor B can be 
singular (to see this, consider their determinants). Assuming this nonsingularity, we have 

$$(AB)^{-1} = B^{-1} A^{-1}. \tag{2.47}$$

The validity of Eq. (2.47) can be demonstrated by substituting it into the obvious equation 
$(AB)(AB)^{-1} = 1$. 
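
The order reversal in Eqs. (2.45) and (2.47) is easy to confirm on a small case. The sketch below uses exact rational arithmetic from Python's standard library; the 2 × 2 matrices and the helper names (`matmul`, `transpose`, `inverse2`) are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Interchange rows and columns."""
    return [list(row) for row in zip(*X)]


def inverse2(X):
    """Inverse of a nonsingular 2 x 2 integer matrix, as exact fractions."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


# Arbitrary nonsingular matrices (assumed for this sketch).
A = [[1, 2], [3, 4]]
B = [[2, 1], [1, 1]]
AB = matmul(A, B)

# Eq. (2.45): transpose of a product.
lhs_T = transpose(AB)
rhs_T = matmul(transpose(B), transpose(A))

# Eq. (2.47): inverse of a product, plus the wrong (unreversed) ordering.
lhs_inv = inverse2(AB)
rhs_inv = matmul(inverse2(B), inverse2(A))
wrong_inv = matmul(inverse2(A), inverse2(B))

print(lhs_T == rhs_T, lhs_inv == rhs_inv, lhs_inv == wrong_inv)
```

Because A and B here do not commute, the unreversed product of inverses fails to reproduce the inverse of AB.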


Matrix Representation of Vectors 


The reader may have already noted that the operations of addition and multiplication by a 
scalar are defined in identical ways for vectors (Section 1.7) and the matrices we are calling 
column vectors. We can also use the matrix formalism to generate scalar products, but in 
order to do so we must convert one of the column vectors into a row vector. The operation 
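of transposition provides a way to do this. Thus, letting a and b stand for vectors in $\mathbb{R}^3$, 

$$\mathbf{a}\cdot\mathbf{b} \longrightarrow (a_1 \;\; a_2 \;\; a_3)\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = a_1 b_1 + a_2 b_2 + a_3 b_3.$$

If in a matrix context we regard a and b as column vectors, the above equation assumes 
the form 

$$\mathbf{a}\cdot\mathbf{b} \longrightarrow \mathbf{a}^T \mathbf{b}. \tag{2.48}$$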


This notation does not really lead to significant ambiguity if we note that when dealing with 
matrices, we are using lower-case boldface symbols to denote column vectors. Note also 
that because $\mathbf{a}^T\mathbf{b}$ is a 1 × 1 matrix, it is synonymous with its transpose, which is $\mathbf{b}^T\mathbf{a}$. The 
matrix notation preserves the symmetry of the dot product. As in Section 1.7, the square of 
the magnitude of the vector corresponding to a will be $\mathbf{a}^T\mathbf{a}$. 

If the elements of our column vectors a and b are real, then an alternate way of writing 
$\mathbf{a}^T\mathbf{b}$ is $\mathbf{a}^\dagger\mathbf{b}$. But these quantities are not equal if the vectors have complex elements, as will 
be the case in some situations in which the column vectors do not represent displacements 
in physical space. In that situation, the dagger notation is the more useful because then $\mathbf{a}^\dagger\mathbf{a}$ 
will be real and can play the role of a magnitude squared. 
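
A two-component example makes the distinction concrete. The sketch below (the vector is an arbitrary choice of ours) evaluates both the transpose form and the dagger form for a vector with complex elements.

```python
# An arbitrary complex column vector, stored as a flat list (assumed example).
a = [1 + 2j, 3 - 1j]

# Transpose form a^T a: no complex conjugation, so the result may be complex.
aT_a = sum(x * x for x in a)

# Dagger form a^dagger a: conjugate the left factor, giving sum of |a_i|^2.
adag_a = sum(x.conjugate() * x for x in a)

print(aT_a)     # complex in general
print(adag_a)   # real and nonnegative
```

Only the dagger form yields a real, nonnegative number that can serve as a magnitude squared.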


Orthogonal Matrices 


A real matrix (one whose elements are real) is termed orthogonal if its transpose is equal 
to its inverse. Thus, if S is orthogonal, we may write 


$$S^{-1} = S^T, \quad\text{or}\quad S S^T = 1 \quad (S \text{ orthogonal}). \tag{2.49}$$

Since, for S orthogonal, $\det(SS^T) = \det(S)\det(S^T) = [\det(S)]^2 = 1$, we see that 

$$\det(S) = \pm 1 \quad (S \text{ orthogonal}). \tag{2.50}$$

It is easy to prove that if S and S′ are each orthogonal, then so also are SS′ and S′S. 


Unitary Matrices 


Another important class of matrices consists of matrices U with the property that 
$U^\dagger = U^{-1}$, i.e., matrices for which the adjoint is also the inverse. Such matrices are identified as 
unitary. One way of expressing this relationship is 

$$U U^\dagger = U^\dagger U = 1 \quad (U \text{ unitary}). \tag{2.51}$$


If all the elements of a unitary matrix are real, the matrix is also orthogonal. 
Since for any matrix $\det(A^T) = \det(A)$, and therefore $\det(A^\dagger) = \det(A)^*$, application of 
the determinant product theorem to a unitary matrix U leads to 

$$\det(U)\det(U^\dagger) = |\det(U)|^2 = 1, \tag{2.52}$$

showing that det(U) is a possibly complex number of magnitude unity. Since such numbers 
can be written in the form exp(iθ), with θ real, the determinants of U and $U^\dagger$ will, for 
some θ, satisfy 

$$\det(U) = e^{i\theta}, \qquad \det(U^\dagger) = e^{-i\theta}.$$


Part of the significance of the term unitary is associated with the fact that the determinant 
has unit magnitude. A special case of this relationship is our earlier observation that if U is 
real, and therefore also an orthogonal matrix, its determinant must be either +1 or —1. 

Finally, we observe that if U and V are both unitary, then UV and VU will be unitary as 
well. This is a generalization of our earlier result that the matrix product of two orthogonal 
matrices is also orthogonal. 
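
These statements can be illustrated with two small unitary matrices. The sketch below is written in plain Python; the matrices U and V and the helper names are assumptions made for the illustration only.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def dagger(X):
    """Adjoint: complex conjugate of the transpose."""
    return [[X[j][i].conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def is_identity(X, tol=1e-12):
    """True if X equals the unit matrix to within tol."""
    return all(abs(X[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(X)) for j in range(len(X)))


# Two simple unitary matrices (assumed examples).
U = [[0, 1], [1j, 0]]
V = [[1j, 0], [0, 1]]
UV = matmul(U, V)

print(is_identity(matmul(U, dagger(U))))    # U U^dagger = 1
print(det2(U), abs(det2(U)))                # complex determinant, unit magnitude
print(is_identity(matmul(UV, dagger(UV))))  # the product UV is again unitary
```

Here det(U) is a complex number of unit magnitude, in keeping with Eq. (2.52).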
Hermitian Matrices 


There are additional classes of matrices with useful characteristics. A matrix is identified as 
Hermitian, or, synonymously, self-adjoint, if it is equal to its adjoint. To be self-adjoint, 
a matrix H must be square, and in addition, its elements must satisfy 


$$(H^\dagger)_{ij} = (H)_{ij} \;\longrightarrow\; h_{ji}^* = h_{ij} \quad (H \text{ is Hermitian}). \tag{2.53}$$


This condition means that the array of elements in a self-adjoint matrix exhibits a reflection 
symmetry about the principal diagonal: elements whose positions are connected by reflec- 
tion must be complex conjugates. As a corollary to this observation, or by direct reference 
to Eq. (2.53), we see that the diagonal elements of a self-adjoint matrix must be real. 

If all the elements of a self-adjoint matrix are real, then the condition of self-adjointness 
will cause the matrix also to be symmetric, so all real, symmetric matrices are self-adjoint 
(Hermitian). 

Note that if two matrices A and B are Hermitian, it is not necessarily true that AB or BA 
is Hermitian; however, AB + BA, if nonzero, will be Hermitian, and AB − BA, if nonzero, 
will be anti-Hermitian, meaning that $(AB - BA)^\dagger = -(AB - BA)$. 


Extraction of a Row or Column 


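It is useful to define column vectors $\hat{\mathbf{e}}_i$ which are zero except for the (i, 1) element, which 
is unity; examples are 

$$\hat{\mathbf{e}}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \hat{\mathbf{e}}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \text{etc.} \tag{2.54}$$

One use of these vectors is to extract a single column from a matrix. For example, if A is a 
3 × 3 matrix, then 

$$A\hat{\mathbf{e}}_2 = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix}.$$

The row vector $\hat{\mathbf{e}}_i^T$ can be used in a similar fashion to extract a row from an arbitrary 
matrix, as in 

$$\hat{\mathbf{e}}_i^T A = (a_{i1} \;\; a_{i2} \;\; a_{i3}).$$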


These unit vectors will also have many uses in other contexts. 
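
As a small illustration of column and row extraction, the sketch below builds the unit column vectors explicitly and multiplies them into a 3 × 3 matrix. The matrix entries are chosen so that each element displays its own indices; the helper names are our own.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Interchange rows and columns."""
    return [list(row) for row in zip(*X)]


def unit_column(i, n):
    """Column vector e_i (1-based index i) as an n x 1 matrix."""
    return [[1 if row == i else 0] for row in range(1, n + 1)]


# Element a_ij is written as the two-digit number "ij" (assumed example).
A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]

col2 = matmul(A, unit_column(2, 3))              # A e_2: second column of A
row3 = matmul(transpose(unit_column(3, 3)), A)   # e_3^T A: third row of A

print(col2)
print(row3)
```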


Direct Product 


A second procedure for multiplying matrices, known as the direct, tensor, or Kronecker 
product, combines an m × n matrix A and an m′ × n′ matrix B to make the direct product 
matrix C = A ⊗ B, which is of dimension mm′ × nn′ and has elements 

$$C_{\alpha\beta} = A_{ij} B_{kl}, \tag{2.55}$$

with α = m′(i − 1) + k, β = n′(j − 1) + l. The direct product matrix uses the indices of 
the first factor as major and those of the second factor as minor; it is therefore a 
noncommutative process. It is, however, associative. 


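Example 2.2.4 DIRECT PRODUCTS 

We give some specific examples. If A and B are both 2 × 2 matrices, we may write, first in 
a somewhat symbolic and then in a completely expanded form, 

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{pmatrix}.$$

Another example is the direct product of two two-element column vectors, x and y. 
Again writing first in symbolic, and then expanded form, 

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \otimes \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 \mathbf{y} \\ x_2 \mathbf{y} \end{pmatrix} = \begin{pmatrix} x_1 y_1 \\ x_1 y_2 \\ x_2 y_1 \\ x_2 y_2 \end{pmatrix}.$$

A third example is the quantity AB from Example 2.2.2. It is an instance of the special 
case (column vector times row vector) in which the direct and inner products coincide: 
AB = A ⊗ B. ■ 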
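
The index rule of Eq. (2.55) translates directly into a short routine. The following is a minimal sketch in plain Python using zero-based indices (so that α = m′i + k and β = n′j + l); the function name `kron` and the sample matrices are our own choices.

```python
def kron(A, B):
    """Direct (Kronecker) product of A (m x n) and B (m' x n'), per Eq. (2.55)."""
    m, n = len(A), len(A[0])
    mp, npr = len(B), len(B[0])
    C = [[0] * (n * npr) for _ in range(m * mp)]
    for i in range(m):
        for j in range(n):
            for k in range(mp):
                for l in range(npr):
                    # Indices of A are major, those of B are minor.
                    C[mp * i + k][npr * j + l] = A[i][j] * B[k][l]
    return C


# Arbitrary sample matrices and column vectors (assumed for this sketch).
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [[1], [2]]
y = [[3], [4]]

AB = kron(A, B)
BA = kron(B, A)
xy = kron(x, y)

print(AB)
print(BA)   # differs from AB: the direct product is noncommutative
print(xy)
```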


If C and C′ are direct products of the respective forms 

$$C = A \otimes B \quad\text{and}\quad C' = A' \otimes B', \tag{2.56}$$

and these matrices are of dimensions such that the matrix inner products AA′ and BB′ are 
defined, then 

$$CC' = (AA') \otimes (BB'). \tag{2.57}$$

Moreover, if matrices A and B are of the same dimensions, then 

$$C \otimes (A + B) = C \otimes A + C \otimes B \quad\text{and}\quad (A + B) \otimes C = A \otimes C + B \otimes C. \tag{2.58}$$
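
Equations (2.57) and (2.58) can be spot-checked on small integer matrices. The sketch below is self-contained plain Python; all matrices and helper names are illustrative assumptions.

```python
def matmul(X, Y):
    """Ordinary (inner) matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y):
    """Element-by-element sum of two matrices of the same dimensions."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def kron(A, B):
    """Direct product with the index convention of Eq. (2.55)."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


# Arbitrary 2 x 2 matrices (assumed for this sketch).
A, B = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
Ap, Bp = [[1, 0], [1, 1]], [[2, 0], [0, 3]]

# Eq. (2.57): (A x B)(A' x B') = (AA') x (BB').
lhs_257 = matmul(kron(A, B), kron(Ap, Bp))
rhs_257 = kron(matmul(A, Ap), matmul(B, Bp))

# Eq. (2.58): the direct product distributes over matrix addition.
lhs_258 = kron(Ap, add(A, B))
rhs_258 = add(kron(Ap, A), kron(Ap, B))

print(lhs_257 == rhs_257, lhs_258 == rhs_258)
```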


Example 2.2.5 DIRAC MATRICES 


In the original, nonrelativistic formulation of quantum mechanics, agreement between 
theory and experiment for electronic systems required the introduction of the concept of 
electron spin (intrinsic angular momentum), both to provide a doubling in the number of 
available states and to explain phenomena involving the electron’s magnetic moment. The 
concept was introduced in a relatively ad hoc fashion; the electron needed to be given 
spin quantum number 1/2, and that could be done by assigning it a two-component wave 
function, with the spin-related properties described using the Pauli matrices, which were 
introduced in Example 2.2.1: 

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Of relevance here is the fact that these matrices anticommute and have squares that are unit 
matrices: 

$$\sigma_i^2 = 1_2, \quad\text{and}\quad \sigma_i\sigma_j + \sigma_j\sigma_i = 0, \quad i \neq j. \tag{2.59}$$


In 1927, P. A. M. Dirac developed a relativistic formulation of quantum mechanics 
applicable to spin-1/2 particles. To do this it was necessary to place the spatial and time 
variables on an equal footing, and Dirac proceeded by converting the relativistic expression 
for the kinetic energy to an expression that was first order in both the energy and the 
momentum (parallel quantities in relativistic mechanics). He started from the relativistic 
equation for the energy of a free particle, 


$$E^2 = (p_1^2 + p_2^2 + p_3^2)c^2 + m^2c^4 = p^2c^2 + m^2c^4, \tag{2.60}$$

where $p_i$ are the components of the momentum in the coordinate directions, m is the 
particle mass, and c is the velocity of light. In the passage to quantum mechanics, the 
quantities $p_i$ are to be replaced by the differential operators $-i\hbar\,\partial/\partial x_i$, and the entire 
equation is applied to a wave function. 

It was desirable to have a formulation that would yield a two-component wave function 
in the nonrelativistic limit and therefore might be expected to contain the $\sigma_i$. Dirac made 
the observation that a key to the solution of his problem was to exploit the fact that the 
Pauli matrices, taken together as a vector 

$$\boldsymbol{\sigma} = \sigma_1\hat{\mathbf{e}}_1 + \sigma_2\hat{\mathbf{e}}_2 + \sigma_3\hat{\mathbf{e}}_3, \tag{2.61}$$

could be combined with the vector p to yield the identity 

$$(\boldsymbol{\sigma}\cdot\mathbf{p})^2 = \mathbf{p}^2\, 1_2, \tag{2.62}$$


where $1_2$ denotes a 2 × 2 unit matrix. The importance of Eq. (2.62) is that, at the price of 
going to 2 × 2 matrices, we can linearize the quadratic occurrences of E and p in Eq. (2.60) 
as follows. We first write 

$$E^2 1_2 - c^2(\boldsymbol{\sigma}\cdot\mathbf{p})^2 = m^2c^4\, 1_2. \tag{2.63}$$

We then factor the left-hand side of Eq. (2.63) and apply both sides of the resulting equation 
(which is a 2 × 2 matrix equation) to a two-component wave function that we will call $\psi_1$: 

$$(E 1_2 + c\,\boldsymbol{\sigma}\cdot\mathbf{p})(E 1_2 - c\,\boldsymbol{\sigma}\cdot\mathbf{p})\psi_1 = m^2c^4\psi_1. \tag{2.64}$$

The meaning of this equation becomes clearer if we make the additional definition 

$$(E 1_2 - c\,\boldsymbol{\sigma}\cdot\mathbf{p})\psi_1 = mc^2\psi_2. \tag{2.65}$$
Substituting Eq. (2.65) into Eq. (2.64), we can then write the modified Eq. (2.64) and the 
(unchanged) Eq. (2.65) as the equation set 

$$\begin{aligned} (E 1_2 + c\,\boldsymbol{\sigma}\cdot\mathbf{p})\psi_2 &= mc^2\psi_1, \\ (E 1_2 - c\,\boldsymbol{\sigma}\cdot\mathbf{p})\psi_1 &= mc^2\psi_2; \end{aligned} \tag{2.66}$$

both these equations will need to be satisfied simultaneously. 

To bring Eqs. (2.66) to the form actually used by Dirac, we now make the substitution 
$\psi_1 = \psi_A + \psi_B$, $\psi_2 = \psi_A - \psi_B$, and then add and subtract the two equations from each 
other, reaching a set of coupled equations in $\psi_A$ and $\psi_B$: 

$$\begin{aligned} E\psi_A - c\,\boldsymbol{\sigma}\cdot\mathbf{p}\,\psi_B &= mc^2\psi_A, \\ c\,\boldsymbol{\sigma}\cdot\mathbf{p}\,\psi_A - E\psi_B &= mc^2\psi_B. \end{aligned}$$


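In anticipation of what we will do next, we write these equations in the matrix form 

$$\left[\begin{pmatrix} E 1_2 & 0 \\ 0 & -E 1_2 \end{pmatrix} - \begin{pmatrix} 0 & c\,\boldsymbol{\sigma}\cdot\mathbf{p} \\ -c\,\boldsymbol{\sigma}\cdot\mathbf{p} & 0 \end{pmatrix}\right]\begin{pmatrix}\psi_A \\ \psi_B\end{pmatrix} = mc^2\begin{pmatrix}\psi_A \\ \psi_B\end{pmatrix}. \tag{2.67}$$

We can now use the direct product notation to condense Eq. (2.67) into the simpler form 

$$\left[(\sigma_3 \otimes 1_2)E - \gamma \otimes c(\boldsymbol{\sigma}\cdot\mathbf{p})\right]\Psi = mc^2\Psi, \tag{2.68}$$

where Ψ is the four-component wave function built from the two-component wave 
functions: 

$$\Psi = \begin{pmatrix}\psi_A \\ \psi_B\end{pmatrix},$$

and the terms on the left-hand side have the indicated structure because 

$$\sigma_3 = \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix} \quad\text{and we define}\quad \gamma = \begin{pmatrix}0 & 1 \\ -1 & 0\end{pmatrix}. \tag{2.69}$$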


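It has become customary to identify the matrices in Eq. (2.68) as $\gamma^\mu$ and to refer to them 
as Dirac matrices, with 

$$\gamma^0 = \sigma_3 \otimes 1_2 = \begin{pmatrix}1_2 & 0 \\ 0 & -1_2\end{pmatrix} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}. \tag{2.70}$$

The matrices resulting from the individual components of σ in Eq. (2.68) are (for 
i = 1, 2, 3) 

$$\gamma^i = \gamma \otimes \sigma_i = \begin{pmatrix}0 & \sigma_i \\ -\sigma_i & 0\end{pmatrix}. \tag{2.71}$$

Expanding Eq. (2.71), we have 

$$\gamma^1 = \begin{pmatrix}0&0&0&1\\0&0&1&0\\0&-1&0&0\\-1&0&0&0\end{pmatrix}, \qquad \gamma^2 = \begin{pmatrix}0&0&0&-i\\0&0&i&0\\0&i&0&0\\-i&0&0&0\end{pmatrix}, \qquad \gamma^3 = \begin{pmatrix}0&0&1&0\\0&0&0&-1\\-1&0&0&0\\0&1&0&0\end{pmatrix}. \tag{2.72}$$
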
Now that the $\gamma^\mu$ have been defined, we can rewrite Eq. (2.68), expanding σ · p into 
components: 

$$\left[\gamma^0 E - c(\gamma^1 p_1 + \gamma^2 p_2 + \gamma^3 p_3)\right]\Psi = mc^2\Psi.$$

To put this matrix equation into the specific form known as the Dirac equation we multiply 
both sides of it (on the left) by $\gamma^0$. Noting that $(\gamma^0)^2 = 1$ and giving $\gamma^0\gamma^i$ the new name 
$\alpha_i$, we reach 

$$\left[\gamma^0 mc^2 + c(\alpha_1 p_1 + \alpha_2 p_2 + \alpha_3 p_3)\right]\Psi = E\Psi. \tag{2.73}$$

Equation (2.73) is in the notation used by Dirac with the exception that he used β as the 
name for the matrix here called $\gamma^0$. 

The Dirac gamma matrices have an algebra that is a generalization of that exhibited 
by the Pauli matrices, where we found that $\sigma_i^2 = 1$ and that if i ≠ j, then $\sigma_i$ and 
$\sigma_j$ anticommute. Either by further analysis or by direct evaluation, it is found that, for 
μ = 0, 1, 2, 3 and i = 1, 2, 3, 

$$(\gamma^0)^2 = 1, \qquad (\gamma^i)^2 = -1, \tag{2.74}$$

$$\gamma^\mu\gamma^i + \gamma^i\gamma^\mu = 0, \qquad \mu \neq i. \tag{2.75}$$
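
The "direct evaluation" route can be carried out mechanically. The sketch below constructs the four gamma matrices as direct products, following the definitions γ⁰ = σ₃ ⊗ 1₂ and γⁱ = γ ⊗ σᵢ given above, and then tests Eqs. (2.74) and (2.75). It is plain Python with helper names of our own choosing, offered as an illustration rather than as a prescribed method.

```python
def matmul(X, Y):
    """Ordinary matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y):
    """Element-by-element sum."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def kron(A, B):
    """Direct product with the index convention of Eq. (2.55)."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


def scaled_identity(c, n=4):
    """The n x n matrix c * 1."""
    return [[c if r == s else 0 for s in range(n)] for r in range(n)]


# Pauli matrices, the 2 x 2 unit matrix, and the matrix gamma of Eq. (2.69).
s1 = [[0, 1], [1, 0]]
s2 = [[0, -1j], [1j, 0]]
s3 = [[1, 0], [0, -1]]
one2 = [[1, 0], [0, 1]]
g = [[0, 1], [-1, 0]]

# gamma[0] = sigma_3 x 1_2; gamma[i] = g x sigma_i for i = 1, 2, 3.
gamma = [kron(s3, one2)] + [kron(g, s) for s in (s1, s2, s3)]

squares = [matmul(m, m) for m in gamma]
anticommutators = {
    (mu, nu): add(matmul(gamma[mu], gamma[nu]), matmul(gamma[nu], gamma[mu]))
    for mu in range(4) for nu in range(mu + 1, 4)
}

print(squares[0] == scaled_identity(1))
print(all(sq == scaled_identity(-1) for sq in squares[1:]))
print(all(ac == scaled_identity(0) for ac in anticommutators.values()))
```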


In the nonrelativistic limit, the four-component Dirac equation for an electron reduces 
to a two-component equation in which each component satisfies the Schrödinger equation, 
with the Pauli and Dirac matrices having completely disappeared. See Exercise 2.2.48. 
In this limit, the Pauli matrices reappear if we add to the Schrödinger equation an 
additional term arising from the intrinsic magnetic moment of the electron. The passage to 
the nonrelativistic limit provides justification for the seemingly arbitrary introduction of a 
two-component wavefunction and use of the Pauli matrices for discussions of spin angular 
momentum. 

The Pauli matrices (and the unit matrix $1_2$) form what is known as a Clifford algebra,² 
with the properties shown in Eq. (2.59). Since the algebra is based on 2 × 2 matrices, it 
can have only four members (the number of linearly independent such matrices), and is 
said to be of dimension 4. The Dirac matrices are members of a Clifford algebra of 
dimension 16. A complete basis for this Clifford algebra with convenient Lorentz transformation 
properties consists of the 16 matrices 

$$1_4, \qquad \gamma^5 = i\gamma^0\gamma^1\gamma^2\gamma^3 = \begin{pmatrix}0 & 1_2 \\ 1_2 & 0\end{pmatrix}, \qquad \gamma^\mu \;\; (\mu = 0, 1, 2, 3),$$

$$\gamma^5\gamma^\mu \;\; (\mu = 0, 1, 2, 3), \qquad \sigma^{\mu\nu} = i\gamma^\mu\gamma^\nu \;\; (0 \le \mu < \nu \le 3). \tag{2.76}$$

²D. Hestenes, Am. J. Phys. 39: 1013 (1971); and J. Math. Phys. 16: 556 (1975). 


Functions of Matrices 


Polynomials with one or more matrix arguments are well defined and occur often. Power 
series of a matrix may also be defined, provided the series converges for each matrix 
element. For example, if A is any n × n matrix, then the power series 

$$\exp(A) = \sum_{j=0}^{\infty} \frac{1}{j!} A^j, \tag{2.77}$$

$$\sin(A) = \sum_{j=0}^{\infty} \frac{(-1)^j}{(2j+1)!} A^{2j+1}, \tag{2.78}$$

$$\cos(A) = \sum_{j=0}^{\infty} \frac{(-1)^j}{(2j)!} A^{2j} \tag{2.79}$$

are well-defined n × n matrices. For the Pauli matrices $\sigma_k$, the Euler identity for real θ 
and k = 1, 2, or 3, 

$$\exp(i\sigma_k\theta) = 1_2\cos\theta + i\sigma_k\sin\theta, \tag{2.80}$$

follows from collecting all even and odd powers of θ in separate series using $\sigma_k^2 = 1$. For 
the 4 × 4 Dirac matrices $\sigma^{\mu\nu}$, defined in Eq. (2.76), we have for 1 ≤ μ < ν ≤ 3, 

$$\exp(i\sigma^{\mu\nu}\theta) = 1_4\cos\theta + i\sigma^{\mu\nu}\sin\theta, \tag{2.81}$$

while 

$$\exp(i\sigma^{0k}\zeta) = 1_4\cosh\zeta + i\sigma^{0k}\sinh\zeta \tag{2.82}$$

holds for real ζ because $(i\sigma^{0k})^2 = 1$ for k = 1, 2, or 3. 
Hermitian and unitary matrices are related in that U, given as 

$$U = \exp(iH), \tag{2.83}$$

is unitary if H is Hermitian. To see this, just take the adjoint: 
$U^\dagger = \exp(-iH^\dagger) = \exp(-iH) = [\exp(iH)]^{-1} = U^{-1}$. 

Another result which is important to identify here is that any Hermitian matrix H satisfies 
a relation known as the trace formula, 

$$\det(\exp(H)) = \exp(\operatorname{trace}(H)). \tag{2.84}$$

This formula is derived at Eq. (6.27). 
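
A truncated power series gives a quick numerical check of Eqs. (2.80), (2.83), and (2.84). The sketch below sums the first 40 terms of Eq. (2.77) in plain Python; the angle θ = 0.7, the Hermitian matrix H, and the helper names are arbitrary assumptions made for the illustration, and the truncation is adequate only for matrices of modest size such as these.

```python
import math


def matmul(X, Y):
    """Ordinary matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def dagger(X):
    """Adjoint: complex conjugate of the transpose."""
    return [[X[j][i].conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def expm(X, terms=40):
    """Partial sum of the series exp(X) = sum_j X^j / j!, Eq. (2.77)."""
    n = len(X)
    result = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        # Build X^j / j! from the previous term X^(j-1) / (j-1)!.
        term = [[v / j for v in row] for row in matmul(term, X)]
        result = [[result[r][c] + term[r][c] for c in range(n)]
                  for r in range(n)]
    return result


def max_diff(X, Y):
    """Largest absolute element-wise difference between X and Y."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


# Euler identity, Eq. (2.80), for sigma_2 and an arbitrary angle.
theta = 0.7
s2 = [[0, -1j], [1j, 0]]
lhs = expm([[1j * theta * v for v in row] for row in s2])
rhs = [[math.cos(theta) * (1 if r == c else 0) + 1j * math.sin(theta) * s2[r][c]
        for c in range(2)] for r in range(2)]

# An arbitrary 2 x 2 Hermitian matrix with trace 3.
H = [[2, 1j], [-1j, 1]]
expH = expm(H)
det_expH = expH[0][0] * expH[1][1] - expH[0][1] * expH[1][0]

# U = exp(iH) should be unitary, Eq. (2.83).
U = expm([[1j * v for v in row] for row in H])
UUdag = matmul(U, dagger(U))

print(max_diff(lhs, rhs))
print(det_expH, math.exp(3))
print(max_diff(UUdag, [[1, 0], [0, 1]]))
```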
Finally, we note that the multiplication of two diagonal matrices produces a matrix that 
is also diagonal, with elements that are the products of the corresponding elements of the 
multiplicands. This result implies that an arbitrary function of a diagonal matrix will also 
be diagonal, with diagonal elements that are that function of the diagonal elements of the 
original matrix. 


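Example 2.2.6 EXPONENTIAL OF A DIAGONAL MATRIX 

If a matrix A is diagonal, then its nth power is also diagonal, with the original diagonal 
matrix elements raised to the nth power. For example, given 

$$\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

then 

$$(\sigma_3)^n = \begin{pmatrix} 1 & 0 \\ 0 & (-1)^n \end{pmatrix}.$$

We can now compute 

$$e^{\sigma_3} = \begin{pmatrix} \displaystyle\sum_{n=0}^{\infty} \frac{1}{n!} & 0 \\ 0 & \displaystyle\sum_{n=0}^{\infty} \frac{(-1)^n}{n!} \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & e^{-1} \end{pmatrix}.$$
■ 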





A final and important result is the Baker-Hausdorff formula, which, among other 
places, is used in the coupled-cluster expansions that yield highly accurate electronic 
structure calculations on atoms and molecules³: 

$$\exp(-T)\,A\,\exp(T) = A + [A,T] + \frac{1}{2!}[[A,T],T] + \frac{1}{3!}[[[A,T],T],T] + \cdots. \tag{2.85}$$


Exercises 


2.2.1 Show that matrix multiplication is associative, (AB)C = A(BC). 

2.2.2 Show that 
$$(A + B)(A - B) = A^2 - B^2$$
if and only if A and B commute, 
$$[A, B] = 0.$$





³F. E. Harris, H. J. Monkhorst, and D. L. Freeman, Algebraic and Diagrammatic Methods in Many-Fermion Theory. New York: 
Oxford University Press (1992). 





2.2.3 (a) Complex numbers, a + ib, with a and b real, may be represented by (or are 
isomorphic with) 2 × 2 matrices: 
$$a + ib \longleftrightarrow \begin{pmatrix} a & b \\ -b & a \end{pmatrix}.$$
Show that this matrix representation is valid for (i) addition and (ii) multiplication. 
(b) Find the matrix corresponding to $(a + ib)^{-1}$. 

2.2.4 If A is an n × n matrix, show that 
$$\det(-A) = (-1)^n \det A.$$

2.2.5 (a) The matrix equation $A^2 = 0$ does not imply A = 0. Show that the most general 
2 × 2 matrix whose square is zero may be written as 
$$\begin{pmatrix} ab & b^2 \\ -a^2 & -ab \end{pmatrix},$$
where a and b are real or complex numbers. 
(b) If C = A + B, in general 
$$\det C \neq \det A + \det B.$$
Construct a specific numerical example to illustrate this inequality. 

2.2.6 Given 
$$K = \begin{pmatrix} 0 & 0 & i \\ -i & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix},$$
show that 
$$K^n = KKK\cdots \;(n \text{ factors}) = 1$$
(with the proper choice of n, n ≠ 0). 

2.2.7 Verify the Jacobi identity, 
$$[A, [B, C]] = [B, [A, C]] - [C, [A, B]].$$

2.2.8 Show that the matrices 
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
satisfy the commutation relations 
$$[A, B] = C, \qquad [A, C] = 0, \qquad\text{and}\qquad [B, C] = 0.$$

2.2.9 Let 
$$\mathbf{i} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix}, \qquad \mathbf{j} = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix},$$
and 
$$\mathbf{k} = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}.$$
Show that 
(a) $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1$, where 1 is the unit matrix. 
(b) $\mathbf{ij} = -\mathbf{ji} = \mathbf{k}$, 
$\mathbf{jk} = -\mathbf{kj} = \mathbf{i}$, 
$\mathbf{ki} = -\mathbf{ik} = \mathbf{j}$. 

These three matrices (i, j, and k) plus the unit matrix 1 form a basis for quaternions. An 
alternate basis is provided by the four 2 × 2 matrices, $i\sigma_1$, $i\sigma_2$, $-i\sigma_3$, and 1, where the 
$\sigma_i$ are the Pauli spin matrices of Example 2.2.1. 

2.2.10 A matrix with elements $a_{ij} = 0$ for j < i may be called upper right triangular. The 
elements in the lower left (below and to the left of the main diagonal) vanish. Show that 
the product of two upper right triangular matrices is an upper right triangular matrix. 

2.2.11 The three Pauli spin matrices are 
$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad\text{and}\qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Show that 
(a) $(\sigma_i)^2 = 1_2$, 
(b) $\sigma_i\sigma_j = i\sigma_k$, (i, j, k) = (1, 2, 3) or a cyclic permutation thereof, 
(c) $\sigma_i\sigma_j + \sigma_j\sigma_i = 2\delta_{ij} 1_2$; $1_2$ is the 2 × 2 unit matrix. 

2.2.12 One description of spin-1 particles uses the matrices 
$$M_x = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad M_y = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -i & 0 \\ i & 0 & -i \\ 0 & i & 0 \end{pmatrix},$$
and 
$$M_z = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
Show that 
(a) $[M_x, M_y] = iM_z$, and so on (cyclic permutation of indices). Using the Levi-Civita 
symbol, we may write 
$$[M_i, M_j] = i\sum_k \varepsilon_{ijk} M_k.$$
(b) $M^2 \equiv M_x^2 + M_y^2 + M_z^2 = 2\,1_3$, where $1_3$ is the 3 × 3 unit matrix. 
(c) $[M^2, M_i] = 0$, 
$[M_z, L^+] = L^+$, 
$[L^+, L^-] = 2M_z$, 
where $L^+ \equiv M_x + iM_y$ and $L^- \equiv M_x - iM_y$. 

2.2.13 Repeat Exercise 2.2.12, using the matrices for a spin of 3/2, 
$$M_x = \frac{1}{2}\begin{pmatrix} 0 & \sqrt{3} & 0 & 0 \\ \sqrt{3} & 0 & 2 & 0 \\ 0 & 2 & 0 & \sqrt{3} \\ 0 & 0 & \sqrt{3} & 0 \end{pmatrix}, \qquad M_y = \frac{i}{2}\begin{pmatrix} 0 & -\sqrt{3} & 0 & 0 \\ \sqrt{3} & 0 & -2 & 0 \\ 0 & 2 & 0 & -\sqrt{3} \\ 0 & 0 & \sqrt{3} & 0 \end{pmatrix},$$
and 
$$M_z = \frac{1}{2}\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -3 \end{pmatrix}.$$

2.2.14 If A is a diagonal matrix, with all diagonal elements different, and A and B commute, 
show that B is diagonal. 

2.2.15 If A and B are diagonal, show that A and B commute. 

2.2.16 Show that trace(ABC) = trace(CBA) if any two of the three matrices commute. 

2.2.17 Angular momentum matrices satisfy a commutation relation 
$$[M_j, M_k] = iM_l, \qquad j, k, l \text{ cyclic}.$$
Show that the trace of each angular momentum matrix vanishes. 

2.2.18 A and B anticommute: AB = −BA. Also, $A^2 = 1$, $B^2 = 1$. Show that trace(A) = 
trace(B) = 0. 
Note. The Pauli and Dirac matrices are specific examples. 

2.2.19 (a) If two nonsingular matrices anticommute, show that the trace of each one is zero. 
(Nonsingular means that the determinant of the matrix is nonzero.) 
(b) For the conditions of part (a) to hold, A and B must be n × n matrices with n even. 
Show that if n is odd, a contradiction results. 

2.2.20 If $A^{-1}$ has elements 
$$(A^{-1})_{ij} = a_{ij}^{(-1)} = \frac{C_{ji}}{|A|},$$
where $C_{ji}$ is the jith cofactor of |A|, show that 
$$A^{-1}A = 1.$$
Hence $A^{-1}$ is the inverse of A (if |A| ≠ 0). 

2.2.21 Find the matrices $M_L$ such that the product $M_L A$ will be A but with: 
(a) The ith row multiplied by a constant k ($a_{ij} \to k a_{ij}$, j = 1, 2, 3, ...); 
(b) The ith row replaced by the original ith row minus a multiple of the mth row 
($a_{ij} \to a_{ij} - K a_{mj}$, j = 1, 2, 3, ...); 
(c) The ith and mth rows interchanged ($a_{ij} \to a_{mj}$, $a_{mj} \to a_{ij}$, j = 1, 2, 3, ...). 

2.2.22 Find the matrices $M_R$ such that the product $A M_R$ will be A but with: 
(a) The ith column multiplied by a constant k ($a_{ji} \to k a_{ji}$, j = 1, 2, 3, ...); 
(b) The ith column replaced by the original ith column minus a multiple of the mth 
column ($a_{ji} \to a_{ji} - k a_{jm}$, j = 1, 2, 3, ...); 
(c) The ith and mth columns interchanged ($a_{ji} \to a_{jm}$, $a_{jm} \to a_{ji}$, j = 1, 2, 3, ...). 

2.2.23 Find the inverse of 
$$A = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix}.$$

2.2.24 Matrices are far too useful to remain the exclusive property of physicists. They may 
appear wherever there are linear relations. For instance, in a study of population 
movement the initial fraction of a fixed population in each of n areas (or industries or 
religions, etc.) is represented by an n-component column vector P. The movement of 
people from one area to another in a given time is described by an n × n (stochastic) 
matrix T. Here $T_{ij}$ is the fraction of the population in the jth area that moves to the ith 
area. (Those not moving are covered by i = j.) With P describing the initial population 
distribution, the final population distribution is given by the matrix equation TP = Q. 
From its definition, $\sum_{i=1}^{n} P_i = 1$. 
(a) Show that conservation of people requires that 
$$\sum_{i=1}^{n} T_{ij} = 1, \qquad j = 1, 2, \ldots, n.$$
(b) Prove that 
$$\sum_{i=1}^{n} Q_i = 1$$
continues the conservation of people. 

2.2.25 Given a 6 × 6 matrix A with elements $a_{ij} = 0.5^{|i-j|}$, i, j = 0, 1, 2, ..., 5, find $A^{-1}$. 
$$\text{ANS.}\quad A^{-1} = \frac{1}{3}\begin{pmatrix} 4 & -2 & 0 & 0 & 0 & 0 \\ -2 & 5 & -2 & 0 & 0 & 0 \\ 0 & -2 & 5 & -2 & 0 & 0 \\ 0 & 0 & -2 & 5 & -2 & 0 \\ 0 & 0 & 0 & -2 & 5 & -2 \\ 0 & 0 & 0 & 0 & -2 & 4 \end{pmatrix}.$$

2.2.26 Show that the product of two orthogonal matrices is orthogonal. 

2.2.27 If A is orthogonal, show that its determinant = ±1. 

2.2.28 Show that the trace of the product of a symmetric and an antisymmetric matrix is zero. 

2.2.29 A is 2 × 2 and orthogonal. Find the most general form of 
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

2.2.30 Show that 
$$\det(A^*) = (\det A)^* = \det(A^\dagger).$$

2.2.31 Three angular momentum matrices satisfy the basic commutation relation 
$$[J_x, J_y] = iJ_z$$
(and cyclic permutation of indices). If two of the matrices have real elements, show that 
the elements of the third must be pure imaginary. 

2.2.32 Show that $(AB)^\dagger = B^\dagger A^\dagger$. 

2.2.33 A matrix $C = S^\dagger S$. Show that the trace is positive definite unless S is the null matrix, in 
which case trace(C) = 0. 

2.2.34 If A and B are Hermitian matrices, show that (AB + BA) and i(AB − BA) are also 
Hermitian. 

2.2.35 The matrix C is not Hermitian. Show that then $C + C^\dagger$ and $i(C - C^\dagger)$ are Hermitian. 
This means that a non-Hermitian matrix may be resolved into two Hermitian parts, 
$$C = \frac{1}{2}(C + C^\dagger) + \frac{1}{2i}\, i(C - C^\dagger).$$
This decomposition of a matrix into two Hermitian matrix parts parallels the 
decomposition of a complex number z into x + iy, where x = (z + z*)/2 and y = (z − z*)/2i. 

2.2.36 A and B are two noncommuting Hermitian matrices: 
$$AB - BA = iC.$$
Prove that C is Hermitian. 

2.2.37 Two matrices A and B are each Hermitian. Find a necessary and sufficient condition for 
their product AB to be Hermitian. 

ANS. [A, B] = 0. 

2.2.38 Show that the reciprocal (that is, inverse) of a unitary matrix is unitary. 

2.2.39 Prove that the direct product of two unitary matrices is unitary. 

2.2.40 If σ is the vector with the $\sigma_i$ as components given in Eq. (2.61), and p is an ordinary 
vector, show that 
$$(\boldsymbol{\sigma}\cdot\mathbf{p})^2 = \mathbf{p}^2\, 1_2,$$
where $1_2$ is a 2 × 2 unit matrix. 

2.2.41 Use the equations for the properties of direct products, Eqs. (2.57) and (2.58), to show 
that the four matrices $\gamma^\mu$, μ = 0, 1, 2, 3, satisfy the conditions listed in Eqs. (2.74) and 
(2.75). 

2.2.42 Show that $\gamma^5$, Eq. (2.76), anticommutes with all four $\gamma^\mu$. 

2.2.43 In this problem, the summations are over μ = 0, 1, 2, 3. Define $g_{\mu\nu} = g^{\mu\nu}$ by the 
relations 
$$g_{00} = 1; \qquad g_{kk} = -1, \quad k = 1, 2, 3; \qquad g_{\mu\nu} = 0, \quad \mu \neq \nu;$$
and define $\gamma_\mu$ as $\sum_\nu g_{\mu\nu}\gamma^\nu$. Using these definitions, show that 
(a) $\sum_\mu \gamma_\mu\gamma^\alpha\gamma^\mu = -2\gamma^\alpha$, 
(b) $\sum_\mu \gamma_\mu\gamma^\alpha\gamma^\beta\gamma^\mu = 4g^{\alpha\beta}$, 
(c) $\sum_\mu \gamma_\mu\gamma^\alpha\gamma^\beta\gamma^\nu\gamma^\mu = -2\gamma^\nu\gamma^\beta\gamma^\alpha$. 

2.2.44 If $M = \frac{1}{2}(1 + \gamma^5)$, where $\gamma^5$ is given in Eq. (2.76), show that 
$$M^2 = M.$$
Note that this equation is still satisfied if $\gamma^5$ is replaced by any other Dirac matrix listed 
in Eq. (2.76). 

2.2.45 Prove that the 16 Dirac matrices form a linearly independent set. 

2.2.46 If we assume that a given 4 × 4 matrix A (with constant elements) can be written as a 
linear combination of the 16 Dirac matrices (which we denote here as $\Gamma_i$) 
$$A = \sum_{i=1}^{16} c_i \Gamma_i,$$
show that 
$$c_i \sim \operatorname{trace}(A\Gamma_i).$$

2.2.47 The matrix $C = i\gamma^2\gamma^0$ is sometimes called the charge conjugation matrix. Show that 
$$C\gamma^\mu C^{-1} = -(\gamma^\mu)^T.$$

2.2.48 (a) Show, by substitution of the definitions of the $\gamma^\mu$ matrices from Eqs. (2.70) 
and (2.72), that the Dirac equation, Eq. (2.73), takes the following form when 
written as 2 × 2 blocks (with $\psi_L$ and $\psi_S$ column vectors of dimension 2; here 
L and S stand, respectively, for "large" and "small" because of their relative size 
in the nonrelativistic limit): 
$$\begin{pmatrix} mc^2 - E & c(\sigma_1 p_1 + \sigma_2 p_2 + \sigma_3 p_3) \\ c(\sigma_1 p_1 + \sigma_2 p_2 + \sigma_3 p_3) & -mc^2 - E \end{pmatrix}\begin{pmatrix} \psi_L \\ \psi_S \end{pmatrix} = 0.$$
(b) To reach the nonrelativistic limit, make the substitution $E = mc^2 + \varepsilon$ and 
approximate $-2mc^2 - \varepsilon$ by $-2mc^2$. Then write the matrix equation as two simultaneous 
two-component equations and show that they can be rearranged to yield 
$$\frac{1}{2m}\left(p_1^2 + p_2^2 + p_3^2\right)\psi_L = \varepsilon\psi_L,$$
which is just the Schrödinger equation for a free particle. 
(c) Explain why it is reasonable to call $\psi_L$ and $\psi_S$ "large" and "small." 


2.2.49 Show that it is consistent with the requirements that they must satisfy to take the Dirac 
gamma matrices to be (in 2 × 2 block form) 
$$\gamma^0 = \begin{pmatrix} 0 & 1_2 \\ 1_2 & 0 \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ -\sigma_i & 0 \end{pmatrix}, \qquad (i = 1, 2, 3).$$
This choice for the gamma matrices is called the Weyl representation. 

2.2.50 Show that the Dirac equation separates into independent 2 × 2 blocks in the Weyl 
representation (see Exercise 2.2.49) in the limit that the mass m approaches zero. This 
observation is important in the ultrarelativistic regime where the rest mass is 
inconsequential, or for particles of negligible mass (e.g., neutrinos). 

2.2.51 (a) Given r′ = Ur, with U a unitary matrix and r a (column) vector with complex 
elements, show that the magnitude of r is invariant under this operation. 
(b) The matrix U transforms any column vector r with complex elements into r′, 
leaving the magnitude invariant: $\mathbf{r}^\dagger\mathbf{r} = \mathbf{r}'^\dagger\mathbf{r}'$. Show that U is unitary. 
CHAPTER 3 


VECTOR ANALYSIS 


The introductory section on vectors, Section 1.7, identified some basic properties that are 
universal, in the sense that they occur in a similar fashion in spaces of different dimension. 
In summary, these properties are (1) vectors can be represented as linear forms, with oper- 
ations that include addition and multiplication by a scalar, (2) vectors have a commutative 
and distributive dot product operation that associates a scalar with a pair of vectors and 
depends on their relative orientations and hence is independent of the coordinate system, 
and (3) vectors can be decomposed into components that can be identified as projections 
onto the coordinate directions. In Section 2.2 we found that the components of vectors 
could be identified as the elements of a column vector and that the scalar product of two 
vectors corresponded to the matrix multiplication of the transpose of one (the transposition 
makes it a row vector) with the column vector of the other. 

The current chapter builds on these ideas, mainly in ways that are specific to three- 
dimensional (3-D) physical space, by (1) introducing a quantity called a vector cross 
product to permit the use of vectors to represent rotational phenomena and volumes in 3-D 
space, (2) studying the transformational properties of vectors when the coordinate system 
used to describe them is rotated or subjected to a reflection operation, (3) developing math- 
ematical methods for treating vectors that are defined over a spatial region (vector fields), 
with particular attention to quantities that depend on the spatial variation of the vector field, 
including vector differential operators and integrals of vector quantities, and (4) extending 
vector concepts to curvilinear coordinate systems, which are very useful when the sym- 
metry of the coordinate system corresponds to a symmetry of the problem under study (an 
example is the use of spherical polar coordinates for systems with spherical symmetry). 

A key idea of the present chapter is that a quantity that is properly called a vector 
must have the transformation properties that preserve its essential features under coordinate 
transformation; there exist quantities with direction and magnitude that do not transform 
appropriately and hence are not vectors. This study of transformation properties will, in a 
subsequent chapter, ultimately enable us to generalize to related quantities such as tensors. 
Finally, we note that the methods developed in this chapter have direct application in 
electromagnetic theory as well as in mechanics, and these connections are explored through 
the study of examples. 


3.1 REVIEW OF BASIC PROPERTIES 


In Section 1.7 we established the following properties of vectors: 


1. Vectors satisfy an addition law that corresponds to successive displacements that can 
be represented by arrows in the underlying space. Vector addition is commutative 
and associative: A + B = B + A and (A + B) + C = A + (B + C). 

2. A vector A can be multiplied by a scalar k; if k > 0 the result will be a vector in the 
direction of A but with its length multiplied by k; if k < 0 the result will be in the 
direction opposite to A but with its length multiplied by |k|. 

3. The vector A − B is interpreted as A + (−1)B, so vector polynomials, e.g., A − 2B + 
3C, are well-defined. 

4. A vector of unit length in the coordinate direction $x_i$ is denoted $\hat{\mathbf{e}}_i$. An arbitrary vector 
A can be written as a sum of vectors along the coordinate directions, as 

$$\mathbf{A} = A_1\hat{\mathbf{e}}_1 + A_2\hat{\mathbf{e}}_2 + \cdots.$$

The $A_i$ are called the components of A, and the operations in Properties 1 to 3 
correspond to the component formulas 

$$\mathbf{G} = \mathbf{A} - 2\mathbf{B} + 3\mathbf{C} \;\Longrightarrow\; G_i = A_i - 2B_i + 3C_i \quad (\text{each } i).$$

5. The magnitude or length of a vector A, denoted |A| or A, is given in terms of its 
components as 

$$|\mathbf{A}| = \left(A_1^2 + A_2^2 + \cdots\right)^{1/2}.$$

6. The dot product of two vectors is given by the formula 

$$\mathbf{A}\cdot\mathbf{B} = A_1 B_1 + A_2 B_2 + \cdots;$$

consequences are 

$$|\mathbf{A}|^2 = \mathbf{A}\cdot\mathbf{A}, \qquad \mathbf{A}\cdot\mathbf{B} = |\mathbf{A}|\,|\mathbf{B}|\cos\theta,$$

where θ is the angle between A and B. 

7. If two vectors are perpendicular to each other, their dot product vanishes and they are 
termed orthogonal. The unit vectors of a Cartesian coordinate system are orthogonal: 

$$\hat{\mathbf{e}}_i\cdot\hat{\mathbf{e}}_j = \delta_{ij}, \tag{3.1}$$

where $\delta_{ij}$ is the Kronecker delta, Eq. (1.164). 

8. The projection of a vector in any direction has an algebraic magnitude given by its 
dot product with a unit vector in that direction. In particular, the projection of A on 
the $\hat{\mathbf{e}}_i$ direction is $A_i\hat{\mathbf{e}}_i$, with 

$$A_i = \hat{\mathbf{e}}_i\cdot\mathbf{A}.$$
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9. The components of A in R? are related to its direction cosines (cosines of the angles 
that A makes with the coordinate axes) by the formulas 


A,=Acosa, Ay=AcosB, A,=Acosy, 
and cos” a + cos” B + cos? y = 1. 


In Section 2.2 we noted that matrices consisting of a single column could be used to
represent vectors. In particular, we found, illustrating for the 3-D space $\mathbb{R}^3$, the following
properties.


10. A vector A can be represented by a single-column matrix a whose elements are the
components of A, as in

$$\mathbf{a} = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix}.$$

The rows (i.e., individual elements $A_i$) of a are the coefficients of the individual
members of the basis used to represent A, so the element $A_i$ is associated with the
basis unit vector $\hat{\mathbf{e}}_i$.

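11. The vector operations of addition and multiplication by a scalar correspond exactly
to the operations of the same names applied to the single-column matrices representing
the vectors, as illustrated here:

$$\mathbf{G}=\mathbf{A}-2\mathbf{B}+3\mathbf{C} \;\Longrightarrow\;
\begin{pmatrix} G_1 \\ G_2 \\ G_3 \end{pmatrix}
= \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix}
- 2\begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix}
+ 3\begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix}
= \begin{pmatrix} A_1 - 2B_1 + 3C_1 \\ A_2 - 2B_2 + 3C_2 \\ A_3 - 2B_3 + 3C_3 \end{pmatrix},
\quad\text{or}\quad \mathbf{g}=\mathbf{a}-2\mathbf{b}+3\mathbf{c}.$$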


It is therefore appropriate to call these single-column matrices column vectors.

12. The transpose of the matrix representing a vector A is a single-row matrix, called a
row vector:

$$\mathbf{a}^T = (A_1 \;\; A_2 \;\; A_3).$$

The operations illustrated in Property 11 also apply to row vectors.

13. The dot product A · B can be evaluated as $\mathbf{a}^T\mathbf{b}$, or alternatively, because a and b are
real, as $\mathbf{a}^\dagger\mathbf{b}$. Moreover, $\mathbf{a}^T\mathbf{b} = \mathbf{b}^T\mathbf{a}$.

$$\mathbf{A}\cdot\mathbf{B} = \mathbf{a}^T\mathbf{b} = (A_1 \;\; A_2 \;\; A_3)\begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix} = A_1B_1 + A_2B_2 + A_3B_3.$$


3.2 VECTORS IN 3-D SPACE


We now proceed to develop additional properties for vectors, most of which are applicable 
only for vectors in 3-D space. 


Vector or Cross Product 


A number of quantities in physics are related to angular motion or the torque required to 
cause angular acceleration. For example, angular momentum about a point is defined as 
having a magnitude equal to the distance r from the point times the component of the 
linear momentum p perpendicular to r—the component of p causing angular motion (see 
Fig. 3.1). The direction assigned to the angular momentum is that perpendicular to both 
r and p, and corresponds to the axis about which angular motion is taking place. The 
mathematical construction needed to describe angular momentum is the cross product, 
defined as 


$$\mathbf{C} = \mathbf{A}\times\mathbf{B} = (AB\sin\theta)\,\hat{\mathbf{e}}_c. \tag{3.2}$$


Note that C, the result of the cross product, is stated to be a vector, with a magnitude that
is the product of the magnitudes of A, B and the sine of the angle $\theta \le \pi$ between A and B.
The direction of C, i.e., that of $\hat{\mathbf{e}}_c$, is perpendicular to the plane of A and B, such that A, B,
and C form a right-handed system.¹ This causes C to be aligned with the rotational axis,
with a sign that indicates the sense of the rotation.

From Fig. 3.2, we also see that A × B has a magnitude equal to the area of the
parallelogram formed by A and B, and with a direction normal to the parallelogram.

Other places the cross product is encountered include the formulas

$$\mathbf{v} = \boldsymbol{\omega}\times\mathbf{r} \quad\text{and}\quad \mathbf{F}_M = q\,\mathbf{v}\times\mathbf{B}.$$

The first of these equations is the relation between linear velocity v and angular
velocity ω, and the second equation gives the force $\mathbf{F}_M$ on a particle of charge q and velocity v
in the magnetic induction field B (in SI units).











FIGURE 3.1 Angular momentum about the origin, L = r × p.
L has magnitude $rp\sin\theta$ and is directed out of the plane of the paper.


¹ The inherent ambiguity in this statement can be resolved by the following anthropomorphic prescription: Point the right hand
in the direction A, and then bend the fingers through the smaller of the two angles that can cause the fingers to point in the
direction B; the thumb will then point in the direction of C.


FIGURE 3.2 Parallelogram of A × B.


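We can get our right hands out of the analysis by compiling some algebraic properties of
the cross product. If the roles of A and B are reversed, the cross product changes sign, so

$$\mathbf{B}\times\mathbf{A} = -\mathbf{A}\times\mathbf{B} \quad\text{(anticommutation)}. \tag{3.3}$$

The cross product also obeys the distributive laws

$$\mathbf{A}\times(\mathbf{B}+\mathbf{C}) = \mathbf{A}\times\mathbf{B} + \mathbf{A}\times\mathbf{C}, \qquad k(\mathbf{A}\times\mathbf{B}) = (k\mathbf{A})\times\mathbf{B}, \tag{3.4}$$

and when applied to unit vectors in the coordinate directions, we get

$$\hat{\mathbf{e}}_i\times\hat{\mathbf{e}}_j = \sum_k \varepsilon_{ijk}\,\hat{\mathbf{e}}_k. \tag{3.5}$$

Here $\varepsilon_{ijk}$ is the Levi-Civita symbol defined in Eq. (2.8); Eq. (3.5) therefore indicates, for
example, that $\hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_x = 0$, $\hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_y = \hat{\mathbf{e}}_z$, but $\hat{\mathbf{e}}_y\times\hat{\mathbf{e}}_x = -\hat{\mathbf{e}}_z$.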
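Using Eq. (3.5) and writing A and B in component form, we can expand A × B to obtain

$$\begin{aligned}
\mathbf{C} = \mathbf{A}\times\mathbf{B} &= (A_x\hat{\mathbf{e}}_x + A_y\hat{\mathbf{e}}_y + A_z\hat{\mathbf{e}}_z)\times(B_x\hat{\mathbf{e}}_x + B_y\hat{\mathbf{e}}_y + B_z\hat{\mathbf{e}}_z) \\
&= (A_xB_y - A_yB_x)(\hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_y) + (A_xB_z - A_zB_x)(\hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_z) + (A_yB_z - A_zB_y)(\hat{\mathbf{e}}_y\times\hat{\mathbf{e}}_z) \\
&= (A_xB_y - A_yB_x)\,\hat{\mathbf{e}}_z + (A_xB_z - A_zB_x)(-\hat{\mathbf{e}}_y) + (A_yB_z - A_zB_y)\,\hat{\mathbf{e}}_x.
\end{aligned} \tag{3.6}$$

The components of C are important enough to be displayed prominently:

$$C_x = A_yB_z - A_zB_y, \qquad C_y = A_zB_x - A_xB_z, \qquad C_z = A_xB_y - A_yB_x, \tag{3.7}$$

equivalent to

$$C_i = \sum_{jk} \varepsilon_{ijk} A_j B_k. \tag{3.8}$$

Yet another way of expressing the cross product is to write it as a determinant. It is
straightforward to verify that Eqs. (3.7) are reproduced by the determinantal equation

$$\mathbf{C} = \begin{vmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z \\ A_x & A_y & A_z \\ B_x & B_y & B_z \end{vmatrix} \tag{3.9}$$

when the determinant is expanded in minors of its top row. The anticommutation of the
cross product now clearly follows if the rows for the components of A and B are
interchanged.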
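
The equivalence of the component formulas, Eq. (3.7), and the Levi-Civita form, Eq. (3.8), can be checked numerically. The following is a minimal Python sketch and is not part of the original text; the function names (`levi_civita`, `cross_components`, `cross_levi_civita`) are illustrative assumptions, vectors are taken to be plain 3-element lists, and the indices 0, 1, 2 stand for x, y, z.

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol for indices drawn from 0, 1, 2 (returns +1, -1, or 0)."""
    return (i - j) * (j - k) * (k - i) // 2

def cross_components(a, b):
    """Cross product from the explicit component formulas, Eq. (3.7)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def cross_levi_civita(a, b):
    """Cross product as C_i = sum over j, k of eps_ijk A_j B_k, Eq. (3.8)."""
    return [sum(levi_civita(i, j, k) * a[j] * b[k]
                for j in range(3) for k in range(3))
            for i in range(3)]

A = [1, 2, 3]
B = [4, 5, 6]
print(cross_components(A, B))    # [-3, 6, -3]
print(cross_levi_civita(A, B))   # [-3, 6, -3], the same components
print(cross_components(B, A))    # [3, -6, 3], sign reversed as in Eq. (3.3)
```

The closed-form expression used for `levi_civita` is only a convenience for three indices; a lookup table of the six nonzero permutations would serve equally well.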

We need to reconcile the geometric form of the cross product, Eq. (3.2), with the
algebraic form in Eq. (3.6). We can confirm the magnitude of A × B by evaluating (from the
component form of C)

$$\begin{aligned}
(\mathbf{A}\times\mathbf{B})\cdot(\mathbf{A}\times\mathbf{B}) &= A^2B^2 - (\mathbf{A}\cdot\mathbf{B})^2 = A^2B^2 - A^2B^2\cos^2\theta \\
&= A^2B^2\sin^2\theta.
\end{aligned} \tag{3.10}$$


The first step in Eq. (3.10) can be verified by expanding its left-hand side in component 
form, then collecting the result into the terms constituting the central member of the first 
line of the equation. 

To confirm the direction of C = A × B, we can check that A · C = B · C = 0, showing
that C (in component form) is perpendicular to both A and B. We illustrate for A · C:

$$\mathbf{A}\cdot\mathbf{C} = A_x(A_yB_z - A_zB_y) + A_y(A_zB_x - A_xB_z) + A_z(A_xB_y - A_yB_x) = 0. \tag{3.11}$$

To verify the sign of C, it suffices to check special cases (e.g., $\mathbf{A} = \hat{\mathbf{e}}_x$, $\mathbf{B} = \hat{\mathbf{e}}_y$, or $A_x = B_y = 1$,
all other components zero).
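
The three checks just described (magnitude, perpendicularity, and sign) can be repeated numerically for any particular pair of vectors. The sketch below is an illustration only, with arbitrarily chosen integer test vectors and helper names (`dot`, `cross`) that are not taken from the text.

```python
def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product from the component formulas, Eq. (3.7)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

A = [2, -1, 3]          # arbitrary test vectors
B = [1, 4, -2]
C = cross(A, B)

# Magnitude check, first step of Eq. (3.10): |A x B|^2 = A^2 B^2 - (A . B)^2
lhs = dot(C, C)
rhs = dot(A, A) * dot(B, B) - dot(A, B) ** 2
print(lhs, rhs)                        # 230 230

# Direction check, Eq. (3.11): C is perpendicular to both A and B
print(dot(A, C), dot(B, C))            # 0 0

# Sign check on a special case: e_x cross e_y should be +e_z
print(cross([1, 0, 0], [0, 1, 0]))     # [0, 0, 1]
```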

Next, we observe that it is obvious from Eq. (3.2) that if C = A × B in a given coordinate
system, then that equation will also be satisfied if we rotate the coordinates, even though
the individual components of all three vectors will thereby be changed. In other words, the
cross product, like the dot product, is a rotationally invariant relationship.

Finally, note that the cross product is a quantity specifically defined for 3-D space. It is
possible to make analogous definitions for spaces of other dimensionality, but they do not
share the interpretation or utility of the cross product in $\mathbb{R}^3$.


Scalar Triple Product 


While the various vector operations can be combined in many ways, there are two
combinations involving three operands that are of particular importance. We call attention first
to the scalar triple product, of the form A · (B × C). Taking (B × C) in the determinantal
form, Eq. (3.9), one can see that taking the dot product with A will cause the unit vector $\hat{\mathbf{e}}_x$
to be replaced by $A_x$, with corresponding replacements to $\hat{\mathbf{e}}_y$ and $\hat{\mathbf{e}}_z$. The overall result is

$$\mathbf{A}\cdot(\mathbf{B}\times\mathbf{C}) = \begin{vmatrix} A_x & A_y & A_z \\ B_x & B_y & B_z \\ C_x & C_y & C_z \end{vmatrix}. \tag{3.12}$$


We can draw a number of conclusions from this highly symmetric determinantal form.
To start, we see that the determinant contains no vector quantities, so it must evaluate
to an ordinary number. Because the left-hand side of Eq. (3.12) is a rotational invariant,
the number represented by the determinant must also be rotationally invariant, and can
therefore be identified as a scalar. Since we can permute the rows of the determinant (with
a sign change for an odd permutation, and with no sign change for an even permutation),
we can permute the vectors A, B, and C to obtain

$$\mathbf{A}\cdot\mathbf{B}\times\mathbf{C} = \mathbf{B}\cdot\mathbf{C}\times\mathbf{A} = \mathbf{C}\cdot\mathbf{A}\times\mathbf{B} = -\mathbf{A}\cdot\mathbf{C}\times\mathbf{B}, \text{ etc.} \tag{3.13}$$

Here we have followed common practice and dropped the parentheses surrounding the
cross product, on the basis that they must be understood to be present in order for the
expressions to have meaning. Finally, noting that B × C has a magnitude equal to the area
of the BC parallelogram and a direction perpendicular to it, and that the dot product with
A will multiply that area by the projection of A on B × C, we see that the scalar triple
product gives us (±) the volume of the parallelepiped defined by A, B, and C; see Fig. 3.3.


FIGURE 3.3 A · (B × C) parallelepiped.
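
As an illustration of Eqs. (3.12) and (3.13), the sketch below evaluates the scalar triple product both as A · (B × C) and as a 3 × 3 determinant, and confirms the behavior under cyclic and odd permutations. It is a hedged example only: the helper names (`dot`, `cross`, `det3`, `triple`) and the test vectors are assumptions made for this illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def det3(m):
    """Determinant of a 3 x 3 matrix, expanded in minors of its top row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))

A, B, C = [1, 2, 0], [0, 3, 1], [2, 0, 4]

print(triple(A, B, C), det3([A, B, C]))    # 16 16, as in Eq. (3.12)
print(triple(B, C, A), triple(C, A, B))    # 16 16, cyclic permutations
print(triple(A, C, B))                     # -16, odd permutation
print(abs(triple(A, B, C)))                # 16, volume of the parallelepiped
```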


Example 3.2.1 RECIPROCAL LATTICE 


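Let a, b, and c (not necessarily mutually perpendicular) represent the vectors that define a
crystal lattice. The displacements from one lattice point to another may then be written

$$\mathbf{R} = n_a\mathbf{a} + n_b\mathbf{b} + n_c\mathbf{c}, \tag{3.14}$$

with $n_a$, $n_b$, and $n_c$ taking integral values. In the band theory of solids,² it is useful to
introduce what is called a reciprocal lattice a′, b′, c′ such that

$$\mathbf{a}\cdot\mathbf{a}' = \mathbf{b}\cdot\mathbf{b}' = \mathbf{c}\cdot\mathbf{c}' = 1, \tag{3.15}$$

and with

$$\mathbf{a}\cdot\mathbf{b}' = \mathbf{a}\cdot\mathbf{c}' = \mathbf{b}\cdot\mathbf{a}' = \mathbf{b}\cdot\mathbf{c}' = \mathbf{c}\cdot\mathbf{a}' = \mathbf{c}\cdot\mathbf{b}' = 0. \tag{3.16}$$

The reciprocal-lattice vectors are easily constructed by calling on the fact that for any u
and v, u × v is perpendicular to both u and v; we have

$$\mathbf{a}' = \frac{\mathbf{b}\times\mathbf{c}}{\mathbf{a}\cdot\mathbf{b}\times\mathbf{c}}, \qquad
\mathbf{b}' = \frac{\mathbf{c}\times\mathbf{a}}{\mathbf{a}\cdot\mathbf{b}\times\mathbf{c}}, \qquad
\mathbf{c}' = \frac{\mathbf{a}\times\mathbf{b}}{\mathbf{a}\cdot\mathbf{b}\times\mathbf{c}}. \tag{3.17}$$

The scalar triple product causes these expressions to satisfy the scale condition of
Eq. (3.15).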
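
The construction in Eq. (3.17) is easy to try on a specific lattice. The following sketch is an illustration under stated assumptions: the lattice vectors are arbitrary integer examples (not taken from the text), the function name `reciprocal_lattice` is hypothetical, and exact rational arithmetic from Python's standard `fractions` module is used so that Eqs. (3.15) and (3.16) can be confirmed without rounding.

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def reciprocal_lattice(a, b, c):
    """Return (a', b', c') built from Eq. (3.17); a, b, c must hold integers."""
    volume = dot(a, cross(b, c))               # scalar triple product a . b x c
    if volume == 0:
        raise ValueError("a, b, c are coplanar; the reciprocal lattice is undefined")

    def scaled(w):
        return [Fraction(x, volume) for x in w]

    return scaled(cross(b, c)), scaled(cross(c, a)), scaled(cross(a, b))

# Example (non-orthogonal) lattice vectors, chosen only for illustration
a, b, c = [2, 0, 0], [1, 2, 0], [0, 1, 3]
recip = reciprocal_lattice(a, b, c)

# Entry [i][j] is (direct vector i) . (reciprocal vector j)
table = [[dot(u, w) for w in recip] for u in (a, b, c)]
print(table == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True: Eqs. (3.15) and (3.16) hold
```

If the 2π convention of the footnote is preferred, each reciprocal vector is simply multiplied by 2π, at the cost of leaving exact rational arithmetic.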
Vector Triple Product 


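The other triple product of importance is the vector triple product, of the form
A × (B × C). Here the parentheses are essential since, for example, $(\hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_x)\times\hat{\mathbf{e}}_y = 0$, while
$\hat{\mathbf{e}}_x\times(\hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_y) = \hat{\mathbf{e}}_x\times\hat{\mathbf{e}}_z = -\hat{\mathbf{e}}_y$. Our interest is in reducing this triple product to a simpler
form; the result we seek is

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B}). \tag{3.18}$$

Equation (3.18), which for convenience we will sometimes refer to as the BAC−CAB rule,
can be proved by inserting components for all vectors and evaluating all the products, but it
is instructive to proceed in a more elegant fashion. Using the formula for the cross product
in terms of the Levi-Civita symbol, Eq. (3.8), we write

$$\begin{aligned}
\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) &= \sum_i \hat{\mathbf{e}}_i \sum_{jk} \varepsilon_{ijk} A_j \Big(\sum_{pq} \varepsilon_{kpq} B_p C_q\Big) \\
&= \sum_{ij}\sum_{pq} \hat{\mathbf{e}}_i A_j B_p C_q \sum_k \varepsilon_{ijk}\varepsilon_{kpq}.
\end{aligned} \tag{3.19}$$

The summation over k of the product of Levi-Civita symbols reduces, as shown in
Exercise 2.1.9, to $\delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp}$; we are left with

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \sum_{ij} \hat{\mathbf{e}}_i A_j (B_iC_j - B_jC_i) = \sum_i \hat{\mathbf{e}}_i \Big( B_i \sum_j A_jC_j - C_i \sum_j A_jB_j \Big),$$

which is equivalent to Eq. (3.18).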
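
Both ingredients of this proof lend themselves to a brute-force check: the contraction of two Levi-Civita symbols over all index values, and the BAC−CAB rule itself for sample vectors. The sketch below is illustrative only; the helper names and test vectors are assumptions, and the indices 0, 1, 2 stand for x, y, z.

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol for indices drawn from 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def kronecker(i, j):
    return 1 if i == j else 0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# sum_k eps_ijk eps_kpq = delta_ip delta_jq - delta_iq delta_jp, for all i, j, p, q
identity_holds = all(
    sum(levi_civita(i, j, k) * levi_civita(k, p, q) for k in range(3))
    == kronecker(i, p) * kronecker(j, q) - kronecker(i, q) * kronecker(j, p)
    for i in range(3) for j in range(3) for p in range(3) for q in range(3))
print(identity_holds)                  # True

# BAC-CAB rule, Eq. (3.18), for arbitrary integer test vectors
A, B, C = [1, 2, 3], [4, 5, 6], [7, 8, 10]
lhs = cross(A, cross(B, C))
rhs = [b * dot(A, C) - c * dot(A, B) for b, c in zip(B, C)]
print(lhs, rhs)                        # [-12, 9, -2] [-12, 9, -2]

# The parentheses matter: (A x B) x C is a different vector
print(cross(cross(A, B), C))
```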





² It is often chosen to require a · a′, etc., to be 2π rather than unity, because when Bloch states for a crystal (labeled by k) are
set up, a constituent atomic function in cell R enters with coefficient exp(ik · R), and if k is changed by a reciprocal lattice step
(in, say, the a′ direction), the coefficient becomes exp(i[k + a′] · R), which reduces to $\exp(2\pi i n_a)\exp(i\mathbf{k}\cdot\mathbf{R})$ and therefore,
because $\exp(2\pi i n_a) = 1$, to its original value. Thus, the reciprocal lattice identifies the periodicity in k. The unit cell of the k
vectors is called the Brillouin zone.





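Exercises


3.2.1 If $\mathbf{P} = \hat{\mathbf{e}}_x P_x + \hat{\mathbf{e}}_y P_y$ and $\mathbf{Q} = \hat{\mathbf{e}}_x Q_x + \hat{\mathbf{e}}_y Q_y$ are any two nonparallel (also nonantiparallel)
vectors in the xy-plane, show that P × Q is in the z-direction.

3.2.2 Prove that $(\mathbf{A}\times\mathbf{B})\cdot(\mathbf{A}\times\mathbf{B}) = (AB)^2 - (\mathbf{A}\cdot\mathbf{B})^2$.

3.2.3 Using the vectors

$$\mathbf{P} = \hat{\mathbf{e}}_x\cos\theta + \hat{\mathbf{e}}_y\sin\theta, \qquad
\mathbf{Q} = \hat{\mathbf{e}}_x\cos\varphi - \hat{\mathbf{e}}_y\sin\varphi, \qquad
\mathbf{R} = \hat{\mathbf{e}}_x\cos\varphi + \hat{\mathbf{e}}_y\sin\varphi,$$

prove the familiar trigonometric identities

$$\sin(\theta+\varphi) = \sin\theta\cos\varphi + \cos\theta\sin\varphi, \qquad
\cos(\theta+\varphi) = \cos\theta\cos\varphi - \sin\theta\sin\varphi.$$

3.2.4 (a) Find a vector A that is perpendicular to

$$\mathbf{U} = 2\hat{\mathbf{e}}_x + \hat{\mathbf{e}}_y - \hat{\mathbf{e}}_z, \qquad \mathbf{V} = \hat{\mathbf{e}}_x - \hat{\mathbf{e}}_y + \hat{\mathbf{e}}_z.$$

(b) What is A if, in addition to this requirement, we demand that it have unit
magnitude?

3.2.5 If four vectors a, b, c, and d all lie in the same plane, show that

$$(\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{d}) = 0.$$

Hint. Consider the directions of the cross-product vectors.

3.2.6 Derive the law of sines (see Fig. 3.4):

$$\frac{\sin\alpha}{|\mathbf{A}|} = \frac{\sin\beta}{|\mathbf{B}|} = \frac{\sin\gamma}{|\mathbf{C}|}.$$


FIGURE 3.4 Plane triangle.


3.2.7 The magnetic induction B is defined by the Lorentz force equation,

$$\mathbf{F} = q(\mathbf{v}\times\mathbf{B}).$$

Carrying out three experiments, we find that if


From the results of these three separate experiments calculate the magnetic induction B.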


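3.2.8 You are given the three vectors A, B, and C,


(a) Compute the scalar triple product, A · B × C. Noting that A = B + C, give a
geometric interpretation of your result for the scalar triple product.

(b) Compute A × (B × C).

3.2.9 Prove Jacobi's identity for vector products:

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) + \mathbf{b}\times(\mathbf{c}\times\mathbf{a}) + \mathbf{c}\times(\mathbf{a}\times\mathbf{b}) = 0.$$

3.2.10 A vector A is decomposed into a radial vector $\mathbf{A}_r$ and a tangential vector $\mathbf{A}_t$. If $\hat{\mathbf{r}}$ is a
unit vector in the radial direction, show that
(a) $\mathbf{A}_r = \hat{\mathbf{r}}(\mathbf{A}\cdot\hat{\mathbf{r}})$ and
(b) $\mathbf{A}_t = -\hat{\mathbf{r}}\times(\hat{\mathbf{r}}\times\mathbf{A})$.

3.2.11 Prove that a necessary and sufficient condition for the three (nonvanishing) vectors A,
B, and C to be coplanar is the vanishing of the scalar triple product

$$\mathbf{A}\cdot\mathbf{B}\times\mathbf{C} = 0.$$

3.2.12 Three vectors A, B, and C are given by

$$\mathbf{A} = 3\hat{\mathbf{e}}_x - 2\hat{\mathbf{e}}_y + 2\hat{\mathbf{e}}_z, \qquad
\mathbf{B} = 6\hat{\mathbf{e}}_x + 4\hat{\mathbf{e}}_y - 2\hat{\mathbf{e}}_z, \qquad
\mathbf{C} = -3\hat{\mathbf{e}}_x - 2\hat{\mathbf{e}}_y - 4\hat{\mathbf{e}}_z.$$

Compute the values of A · B × C and A × (B × C), C × (A × B) and B × (C × A).

3.2.13 Show that

$$(\mathbf{A}\times\mathbf{B})\cdot(\mathbf{C}\times\mathbf{D}) = (\mathbf{A}\cdot\mathbf{C})(\mathbf{B}\cdot\mathbf{D}) - (\mathbf{A}\cdot\mathbf{D})(\mathbf{B}\cdot\mathbf{C}).$$

3.2.14 Show that

$$(\mathbf{A}\times\mathbf{B})\times(\mathbf{C}\times\mathbf{D}) = (\mathbf{A}\cdot\mathbf{B}\times\mathbf{D})\,\mathbf{C} - (\mathbf{A}\cdot\mathbf{B}\times\mathbf{C})\,\mathbf{D}.$$

3.2.15 An electric charge $q_1$ moving with velocity $\mathbf{v}_1$ produces a magnetic induction B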
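given by

$$\mathbf{B} = \frac{\mu_0}{4\pi}\, q_1\, \frac{\mathbf{v}_1\times\hat{\mathbf{r}}}{r^2} \quad (\text{mks units}),$$

where $\hat{\mathbf{r}}$ is a unit vector that points from $q_1$ to the point at which B is measured (Biot
and Savart law).

(a) Show that the magnetic force exerted by $q_1$ on a second charge $q_2$, velocity $\mathbf{v}_2$, is
given by the vector triple product

$$\mathbf{F}_2 = \frac{\mu_0}{4\pi}\,\frac{q_1q_2}{r^2}\; \mathbf{v}_2\times(\mathbf{v}_1\times\hat{\mathbf{r}}).$$

(b) Write out the corresponding magnetic force $\mathbf{F}_1$ that $q_2$ exerts on $q_1$. Define your
unit radial vector. How do $\mathbf{F}_1$ and $\mathbf{F}_2$ compare?

(c) Calculate $\mathbf{F}_1$ and $\mathbf{F}_2$ for the case of $q_1$ and $q_2$ moving along parallel trajectories
side by side.

ANS.
(b) $\mathbf{F}_1 = -\dfrac{\mu_0}{4\pi}\,\dfrac{q_1q_2}{r^2}\; \mathbf{v}_1\times(\mathbf{v}_2\times\hat{\mathbf{r}})$.
In general, there is no simple relation between $\mathbf{F}_1$ and $\mathbf{F}_2$. Specifically, Newton's third law,
$\mathbf{F}_1 = -\mathbf{F}_2$, does not hold.
(c) $\mathbf{F}_1 = \dfrac{\mu_0}{4\pi}\,\dfrac{q_1q_2}{r^2}\, v^2\,\hat{\mathbf{r}} = -\mathbf{F}_2$.
Mutual attraction.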

3.3. COORDINATE TRANSFORMATIONS 


As indicated in the chapter introduction, an object classified as a vector must have specific 
transformation properties under rotation of the coordinate system; in particular, the com- 
ponents of a vector must transform in a way that describes the same object in the rotated 
system. 


Rotations 


Considering initially $\mathbb{R}^2$, and a rotation of the coordinate axes as shown in Fig. 3.5, we
wish to find how the components $A_x$ and $A_y$ of a vector A in the unrotated system are
related to $A'_x$ and $A'_y$, its components in the rotated coordinate system. Perhaps the easiest
way to answer this question is by first asking how the unit vectors $\hat{\mathbf{e}}_x$ and $\hat{\mathbf{e}}_y$ are represented
in the new coordinates, after which we can perform vector addition on the new incarnations
of $A_x\hat{\mathbf{e}}_x$ and $A_y\hat{\mathbf{e}}_y$.

From the right-hand part of Fig. 3.5, we see that

$$\hat{\mathbf{e}}_x = \cos\varphi\,\hat{\mathbf{e}}'_x - \sin\varphi\,\hat{\mathbf{e}}'_y, \quad\text{and}\quad \hat{\mathbf{e}}_y = \sin\varphi\,\hat{\mathbf{e}}'_x + \cos\varphi\,\hat{\mathbf{e}}'_y, \tag{3.20}$$
















FIGURE 3.5 Left: Rotation of two-dimensional (2-D) coordinate axes through angle $\varphi$.
Center and right: Decomposition of $\hat{\mathbf{e}}_x$ and $\hat{\mathbf{e}}_y$ into their components in the rotated system.


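so the unchanged vector A now takes the changed form

$$\begin{aligned}
\mathbf{A} = A_x\hat{\mathbf{e}}_x + A_y\hat{\mathbf{e}}_y &= A_x(\cos\varphi\,\hat{\mathbf{e}}'_x - \sin\varphi\,\hat{\mathbf{e}}'_y) + A_y(\sin\varphi\,\hat{\mathbf{e}}'_x + \cos\varphi\,\hat{\mathbf{e}}'_y) \\
&= (A_x\cos\varphi + A_y\sin\varphi)\,\hat{\mathbf{e}}'_x + (-A_x\sin\varphi + A_y\cos\varphi)\,\hat{\mathbf{e}}'_y.
\end{aligned} \tag{3.21}$$

If we write the vector A in the rotated (primed) coordinate system as

$$\mathbf{A} = A'_x\hat{\mathbf{e}}'_x + A'_y\hat{\mathbf{e}}'_y,$$

we then have

$$A'_x = A_x\cos\varphi + A_y\sin\varphi, \qquad A'_y = -A_x\sin\varphi + A_y\cos\varphi, \tag{3.22}$$

which is equivalent to the matrix equation

$$\begin{pmatrix} A'_x \\ A'_y \end{pmatrix} = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} A_x \\ A_y \end{pmatrix}. \tag{3.23}$$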


Suppose now that we start from A as given by its components in the rotated system,
$(A'_x, A'_y)$, and rotate the coordinate system back to its original orientation. This will entail
a rotation in the amount $-\varphi$, and corresponds to the matrix equation

$$\begin{pmatrix} A_x \\ A_y \end{pmatrix}
= \begin{pmatrix} \cos(-\varphi) & \sin(-\varphi) \\ -\sin(-\varphi) & \cos(-\varphi) \end{pmatrix} \begin{pmatrix} A'_x \\ A'_y \end{pmatrix}
= \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} A'_x \\ A'_y \end{pmatrix}. \tag{3.24}$$

Assigning the 2 × 2 matrices in Eqs. (3.23) and (3.24) the respective names S and S′, we
see that these two equations are equivalent to A′ = SA and A = S′A′, with

$$S = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}, \qquad
S' = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}. \tag{3.25}$$

Now, applying S to A and then S′ to SA (corresponding to first rotating the coordinate
system an amount $+\varphi$ and then an amount $-\varphi$), we recover A, or

$$\mathbf{A} = S'S\mathbf{A}.$$

Since this result must be valid for any A, we conclude that $S' = S^{-1}$. We also see that
$S' = S^T$. We can check that SS′ = 1 by matrix multiplication:

$$SS' = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}
\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Since S is real, the fact that $S^{-1} = S^T$ means that it is orthogonal. In summary, we have
found that the transformation connecting A and A′ (the same vector, but represented in the
rotated coordinate system) is


$$\mathbf{A}' = S\mathbf{A}, \tag{3.26}$$


with S an orthogonal matrix. 
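
A short numerical experiment makes the statements about S and S′ concrete. The sketch below is an illustration only; the function names (`rotation_2d`, `transpose`, `matmul`, `apply`, `matrices_close`) and the test angle are assumptions, and floating-point results are compared within a tolerance rather than exactly.

```python
import math

def rotation_2d(phi):
    """Matrix S of Eq. (3.25) for a rotation of the coordinate axes through phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s], [-s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def matrices_close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

phi = 0.3                          # arbitrary test angle in radians
S = rotation_2d(phi)
S_prime = rotation_2d(-phi)        # rotation back through -phi, Eq. (3.24)

A = [2.0, 1.0]                     # components in the unrotated system
A_rotated = apply(S, A)            # components in the rotated system, Eq. (3.26)
A_back = apply(S_prime, A_rotated)

print(matrices_close(S_prime, transpose(S)))                        # True: S' = S^T
print(matrices_close(matmul(S, S_prime), [[1.0, 0.0], [0.0, 1.0]]))  # True: S S' = 1
print(A_rotated, A_back)           # A_back reproduces [2.0, 1.0] to rounding error
```

The length of the vector is the same in both systems, since only its components change, not the vector itself.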


Orthogonal Transformations 


It was no accident that the transformation describing a rotation in $\mathbb{R}^2$ was orthogonal, by
which we mean that the matrix effecting the transformation was an orthogonal matrix.
An instructive way of writing the transformation S is, returning to Eq. (3.20), to rewrite
those equations as

$$\hat{\mathbf{e}}_x = (\hat{\mathbf{e}}'_x\cdot\hat{\mathbf{e}}_x)\,\hat{\mathbf{e}}'_x + (\hat{\mathbf{e}}'_y\cdot\hat{\mathbf{e}}_x)\,\hat{\mathbf{e}}'_y, \qquad
\hat{\mathbf{e}}_y = (\hat{\mathbf{e}}'_x\cdot\hat{\mathbf{e}}_y)\,\hat{\mathbf{e}}'_x + (\hat{\mathbf{e}}'_y\cdot\hat{\mathbf{e}}_y)\,\hat{\mathbf{e}}'_y. \tag{3.27}$$

This corresponds to writing $\hat{\mathbf{e}}_x$ and $\hat{\mathbf{e}}_y$ as the sum of their projections on the orthogonal
vectors $\hat{\mathbf{e}}'_x$ and $\hat{\mathbf{e}}'_y$. Now we can rewrite S as

$$S = \begin{pmatrix} \hat{\mathbf{e}}'_x\cdot\hat{\mathbf{e}}_x & \hat{\mathbf{e}}'_x\cdot\hat{\mathbf{e}}_y \\ \hat{\mathbf{e}}'_y\cdot\hat{\mathbf{e}}_x & \hat{\mathbf{e}}'_y\cdot\hat{\mathbf{e}}_y \end{pmatrix}. \tag{3.28}$$

This means that each row of S contains the components (in the unprimed coordinates) of
a unit vector (either $\hat{\mathbf{e}}'_x$ or $\hat{\mathbf{e}}'_y$) that is orthogonal to the vector whose components are in the
other row. In turn, this means that the dot products of different row vectors will be zero,
while the dot product of any row vector with itself (because it is a unit vector) will be unity.
That is the deeper significance of an orthogonal matrix S; the $\mu\nu$ element of $SS^T$ is the
dot product formed from the $\mu$th row of S and the $\nu$th column of $S^T$ (which is the same as
the $\nu$th row of S). Since these row vectors are orthogonal, we will get zero if $\mu \ne \nu$, and
because they are unit vectors, we will get unity if $\mu = \nu$. In other words, $SS^T$ will be a unit
matrix.

Before leaving Eq. (3.28), note that its columns also have a simple interpretation: Each
contains the components (in the primed coordinates) of one of the unit vectors of the
unprimed set. Thus the dot product formed from two different columns of S will vanish,
while the dot product of any column with itself will be unity. This corresponds to the
fact that, for an orthogonal matrix, we also have $S^TS = 1$.

Summarizing part of the above, 


The transformation from one orthogonal Cartesian coordinate system to another Carte- 
sian system is described by an orthogonal matrix. 


In Chapter 2 we found that an orthogonal matrix must have a determinant that is real 
and of magnitude unity, i.e., ±1. However, for rotations in ordinary space the value of the
determinant will always be +1. One way to understand this is to consider the fact that any 
rotation can be built up from a large number of small rotations, and that the determinant 
must vary continuously as the amount of rotation is changed. The identity rotation (i.e., 
no rotation at all) has determinant +1. Since no value close to +1 except +1 itself is a 
permitted value for the determinant, rotations cannot change the value of the determinant.


Reflections


Another possibility for changing a coordinate system is to subject it to a reflection
operation. For simplicity, consider first the inversion operation, in which the sign of each
coordinate is reversed. In $\mathbb{R}^3$, the transformation matrix S will be the 3 × 3 analog of
Eq. (3.28), and the transformation under discussion is to set $\hat{\mathbf{e}}'_\mu = -\hat{\mathbf{e}}_\mu$, with $\mu = x$, $y$,
and $z$. This will lead to

$$S = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

which clearly results in det S = −1. The change in sign of the determinant corresponds
to the change from a right-handed to a left-handed coordinate system (which obviously
cannot be accomplished by a rotation). Reflection about a plane (as in the image produced
by a plane mirror) also changes the sign of the determinant and the handedness of the
coordinate system; for example, reflection in the xy-plane changes the sign of $\hat{\mathbf{e}}_z$, leaving
the other two unit vectors unchanged; the transformation matrix S for this transformation is

$$S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$

Its determinant is also −1.

The formulas for vector addition, multiplication by a scalar, and the dot product are 
unaffected by a reflection transformation of the coordinates, but this is not true of the cross 
product. To see this, look at the formula for any one of the components of A x B, and how 
it would change under inversion (where the same, unchanged vectors in physical space 
now have sign changes to all their components): 


$$C_x:\quad A_yB_z - A_zB_y \;\longrightarrow\; (-A_y)(-B_z) - (-A_z)(-B_y) = A_yB_z - A_zB_y.$$


Note that this formula says that the sign of $C_x$ should not change, even though it must in
order to describe the unchanged physical situation. The conclusion is that our transforma- 
tion law fails for the result of a cross-product operation. However, the mathematics can 
be salvaged if we classify B x C as a different type of quantity than B and C. Many texts 
on vector analysis call vectors whose components change sign under coordinate reflec- 
tion polar vectors, and those whose components do not then change sign axial vectors. 
The term axial doubtless arises from the fact that cross products frequently describe phe- 
nomena associated with rotation about the axis defined by the axial vector. Nowadays, it 
is becoming more usual to call polar vectors just vectors, because we want that term to 
describe objects that obey for all S the transformation law 


$$\mathbf{A}' = S\mathbf{A} \quad\text{(vectors)}, \tag{3.29}$$


(and specifically without a restriction to S whose determinants are +1). Axial vectors, for 
which the vector transformation law fails for coordinate reflections, are then referred to 
as pseudovectors, and their transformation law can be expressed in the somewhat more 
complicated form 


$$\mathbf{C}' = \det(S)\,S\,\mathbf{C} \quad\text{(pseudovectors)}. \tag{3.30}$$


FIGURE 3.6 Inversion (right) of original coordinates (left) and the effect
on a vector A and a pseudovector B.


The effect of an inversion operation on a coordinate system and on a vector and a pseu- 
dovector are shown in Fig. 3.6. 

Since vectors and pseudovectors have different transformation laws, it is in general
without physical meaning to add them together.³ It is also usually meaningless to equate
quantities of different transformational properties: in A = B, both quantities must be either
vectors or pseudovectors.

Pseudovectors, of course, enter into more complicated expressions, of which an example
is the scalar triple product A · B × C. Under coordinate reflection, the components of B × C
do not change (as observed earlier), but those of A are reversed, with the result that
A · B × C changes sign. We therefore need to reclassify it as a pseudoscalar. On the
other hand, the vector triple product, A × (B × C), which contains two cross products,
evaluates, as shown in Eq. (3.18), to an expression containing only legitimate scalars and
(polar) vectors. It is therefore proper to identify A × (B × C) as a vector. These cases
illustrate the general principle that a product with an odd number of pseudo quantities is
“pseudo,” while those with even numbers of pseudo quantities are not.
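
The distinction between Eqs. (3.29) and (3.30) can be seen directly by transforming A and B as ordinary vectors and then recomputing their cross product in the new coordinates. The sketch below is illustrative only; the helper names and the integer test vectors are assumptions. It compares the recomputed cross product with what the vector law S C and the pseudovector law det(S) S C would each predict, for an inversion and for a reflection in the xy-plane.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

inversion = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
mirror_xy = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]

A, B = [1, 2, 3], [4, 5, 6]
C = cross(A, B)                                   # [-3, 6, -3]

def recomputed_cross(S):
    """Cross product formed from the transformed components of A and B."""
    return cross(apply(S, A), apply(S, B))

def vector_law(S):
    return apply(S, C)                            # Eq. (3.29)

def pseudovector_law(S):
    return [det3(S) * x for x in apply(S, C)]     # Eq. (3.30)

for S in (inversion, mirror_xy):
    print(recomputed_cross(S), vector_law(S), pseudovector_law(S))
# inversion: [-3, 6, -3]  [3, -6, 3]   [-3, 6, -3]
# mirror_xy: [3, -6, -3]  [-3, 6, 3]   [3, -6, -3]
```

In both cases the recomputed cross product agrees with the pseudovector law and disagrees with the vector law, as the discussion above leads one to expect.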


Successive Operations 
One can carry out a succession of coordinate rotations and/or reflections by applying the
relevant orthogonal transformations. In fact, we already did this in our introductory
discussion for $\mathbb{R}^2$ where we applied a rotation and then its inverse. In general, if R and R′ refer to
such operations, the application to A of R followed by the application of R′ corresponds to

$$\mathbf{A}' = S(R')\,S(R)\,\mathbf{A}, \tag{3.31}$$

and the overall result of the two transformations can be identified as a single transformation
whose matrix S(R′R) is the matrix product S(R′)S(R).
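
The composition rule in Eq. (3.31) can be tried out with two simple coordinate rotations. In the sketch below, which is only an illustration, `S_R` and `S_Rp` are assumed example matrices (quarter-turn rotations of the axes about z and about x, written in the same sign convention as Eq. (3.25)); integer entries keep the arithmetic exact.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

S_R = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]     # R: axes rotated 90 degrees about z
S_Rp = [[1, 0, 0], [0, 0, 1], [0, -1, 0]]    # R': axes rotated 90 degrees about x

A = [1, 2, 3]

step_by_step = apply(S_Rp, apply(S_R, A))    # R first, then R'
combined = matmul(S_Rp, S_R)                 # single matrix S(R'R) = S(R') S(R)
print(step_by_step, apply(combined, A))      # [2, 3, 1] [2, 3, 1]

# Reversing the order of the factors gives a different transformation
print(apply(matmul(S_R, S_Rp), A))           # [3, -1, -2]

# The product of two orthogonal matrices is again orthogonal
print(matmul(combined, transpose(combined)) == IDENTITY)   # True
```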





Two points should be noted:

1. The operations take place in right-to-left order: The rightmost operator is the one
applied to the original A; that to its left then applies to the result of the first
operation, etc.

2. The combined operation R′R is a transformation between two orthogonal coordinate
systems and therefore can be described by an orthogonal matrix: The product of two
orthogonal matrices is orthogonal.


³ The big exception to this is in beta-decay weak interactions. Here the universe distinguishes between right- and left-handed
systems, and we add polar and axial vector interactions.


Exercises


3.3.1 A rotation $\varphi_1 + \varphi_2$ about the z-axis is carried out as two successive rotations $\varphi_1$ and
$\varphi_2$, each about the z-axis. Use the matrix representation of the rotations to derive the
trigonometric identities

$$\cos(\varphi_1+\varphi_2) = \cos\varphi_1\cos\varphi_2 - \sin\varphi_1\sin\varphi_2,$$
$$\sin(\varphi_1+\varphi_2) = \sin\varphi_1\cos\varphi_2 + \cos\varphi_1\sin\varphi_2.$$

3.3.2 A corner reflector is formed by three mutually perpendicular reflecting surfaces. Show
that a ray of light incident upon the corner reflector (striking all three surfaces) is
reflected back along a line parallel to the line of incidence.

Hint. Consider the effect of a reflection on the components of a vector describing the
direction of the light ray.

3.3.3 Let x and y be column vectors. Under an orthogonal transformation S, they become
x′ = Sx and y′ = Sy. Show that $(\mathbf{x}')^T\mathbf{y}' = \mathbf{x}^T\mathbf{y}$, a result equivalent to the invariance of
the dot product under a rotational transformation.

3.3.4 Given the orthogonal transformation matrix S and vectors a and b,

$$S = \begin{pmatrix} 0.80 & 0.60 & 0.00 \\ -0.48 & 0.64 & 0.60 \\ 0.36 & -0.48 & 0.80 \end{pmatrix}, \qquad
\mathbf{a} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix},$$

(a) Calculate det(S).

(b) Verify that a · b is invariant under application of S to a and b.

(c) Determine what happens to a × b under application of S to a and b. Is this what
is expected?

3.3.5 Using a and b as defined in Exercise 3.3.4, but with

$$S = \begin{pmatrix} 0.60 & 0.00 & 0.80 \\ -0.64 & -0.60 & 0.48 \\ -0.48 & 0.80 & 0.36 \end{pmatrix} \quad\text{and}\quad
\mathbf{c} = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix},$$

(a) Calculate det(S).

Apply S to a, b, and c, and determine what happens to

(b) a × b,

(c) (a × b) · c,

(d) a × (b × c).

(e) Classify the expressions in (b) through (d) as scalar, vector, pseudovector, or
pseudoscalar.


3.4 ROTATIONS IN $\mathbb{R}^3$


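Because of its practical importance, we discuss now in some detail the treatment of
rotations in $\mathbb{R}^3$. An obvious starting point, based on our experience in $\mathbb{R}^2$, would be to
write the 3 × 3 matrix S of Eq. (3.28), with rows that describe the orientations of a rotated
(primed) set of unit vectors in terms of the original (unprimed) unit vectors:

$$S = \begin{pmatrix}
\hat{\mathbf{e}}'_1\cdot\hat{\mathbf{e}}_1 & \hat{\mathbf{e}}'_1\cdot\hat{\mathbf{e}}_2 & \hat{\mathbf{e}}'_1\cdot\hat{\mathbf{e}}_3 \\
\hat{\mathbf{e}}'_2\cdot\hat{\mathbf{e}}_1 & \hat{\mathbf{e}}'_2\cdot\hat{\mathbf{e}}_2 & \hat{\mathbf{e}}'_2\cdot\hat{\mathbf{e}}_3 \\
\hat{\mathbf{e}}'_3\cdot\hat{\mathbf{e}}_1 & \hat{\mathbf{e}}'_3\cdot\hat{\mathbf{e}}_2 & \hat{\mathbf{e}}'_3\cdot\hat{\mathbf{e}}_3
\end{pmatrix}. \tag{3.32}$$

We have switched the coordinate labels from x, y, z to 1, 2, 3 for convenience in some of
the formulas that use Eq. (3.32). It is useful to make one observation about the elements
of S, namely $s_{\mu\nu} = \hat{\mathbf{e}}'_\mu\cdot\hat{\mathbf{e}}_\nu$. This dot product is the projection of $\hat{\mathbf{e}}'_\mu$ onto the $\hat{\mathbf{e}}_\nu$ direction,
and is therefore the change in $x_\nu$ that is produced by a unit change in $x'_\mu$. Since the relation
between the coordinates is linear, we can identify $\hat{\mathbf{e}}'_\mu\cdot\hat{\mathbf{e}}_\nu$ as $\partial x_\nu/\partial x'_\mu$, so our transformation
matrix S can be written in the alternate form

$$S = \begin{pmatrix}
\partial x_1/\partial x'_1 & \partial x_2/\partial x'_1 & \partial x_3/\partial x'_1 \\
\partial x_1/\partial x'_2 & \partial x_2/\partial x'_2 & \partial x_3/\partial x'_2 \\
\partial x_1/\partial x'_3 & \partial x_2/\partial x'_3 & \partial x_3/\partial x'_3
\end{pmatrix}. \tag{3.33}$$

The argument we made to evaluate $\hat{\mathbf{e}}'_\mu\cdot\hat{\mathbf{e}}_\nu$ could as easily have been made with the roles
of the two unit vectors reversed, yielding instead of $\partial x_\nu/\partial x'_\mu$ the derivative $\partial x'_\mu/\partial x_\nu$. We
then have what at first may seem to be a surprising result:

$$\frac{\partial x_\nu}{\partial x'_\mu} = \frac{\partial x'_\mu}{\partial x_\nu}. \tag{3.34}$$

A superficial look at this equation suggests that its two sides would be reciprocals. The
problem is that we have not been notationally careful enough to avoid ambiguity: the
derivative on the left-hand side is to be taken with the other x′ coordinates fixed, while that
on the right-hand side is with the other unprimed coordinates fixed. In fact, the equality in
Eq. (3.34) is needed to make S an orthogonal matrix.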

We note in passing that the observation that the coordinates are related linearly restricts 
the current discussion to Cartesian coordinate systems. Curvilinear coordinates are treated 
later. 

Neither Eq. (3.32) nor Eq. (3.33) makes obvious the possibility of relations among the
elements of S. In $\mathbb{R}^2$, we found that all the elements of S depended on a single variable,
the rotation angle. In $\mathbb{R}^3$, the number of independent variables needed to specify a general
rotation is three: Two parameters (usually angles) are needed to specify the direction of
$\hat{\mathbf{e}}'_3$; then one angle is needed to specify the direction of $\hat{\mathbf{e}}'_1$ in the plane perpendicular to $\hat{\mathbf{e}}'_3$;
at this point the orientation of $\hat{\mathbf{e}}'_2$ is completely determined. Therefore, of the nine elements
of S, only three are in fact independent. The usual parameters used to specify $\mathbb{R}^3$ rotations
are the Euler angles.⁴ It is useful to have S given explicitly in terms of them, as the
Lagrangian formulation of mechanics requires the use of a set of independent variables.

The Euler angles describe an $\mathbb{R}^3$ rotation in three steps, the first two of which have
the effect of fixing the orientation of the new $\hat{\mathbf{e}}_3$ axis (the polar direction in spherical
coordinates), while the third Euler angle indicates the amount of subsequent rotation about 
that axis. The first two steps do more than identify a new polar direction; they describe 
rotations that cause the realignment. As a result, we can obtain the matrix representations 
of these (and the third rotation), and apply them sequentially (i.e., as a matrix product) to 
obtain the overall effect of the rotation. 

The three steps describing rotation of the coordinate axes are the following (also illus- 
trated in Fig. 3.7): 


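1. The coordinates are rotated about the $\hat{\mathbf{e}}_3$ axis counterclockwise (as viewed from
positive $\hat{\mathbf{e}}_3$) through an angle $\alpha$ in the range $0 \le \alpha < 2\pi$, into new axes denoted $\hat{\mathbf{e}}'_1$, $\hat{\mathbf{e}}'_2$, $\hat{\mathbf{e}}'_3$.
(The polar direction is not changed; the $\hat{\mathbf{e}}_3$ and $\hat{\mathbf{e}}'_3$ axes coincide.)

2. The coordinates are rotated about the $\hat{\mathbf{e}}'_2$ axis counterclockwise (as viewed from
positive $\hat{\mathbf{e}}'_2$) through an angle $\beta$ in the range $0 \le \beta \le \pi$, into new axes denoted $\hat{\mathbf{e}}''_1$, $\hat{\mathbf{e}}''_2$, $\hat{\mathbf{e}}''_3$.
(This tilts the polar direction toward the $\hat{\mathbf{e}}'_1$ direction, but leaves $\hat{\mathbf{e}}'_2$ unchanged.)

3. The coordinates are now rotated about the $\hat{\mathbf{e}}''_3$ axis counterclockwise (as viewed from
positive $\hat{\mathbf{e}}''_3$) through an angle $\gamma$ in the range $0 \le \gamma < 2\pi$, into the final axes, denoted
$\hat{\mathbf{e}}'''_1$, $\hat{\mathbf{e}}'''_2$, $\hat{\mathbf{e}}'''_3$. (This rotation leaves the polar direction, $\hat{\mathbf{e}}''_3$, unchanged.)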


In terms of the usual spherical polar coordinates (r, 0, @), the final polar axis is at the 
orientation 6 = 6, y =a. The final orientations of the other axes depend on all three Euler 
angles. 

We now need the transformation matrices. The first rotation causes é) and @, to 
remain in the xy-plane, and has in its first two rows and columns exactly the same form 





FiGuRE 3.7 Euler angle rotations: (a) about €3 through angle a; (b) about é, through 
angle B; (c) about €5 through angle y. 


4There are almost as many definitions of the Euler angles as there are authors. Here we follow the choice generally made by 
workers in the area of group theory and the quantum theory of angular momentum.

as $\mathsf{S}$ in Eq. (3.25):

\[
\mathsf{S}_1(\alpha) = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{3.35}
\]


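The third row and column of $\mathsf{S}_1$ indicate that this rotation leaves unchanged the $\hat{\mathbf{e}}_3$
component of any vector on which it operates. The second rotation (applied to the coordinate
system as it exists after the first rotation) is in the $\hat{\mathbf{e}}_3'\hat{\mathbf{e}}_1'$ plane; note that the signs of $\sin\beta$
have to be consistent with a cyclic permutation of the axis numbering:

\[
\mathsf{S}_2(\beta) = \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix}.
\]

The third rotation is like the first, but with rotation amount $\gamma$:

\[
\mathsf{S}_3(\gamma) = \begin{pmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]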


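The total rotation is described by the triple matrix product

\[
\mathsf{S}(\alpha, \beta, \gamma) = \mathsf{S}_3(\gamma)\,\mathsf{S}_2(\beta)\,\mathsf{S}_1(\alpha). \tag{3.36}
\]

Note the order: $\mathsf{S}_1(\alpha)$ operates first, then $\mathsf{S}_2(\beta)$, and finally $\mathsf{S}_3(\gamma)$. Direct multiplication
gives

\[
\mathsf{S}(\alpha, \beta, \gamma) =
\begin{pmatrix}
\cos\gamma\cos\beta\cos\alpha - \sin\gamma\sin\alpha & \cos\gamma\cos\beta\sin\alpha + \sin\gamma\cos\alpha & -\cos\gamma\sin\beta \\
-\sin\gamma\cos\beta\cos\alpha - \cos\gamma\sin\alpha & -\sin\gamma\cos\beta\sin\alpha + \cos\gamma\cos\alpha & \sin\gamma\sin\beta \\
\sin\beta\cos\alpha & \sin\beta\sin\alpha & \cos\beta
\end{pmatrix}. \tag{3.37}
\]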


In case they are wanted, note that the elements $s_{ij}$ in Eq. (3.37) give the explicit forms of
the dot products $\hat{\mathbf{e}}_i''' \cdot \hat{\mathbf{e}}_j$ (and therefore also the partial derivatives $\partial x_j / \partial x_i'''$).
Note that each of $\mathsf{S}_1$, $\mathsf{S}_2$, and $\mathsf{S}_3$ is orthogonal, with determinant $+1$, so that the overall
$\mathsf{S}$ will also be orthogonal with determinant $+1$.


Example 3.4.1 AN $\mathbb{R}^3$ ROTATION

Consider a vector originally with components $(2, -1, 3)$. We want its components in
a coordinate system reached by Euler angle rotations $\alpha = \beta = \gamma = \pi/2$. Evaluating
$\mathsf{S}(\alpha, \beta, \gamma)$:

\[
\mathsf{S}(\alpha, \beta, \gamma) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
\]

A partial check on this value of $\mathsf{S}$ is obtained by verifying that $\det(\mathsf{S}) = +1$.
Then, in the new coordinates, our vector has components

\[
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}
= \begin{pmatrix} -2 \\ 3 \\ -1 \end{pmatrix}.
\]

The reader should check this result by visualizing the rotations involved. ■
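The arithmetic of Example 3.4.1 can also be checked numerically. The following is a minimal sketch in Python, using only the standard library; the helper names (`rot3`, `rot2`, `matmul`, `euler_matrix`, `det3`, `transform`) are illustrative choices made here and do not come from the text. It builds $\mathsf{S}$ as the product in Eq. (3.36), applies it to the vector $(2, -1, 3)$, and tests the orthogonality and determinant properties stated above.

```python
import math

def rot3(angle):
    """Coordinate rotation about the 3-axis, the form of Eq. (3.35)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rot2(angle):
    """Coordinate rotation about the 2-axis, the form of S2(beta)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_matrix(alpha, beta, gamma):
    """S = S3(gamma) S2(beta) S1(alpha), as in Eq. (3.36)."""
    return matmul(rot3(gamma), matmul(rot2(beta), rot3(alpha)))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def transform(m, v):
    """Matrix-vector product: components of v in the rotated system."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

S = euler_matrix(math.pi / 2, math.pi / 2, math.pi / 2)
new_components = transform(S, [2.0, -1.0, 3.0])

# S times its transpose should be the unit matrix if S is orthogonal.
S_transpose = [[S[j][i] for j in range(3)] for i in range(3)]
identity_check = matmul(S, S_transpose)

print(new_components)   # close to (-2, 3, -1), up to rounding error
print(det3(S))          # close to +1
```

Because $\cos(\pi/2)$ is not represented exactly in floating point, the printed values agree with the hand calculation only to within rounding error.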


Exercises

3.4.1 Another set of Euler rotations in common use is

(1) a rotation about the $x_3$-axis through an angle $\varphi$, counterclockwise,
(2) a rotation about the $x_1'$-axis through an angle $\theta$, counterclockwise,
(3) a rotation about the $x_3''$-axis through an angle $\psi$, counterclockwise.

If

\[
\begin{aligned}
\alpha &= \varphi - \pi/2 & & & \varphi &= \alpha + \pi/2 \\
\beta &= \theta & &\text{or} & \theta &= \beta \\
\gamma &= \psi + \pi/2 & & & \psi &= \gamma - \pi/2,
\end{aligned}
\]

show that the final systems are identical.

3.4.2 Suppose the Earth is moved (rotated) so that the north pole goes to 30° north, 20° west
(original latitude and longitude system) and the 10° west meridian points due south
(also in the original system).

(a) What are the Euler angles describing this rotation?
(b) Find the corresponding direction cosines.

\[
\text{ANS. (b)} \quad \mathsf{S} = \begin{pmatrix} 0.9551 & -0.2552 & -0.1504 \\ 0.0052 & 0.5221 & -0.8529 \\ 0.2962 & 0.8138 & 0.5000 \end{pmatrix}.
\]

3.4.3 Verify that the Euler angle rotation matrix, Eq. (3.37), is invariant under the
transformation

\[
\alpha \to \alpha + \pi, \qquad \beta \to -\beta, \qquad \gamma \to \gamma - \pi.
\]

3.4.4 Show that the Euler angle rotation matrix $\mathsf{S}(\alpha, \beta, \gamma)$ satisfies the following relations:

(a) $\mathsf{S}^{-1}(\alpha, \beta, \gamma) = \tilde{\mathsf{S}}(\alpha, \beta, \gamma)$,
(b) $\mathsf{S}^{-1}(\alpha, \beta, \gamma) = \mathsf{S}(-\gamma, -\beta, -\alpha)$.

3.4.5 The coordinate system $(x, y, z)$ is rotated through an angle $\Phi$ counterclockwise about an
axis defined by the unit vector $\hat{\mathbf{n}}$ into system $(x', y', z')$. In terms of the new coordinates
the radius vector becomes

\[
\mathbf{r}' = \mathbf{r}\cos\Phi + \mathbf{r}\times\hat{\mathbf{n}}\,\sin\Phi + \hat{\mathbf{n}}(\hat{\mathbf{n}}\cdot\mathbf{r})(1 - \cos\Phi).
\]

(a) Derive this expression from geometric considerations.
(b) Show that it reduces as expected for $\hat{\mathbf{n}} = \hat{\mathbf{e}}_z$. The answer, in matrix form, appears
in Eq. (3.35).
(c) Verify that $r'^2 = r^2$.


3.5 DIFFERENTIAL VECTOR OPERATORS 


We move now to the important situation in which a vector is associated with each point 
in space, and therefore has a value (its set of components) that depends on the coordinates 
specifying its position. A typical example in physics is the electric field E(x, y, z), which 
describes the direction and magnitude of the electric force if a unit “test charge” was placed 
at x, y, z. The term field refers to a quantity that has values at all points of a region; if the 
quantity is a vector, its distribution is described as a vector field. While we already have 
a standard name for a simple algebraic quantity which is assigned a value at all points of 
a spatial region (it is called a function), in physics contexts it may also be referred to as a 
scalar field. 

Physicists need to be able to characterize the rate at which the values of vectors (and also 
scalars) change with position, and this is most effectively done by introducing differential 
vector operator concepts. It turns out that there are a large number of relations between 
these differential operators, and it is our current objective to identify such relations and 
learn how to use them. 


Gradient, $\nabla$


Our first differential operator is that known as the gradient, which characterizes the change
of a scalar quantity, here $\varphi$, with position. Working in $\mathbb{R}^3$, and labeling the coordinates $x_1$,
$x_2$, $x_3$, we write $\varphi(\mathbf{r})$ as the value of $\varphi$ at the point $\mathbf{r} = x_1\hat{\mathbf{e}}_1 + x_2\hat{\mathbf{e}}_2 + x_3\hat{\mathbf{e}}_3$, and consider
the effect of small changes $dx_1$, $dx_2$, $dx_3$, respectively, in $x_1$, $x_2$, and $x_3$. This situation
corresponds to that discussed in Section 1.9, where we introduced partial derivatives to 
describe how a function of several variables (there x, y, and z) changes its value when these 
variables are changed by respective amounts dx, dy, and dz. The equation governing this 
process is Eq. (1.141). 
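To first order in the differentials $dx_i$, $\varphi$ in our present problem changes by an amount

\[
d\varphi = \left(\frac{\partial\varphi}{\partial x_1}\right) dx_1 + \left(\frac{\partial\varphi}{\partial x_2}\right) dx_2 + \left(\frac{\partial\varphi}{\partial x_3}\right) dx_3, \tag{3.38}
\]

which is of the form corresponding to the dot product of

\[
\nabla\varphi = \begin{pmatrix} \partial\varphi/\partial x_1 \\ \partial\varphi/\partial x_2 \\ \partial\varphi/\partial x_3 \end{pmatrix}
\quad\text{and}\quad
d\mathbf{r} = \begin{pmatrix} dx_1 \\ dx_2 \\ dx_3 \end{pmatrix}.
\]

These quantities can also be written

\[
\nabla\varphi = \left(\frac{\partial\varphi}{\partial x_1}\right)\hat{\mathbf{e}}_1 + \left(\frac{\partial\varphi}{\partial x_2}\right)\hat{\mathbf{e}}_2 + \left(\frac{\partial\varphi}{\partial x_3}\right)\hat{\mathbf{e}}_3, \tag{3.39}
\]

\[
d\mathbf{r} = dx_1\,\hat{\mathbf{e}}_1 + dx_2\,\hat{\mathbf{e}}_2 + dx_3\,\hat{\mathbf{e}}_3, \tag{3.40}
\]

in terms of which we have

\[
d\varphi = (\nabla\varphi)\cdot d\mathbf{r}. \tag{3.41}
\]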


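We have given the $3 \times 1$ matrix of derivatives the name $\nabla\varphi$ (often referred to in speech as
“del phi” or “grad phi”); we give the differential of position its customary name $d\mathbf{r}$.

The notation of Eqs. (3.39) and (3.41) is really only appropriate if $\nabla\varphi$ is actually a
vector, because the utility of the present approach depends on our ability to use it in
coordinate systems of arbitrary orientation. To prove that $\nabla\varphi$ is a vector, we must show that it
transforms under rotation of the coordinate system according to

\[
(\nabla\varphi)' = \mathsf{S}(\nabla\varphi). \tag{3.42}
\]

Taking $\mathsf{S}$ in the form given in Eq. (3.33), we examine $\mathsf{S}(\nabla\varphi)$. We have

\[
\mathsf{S}(\nabla\varphi) =
\begin{pmatrix}
\partial x_1/\partial x_1' & \partial x_2/\partial x_1' & \partial x_3/\partial x_1' \\
\partial x_1/\partial x_2' & \partial x_2/\partial x_2' & \partial x_3/\partial x_2' \\
\partial x_1/\partial x_3' & \partial x_2/\partial x_3' & \partial x_3/\partial x_3'
\end{pmatrix}
\begin{pmatrix} \partial\varphi/\partial x_1 \\ \partial\varphi/\partial x_2 \\ \partial\varphi/\partial x_3 \end{pmatrix}
=
\begin{pmatrix}
\displaystyle\sum_{\nu=1}^{3} \frac{\partial x_\nu}{\partial x_1'}\frac{\partial\varphi}{\partial x_\nu} \\[2ex]
\displaystyle\sum_{\nu=1}^{3} \frac{\partial x_\nu}{\partial x_2'}\frac{\partial\varphi}{\partial x_\nu} \\[2ex]
\displaystyle\sum_{\nu=1}^{3} \frac{\partial x_\nu}{\partial x_3'}\frac{\partial\varphi}{\partial x_\nu}
\end{pmatrix}. \tag{3.43}
\]

Each of the elements in the final expression in Eq. (3.43) is a chain-rule expression for
$\partial\varphi/\partial x_\mu'$, $\mu = 1, 2, 3$, showing that the transformation did produce $(\nabla\varphi)'$, the
representation of $\nabla\varphi$ in the rotated coordinates.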

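Having now established the legitimacy of the form $\nabla\varphi$, we proceed to give $\nabla$ a life of
its own. We therefore define (calling the coordinates $x$, $y$, $z$)

\[
\nabla = \hat{\mathbf{e}}_x\frac{\partial}{\partial x} + \hat{\mathbf{e}}_y\frac{\partial}{\partial y} + \hat{\mathbf{e}}_z\frac{\partial}{\partial z}. \tag{3.44}
\]
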
We note that $\nabla$ is a vector differential operator, capable of operating on a scalar (such
as $\varphi$) to produce a vector as the result of the operation. Because a differential operator
only operates on what is to its right, we have to be careful to maintain the correct order in
expressions involving $\nabla$, and we have to use parentheses when necessary to avoid
ambiguity as to what is to be differentiated.

The gradient of a scalar is extremely important in physics and engineering, as it
expresses the relation between a force field $\mathbf{F}(\mathbf{r})$ experienced by an object at $\mathbf{r}$ and the
related potential $V(\mathbf{r})$,

\[
\mathbf{F}(\mathbf{r}) = -\nabla V(\mathbf{r}). \tag{3.45}
\]

The minus sign in Eq. (3.45) is important; it causes the force exerted by the field to be in
a direction that lowers the potential. We consider later (in Section 3.9) the conditions that
must be satisfied if a potential corresponding to a given force can exist.

The gradient has a simple geometric interpretation. From Eq. (3.41), we see that, if
$d\mathbf{r}$ is constrained to have a fixed magnitude, the direction of $d\mathbf{r}$ that maximizes $d\varphi$ will
be when $\nabla\varphi$ and $d\mathbf{r}$ are collinear. So, the direction of most rapid increase in $\varphi$ is the
gradient direction, and the magnitude of the gradient is the directional derivative of $\varphi$ in
that direction. We now see that $-\nabla V$, in Eq. (3.45), is the direction of most rapid decrease
in $V$, and is the direction of the force associated with the potential $V$.


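Example 3.5.1 GRADIENT OF $r^n$

As a first step toward computation of $\nabla r^n$, let’s look at the even simpler $\nabla r$. We begin by
writing $r = (x^2 + y^2 + z^2)^{1/2}$, from which we get

\[
\frac{\partial r}{\partial x} = \frac{x}{(x^2 + y^2 + z^2)^{1/2}} = \frac{x}{r}, \qquad
\frac{\partial r}{\partial y} = \frac{y}{r}, \qquad
\frac{\partial r}{\partial z} = \frac{z}{r}. \tag{3.46}
\]

From these formulas we construct

\[
\nabla r = \frac{x}{r}\hat{\mathbf{e}}_x + \frac{y}{r}\hat{\mathbf{e}}_y + \frac{z}{r}\hat{\mathbf{e}}_z = \frac{1}{r}\left(x\hat{\mathbf{e}}_x + y\hat{\mathbf{e}}_y + z\hat{\mathbf{e}}_z\right) = \frac{\mathbf{r}}{r}. \tag{3.47}
\]

The result is a unit vector in the direction of $\mathbf{r}$, denoted $\hat{\mathbf{r}}$. For future reference, we note
that

\[
\hat{\mathbf{r}} = \frac{x}{r}\hat{\mathbf{e}}_x + \frac{y}{r}\hat{\mathbf{e}}_y + \frac{z}{r}\hat{\mathbf{e}}_z \tag{3.48}
\]

and that Eq. (3.47) takes the form

\[
\nabla r = \hat{\mathbf{r}}. \tag{3.49}
\]

The geometry of $\mathbf{r}$ and $\hat{\mathbf{r}}$ is illustrated in Fig. 3.8.

FIGURE 3.8 Unit vector $\hat{\mathbf{r}}$ (in $xy$-plane).

Continuing now to $\nabla r^n$, we have

\[
\frac{\partial r^n}{\partial x} = n r^{n-1}\frac{\partial r}{\partial x},
\]

with corresponding results for the $y$ and $z$ derivatives. We get

\[
\nabla r^n = n r^{n-1}\nabla r = n r^{n-1}\hat{\mathbf{r}}. \tag{3.50}
\]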
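A result such as Eq. (3.50) is easy to spot-check numerically. The sketch below, in plain Python, estimates the gradient by central differences and compares it with $n r^{n-1}\hat{\mathbf{r}}$ at one sample point. The function names, the step size `h`, the exponent, and the test point are arbitrary choices made for this illustration and are not part of the text.

```python
import math

def grad(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at the point p = [x, y, z]."""
    g = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        g.append((f(plus) - f(minus)) / (2 * h))
    return g

def radius(p):
    """Distance of the point p from the origin."""
    return math.sqrt(sum(c * c for c in p))

n = 3
point = [1.0, 2.0, 2.0]                 # chosen so that r = 3
r = radius(point)

numeric = grad(lambda p: radius(p) ** n, point)
# Eq. (3.50): n r^(n-1) r_hat, with r_hat = (x, y, z) / r
analytic = [n * r ** (n - 1) * c / r for c in point]

print(numeric)
print(analytic)
```

The two printed vectors should agree to within the truncation and rounding error of the finite-difference estimate.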


Example 3.5.2 COULOMB’S LAW

In electrostatics, it is well known that a point charge produces a potential proportional
to $1/r$, where $r$ is the distance from the charge. To check that this is consistent with the
Coulomb force law, we compute

\[
\mathbf{F} = -\nabla\left(\frac{1}{r}\right).
\]

This is a case of Eq. (3.50) with $n = -1$, and we get the expected result

\[
\mathbf{F} = \frac{1}{r^2}\hat{\mathbf{r}}.
\]
■

Example 3.5.3 GENERAL RADIAL POTENTIAL

Another situation of frequent occurrence is that the potential may be a function only of the
radial distance from the origin, i.e., $\varphi = f(r)$. We then calculate

\[
\frac{\partial\varphi}{\partial x} = \frac{df(r)}{dr}\frac{\partial r}{\partial x}, \quad\text{etc.},
\]

which leads, invoking Eq. (3.49), to

\[
\nabla\varphi = \frac{df(r)}{dr}\nabla r = \frac{df(r)}{dr}\hat{\mathbf{r}}. \tag{3.51}
\]

This result is in accord with intuition; the direction of maximum increase in $\varphi$ must be
radial, and numerically equal to $d\varphi/dr$. ■


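Divergence, $\nabla\cdot$


The divergence of a vector $\mathbf{A}$ is defined as the operation

\[
\nabla\cdot\mathbf{A} = \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}. \tag{3.52}
\]

The above formula is exactly what one might expect given both the vector and
differential-operator character of $\nabla$.
After looking at some examples of the calculation of the divergence, we will discuss its
physical significance.

Example 3.5.4 DIVERGENCE OF COORDINATE VECTOR

Calculate $\nabla\cdot\mathbf{r}$:

\[
\nabla\cdot\mathbf{r} = \left(\hat{\mathbf{e}}_x\frac{\partial}{\partial x} + \hat{\mathbf{e}}_y\frac{\partial}{\partial y} + \hat{\mathbf{e}}_z\frac{\partial}{\partial z}\right)\cdot\left(\hat{\mathbf{e}}_x x + \hat{\mathbf{e}}_y y + \hat{\mathbf{e}}_z z\right)
= \frac{\partial x}{\partial x} + \frac{\partial y}{\partial y} + \frac{\partial z}{\partial z},
\]

which reduces to $\nabla\cdot\mathbf{r} = 3$. ■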


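Example 3.5.5 DIVERGENCE OF CENTRAL FORCE FIELD

Consider next $\nabla\cdot f(r)\hat{\mathbf{r}}$. Using Eq. (3.48), we write

\[
\nabla\cdot f(r)\hat{\mathbf{r}} = \left(\hat{\mathbf{e}}_x\frac{\partial}{\partial x} + \hat{\mathbf{e}}_y\frac{\partial}{\partial y} + \hat{\mathbf{e}}_z\frac{\partial}{\partial z}\right)\cdot\left(\frac{x f(r)}{r}\hat{\mathbf{e}}_x + \frac{y f(r)}{r}\hat{\mathbf{e}}_y + \frac{z f(r)}{r}\hat{\mathbf{e}}_z\right)
= \frac{\partial}{\partial x}\left(\frac{x f(r)}{r}\right) + \frac{\partial}{\partial y}\left(\frac{y f(r)}{r}\right) + \frac{\partial}{\partial z}\left(\frac{z f(r)}{r}\right).
\]

Using

\[
\frac{\partial}{\partial x}\left(\frac{x f(r)}{r}\right) = \frac{f(r)}{r} - \frac{x f(r)}{r^2}\frac{\partial r}{\partial x} + \frac{x}{r}\frac{df(r)}{dr}\frac{\partial r}{\partial x}
= f(r)\left[\frac{1}{r} - \frac{x^2}{r^3}\right] + \frac{x^2}{r^2}\frac{df(r)}{dr}
\]

and corresponding formulas for the $y$ and $z$ derivatives, we obtain after simplification

\[
\nabla\cdot f(r)\hat{\mathbf{r}} = 2\frac{f(r)}{r} + \frac{df(r)}{dr}. \tag{3.53}
\]

In the special case $f(r) = r^n$, Eq. (3.53) reduces to

\[
\nabla\cdot r^n\hat{\mathbf{r}} = (n + 2)\,r^{n-1}. \tag{3.54}
\]

For $n = 1$, this reduces to the result of Example 3.5.4. For $n = -2$, corresponding to the
Coulomb field, the divergence vanishes, except at $r = 0$, where the differentiations we
performed are not defined. ■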
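The three cases just mentioned can be confirmed with a short numerical experiment. The following Python sketch approximates the divergence by central differences and compares it with $(n + 2)r^{n-1}$ from Eq. (3.54). The names `divergence` and `central_field`, the step size, and the sample point are assumptions made for this illustration only.

```python
import math

def divergence(field, p, h=1e-5):
    """Central-difference estimate of the divergence of a vector field at p."""
    total = 0.0
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        total += (field(plus)[i] - field(minus)[i]) / (2 * h)
    return total

def central_field(n):
    """Return the field r^n r_hat, written as r^(n-1) * (x, y, z)."""
    def field(p):
        r = math.sqrt(sum(c * c for c in p))
        return [r ** (n - 1) * c for c in p]
    return field

point = [1.0, 2.0, 2.0]   # r = 3, safely away from the origin
r = 3.0

# Map each exponent n to (numerical divergence, value from Eq. (3.54)).
results = {n: (divergence(central_field(n), point), (n + 2) * r ** (n - 1))
           for n in (1, 2, -2)}

for n, (numeric, analytic) in results.items():
    print(n, numeric, analytic)
```

For $n = -2$ both columns should be zero to within rounding error, in line with the statement that the Coulomb field is divergenceless away from the origin.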


If a vector field represents the flow of some quantity that is distributed in space, its 
divergence provides information as to the accumulation or depletion of that quantity at the 
point at which the divergence is evaluated. To gain a clearer picture of the concept, let us 
suppose that a vector field $\mathbf{v}(\mathbf{r})$ represents the velocity of a fluid$^5$ at the spatial points $\mathbf{r}$,
and that $\rho(\mathbf{r})$ represents the fluid density at $\mathbf{r}$ at a given time $t$. Then the direction and
magnitude of the flow rate at any point will be given by the product p(r)v(r). 

Our objective is to calculate the net rate of change of the fluid density in a volume 
element at the point r. To do so, we set up a parallelepiped of dimensions dx, dy, dz 
centered at r and with sides parallel to the xy, xz, and yz planes. See Fig. 3.9. To first 
order (infinitesimal dr and dt), the density of fluid exiting the parallelepiped per unit time 


$^5$It may be helpful to think of the fluid as a collection of molecules, so the number per unit volume (the density) at any point is
affected by the flow in and out of a volume element at the point.

FIGURE 3.9 Outward flow of $\rho\mathbf{v}$ from a volume element in the $\pm x$ directions. The
quantities $\pm\rho v_x$ must be multiplied by $dy\,dz$ to represent the total flux through the
bounding surfaces at $x \pm dx/2$.








through the $yz$ face located at $x - (dx/2)$ will be

\[
\text{Flow out, face at } x - \frac{dx}{2}: \quad -(\rho v_x)\Big|_{(x - dx/2,\,y,\,z)}\,dy\,dz.
\]

Note that only the velocity component $v_x$ is relevant here. The other components of $\mathbf{v}$
will not cause motion through a $yz$ face of the parallelepiped. Also, note the following:
$dy\,dz$ is the area of the $yz$ face; the average of $\rho v_x$ over the face is to first order its value at
$(x - dx/2, y, z)$, as indicated, and the amount of fluid leaving per unit time can be identified
as that in a column of area $dy\,dz$ and height $v_x$. Finally, keep in mind that outward flow
corresponds to that in the $-x$ direction, explaining the presence of the minus sign.

We next compute the outward flow through the $yz$ planar face at $x + dx/2$. The result is

\[
\text{Flow out, face at } x + \frac{dx}{2}: \quad +(\rho v_x)\Big|_{(x + dx/2,\,y,\,z)}\,dy\,dz.
\]

Combining these, we have for both $yz$ faces

\[
\left(-(\rho v_x)\Big|_{x - dx/2} + (\rho v_x)\Big|_{x + dx/2}\right) dy\,dz = \left(\frac{\partial(\rho v_x)}{\partial x}\right) dx\,dy\,dz.
\]

Note that in combining terms at $x - dx/2$ and $x + dx/2$ we used the partial derivative
notation, because all the quantities appearing here are also functions of $y$ and $z$. Finally,
adding corresponding contributions from the other four faces of the parallelepiped, we
reach

\[
\text{Net flow out per unit time} = \left[\frac{\partial}{\partial x}(\rho v_x) + \frac{\partial}{\partial y}(\rho v_y) + \frac{\partial}{\partial z}(\rho v_z)\right] dx\,dy\,dz
= \nabla\cdot(\rho\mathbf{v})\,dx\,dy\,dz. \tag{3.55}
\]


We now see that the name divergence is aptly chosen. As shown in Eq. (3.55), the
divergence of the vector $\rho\mathbf{v}$ represents the net outflow per unit volume, per unit time. If
the physical problem being described is one in which fluid (molecules) are neither created
nor destroyed, we will also have an equation of continuity, of the form

\[
\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0. \tag{3.56}
\]

This equation quantifies the obvious statement that a net outflow from a volume element
results in a smaller density inside the volume.

When a vector quantity is divergenceless (has zero divergence) in a spatial region, we
can interpret it as describing a steady-state “fluid-conserving” flow (flux) within that region
(even if the vector field does not represent material that is moving). This is a situation that
arises frequently in physics, applying in general to the magnetic field, and, in charge-free
regions, also to the electric field. If we draw a diagram with lines that follow the flow paths,
the lines (depending on the context) may be called stream lines or lines of force. Within a
region of zero divergence, these lines must exit any volume element they enter; they cannot
terminate there. However, lines will begin at points of positive divergence (sources) and
end at points where the divergence is negative (sinks). Possible patterns for a vector field
are shown in Fig. 3.10.

FIGURE 3.10 Flow diagrams: (a) with source and sink; (b) solenoidal. The divergence
vanishes at volume elements A and C, but is negative at B.

If the divergence of a vector field is zero everywhere, its lines of force will consist 
entirely of closed loops, as in Fig. 3.10(b); such vector fields are termed solenoidal. For 
emphasis, we write 


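\[
\nabla\cdot\mathbf{B} = 0 \text{ everywhere} \;\longrightarrow\; \mathbf{B} \text{ is solenoidal.} \tag{3.57}
\]


Curl, $\nabla\times$


Another possible operation with the vector operator $\nabla$ is to take its cross product with a
vector. Using the established formula for the cross product, and being careful to write the
derivatives to the left of the vector on which they are to act, we obtain

\[
\nabla\times\mathbf{V} = \hat{\mathbf{e}}_x\left(\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right) + \hat{\mathbf{e}}_y\left(\frac{\partial V_x}{\partial z} - \frac{\partial V_z}{\partial x}\right) + \hat{\mathbf{e}}_z\left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)
= \begin{vmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ V_x & V_y & V_z \end{vmatrix}. \tag{3.58}
\]

This vector operation is called the curl of $\mathbf{V}$. Note that when the determinant in Eq. (3.58)
is evaluated, it must be expanded in a way that causes the derivatives in the second row to
be applied to the functions in the third row (and not to anything in the top row); we will
encounter this situation repeatedly, and will identify the evaluation as being from the top
down.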


Example 3.5.6 CURL OF A CENTRAL FORCE FIELD 


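Calculate $\nabla\times[f(r)\hat{\mathbf{r}}]$. Writing

\[
\hat{\mathbf{r}} = \frac{x}{r}\hat{\mathbf{e}}_x + \frac{y}{r}\hat{\mathbf{e}}_y + \frac{z}{r}\hat{\mathbf{e}}_z,
\]

and remembering that $\partial r/\partial y = y/r$ and $\partial r/\partial z = z/r$, the $x$-component of the result is
found to be

\[
\begin{aligned}
\left[\nabla\times[f(r)\hat{\mathbf{r}}]\right]_x &= \frac{\partial}{\partial y}\frac{z f(r)}{r} - \frac{\partial}{\partial z}\frac{y f(r)}{r} \\
&= z\left(\frac{d}{dr}\frac{f(r)}{r}\right)\frac{\partial r}{\partial y} - y\left(\frac{d}{dr}\frac{f(r)}{r}\right)\frac{\partial r}{\partial z} \\
&= z\left(\frac{d}{dr}\frac{f(r)}{r}\right)\frac{y}{r} - y\left(\frac{d}{dr}\frac{f(r)}{r}\right)\frac{z}{r} = 0.
\end{aligned}
\]

By symmetry, the other components are also zero, yielding the final result

\[
\nabla\times[f(r)\hat{\mathbf{r}}] = 0. \tag{3.59}
\]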


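Example 3.5.7 A NONZERO CURL

Calculate $\mathbf{F} = \nabla\times(-y\hat{\mathbf{e}}_x + x\hat{\mathbf{e}}_y)$, which is of the form $\nabla\times\mathbf{b}$, where $b_x = -y$, $b_y = x$,
$b_z = 0$. We have

\[
F_x = \frac{\partial b_z}{\partial y} - \frac{\partial b_y}{\partial z} = 0, \qquad
F_y = \frac{\partial b_x}{\partial z} - \frac{\partial b_z}{\partial x} = 0, \qquad
F_z = \frac{\partial b_y}{\partial x} - \frac{\partial b_x}{\partial y} = 2,
\]

so $\mathbf{F} = 2\hat{\mathbf{e}}_z$. ■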
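Examples 3.5.6 and 3.5.7 can be compared side by side with a finite-difference curl. The sketch below is plain Python; the helper `curl`, the choice $f(r) = e^{-r}$ for the central field, and the sample point are illustrative assumptions, not taken from the text.

```python
import math

def curl(field, p, h=1e-5):
    """Central-difference estimate of the curl of a vector field at p = [x, y, z]."""
    def d(comp, var):
        # Partial derivative of component `comp` with respect to coordinate `var`.
        plus, minus = list(p), list(p)
        plus[var] += h
        minus[var] -= h
        return (field(plus)[comp] - field(minus)[comp]) / (2 * h)
    return [d(2, 1) - d(1, 2),    # dVz/dy - dVy/dz
            d(0, 2) - d(2, 0),    # dVx/dz - dVz/dx
            d(1, 0) - d(0, 1)]    # dVy/dx - dVx/dy

def b(p):
    """The field of Example 3.5.7: (-y, x, 0)."""
    x, y, z = p
    return [-y, x, 0.0]

def central(p):
    """A central field f(r) r_hat, with the arbitrary choice f(r) = exp(-r)."""
    r = math.sqrt(sum(c * c for c in p))
    return [math.exp(-r) * c / r for c in p]

point = [1.0, 2.0, 2.0]
curl_b = curl(b, point)
curl_central = curl(central, point)

print(curl_b)         # close to (0, 0, 2)
print(curl_central)   # close to (0, 0, 0)
```

Any other smooth $f(r)$ could be substituted in `central`; by Eq. (3.59) the computed curl should remain zero to within numerical error.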


The results of these two examples can be better understood from a geometric
interpretation of the curl operator. We proceed as follows: Given a vector field $\mathbf{B}$, consider the line
integral $\oint \mathbf{B}\cdot d\mathbf{s}$ for a small closed path. The circle through the integral sign is a signal
that the path is closed. For simplicity in the computations, we take a rectangular path in
the $xy$-plane, centered at a point $(x_0, y_0)$, of dimensions $\Delta x \times \Delta y$, as shown in Fig. 3.11.
We will traverse this path in the counterclockwise direction, passing through the four
segments labeled 1 through 4 in the figure. Since everywhere in this discussion $z = 0$, we do
not show it explicitly.


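FIGURE 3.11 Path for computing circulation at $(x_0, y_0)$.

Segment 1 of the path contributes to the integral

\[
\text{Segment 1} = \int_{x_0 - \Delta x/2}^{x_0 + \Delta x/2} B_x(x, y_0 - \Delta y/2)\,dx \approx B_x(x_0, y_0 - \Delta y/2)\,\Delta x,
\]

where the approximation, replacing $B_x$ by its value at the middle of the segment, is good
to first order. In a similar fashion, we have

\[
\begin{aligned}
\text{Segment 2} &= \int_{y_0 - \Delta y/2}^{y_0 + \Delta y/2} B_y(x_0 + \Delta x/2, y)\,dy \approx B_y(x_0 + \Delta x/2, y_0)\,\Delta y, \\
\text{Segment 3} &= \int_{x_0 + \Delta x/2}^{x_0 - \Delta x/2} B_x(x, y_0 + \Delta y/2)\,dx \approx -B_x(x_0, y_0 + \Delta y/2)\,\Delta x, \\
\text{Segment 4} &= \int_{y_0 + \Delta y/2}^{y_0 - \Delta y/2} B_y(x_0 - \Delta x/2, y)\,dy \approx -B_y(x_0 - \Delta x/2, y_0)\,\Delta y.
\end{aligned}
\]

Note that because the paths of segments 3 and 4 are in the direction of decrease in the value
of the integration variable, we obtain minus signs in the contributions of these segments.
Combining the contributions of Segments 1 and 3, and those of Segments 2 and 4, we have

\[
\begin{aligned}
\text{Segments } 1 + 3 &= \left(B_x(x_0, y_0 - \Delta y/2) - B_x(x_0, y_0 + \Delta y/2)\right)\Delta x \approx -\frac{\partial B_x}{\partial y}\,\Delta y\,\Delta x, \\
\text{Segments } 2 + 4 &= \left(B_y(x_0 + \Delta x/2, y_0) - B_y(x_0 - \Delta x/2, y_0)\right)\Delta y \approx +\frac{\partial B_y}{\partial x}\,\Delta x\,\Delta y.
\end{aligned}
\]

Combining these contributions to obtain the value of the entire line integral, we have

\[
\oint \mathbf{B}\cdot d\mathbf{s} \approx \left(\frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y}\right)\Delta x\,\Delta y \approx [\nabla\times\mathbf{B}]_z\,\Delta x\,\Delta y. \tag{3.60}
\]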
ox dy 

The thing to note is that a nonzero closed-loop line integral of B corresponds to a nonzero 
value of the component of V x B normal to the loop. In the limit of a small loop, the line 
integral will have a value proportional to the loop area; the value of the line integral per 
unit area is called the circulation (in fluid dynamics, it is also known as the vorticity). 
A nonzero circulation corresponds to a pattern of stream lines that form closed loops. 
Obviously, to form a closed loop, a stream line must curl; hence the name of the V x 
operator. 
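The relation between the closed-path integral and the curl in Eq. (3.60) can be illustrated numerically. The following Python sketch integrates a sample field around a small rectangle, traversed counterclockwise as in Fig. 3.11, and divides by the enclosed area. The field $\mathbf{B} = (-xy, x^2)$, the rectangle, and the midpoint-rule quadrature are all assumptions chosen for this illustration; for this field $[\nabla\times\mathbf{B}]_z = 3x$.

```python
def circulation(B, x0, y0, dx, dy, steps=200):
    """Counterclockwise line integral of B around a dx-by-dy rectangle centered at (x0, y0).

    Each side is integrated with the midpoint rule; opposite sides are paired
    as in the text (segments 1 + 3 and segments 2 + 4).
    """
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        x = x0 - dx / 2 + t * dx
        y = y0 - dy / 2 + t * dy
        # Segment 1 (bottom, +x direction) plus segment 3 (top, -x direction)
        total += (B(x, y0 - dy / 2)[0] - B(x, y0 + dy / 2)[0]) * dx / steps
        # Segment 2 (right, +y direction) plus segment 4 (left, -y direction)
        total += (B(x0 + dx / 2, y)[1] - B(x0 - dx / 2, y)[1]) * dy / steps
    return total

def B(x, y):
    """Sample planar field (Bx, By) = (-x y, x^2)."""
    return (-x * y, x * x)

x0, y0, dx, dy = 1.0, 0.5, 0.01, 0.01
per_area = circulation(B, x0, y0, dx, dy) / (dx * dy)
curl_z = 3.0 * x0    # dBy/dx - dBx/dy = 2x + x for this field

print(per_area, curl_z)
```

For a field whose curl varies linearly across the rectangle, as here, the line integral per unit area reproduces the curl at the center essentially exactly; for a general field the agreement improves as the rectangle shrinks.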

Returning now to Example 3.5.6, we have a situation in which the lines of force must
be entirely radial; there is no possibility to form closed loops. Accordingly, we found this
example to have a zero curl. But, looking next at Example 3.5.7, we have a situation in
which the stream lines of $-y\hat{\mathbf{e}}_x + x\hat{\mathbf{e}}_y$ form counterclockwise circles about the origin, and
the curl is nonzero.

We close the discussion by noting that a vector whose curl is zero everywhere is termed
irrotational. This property is in a sense the opposite of solenoidal, and deserves a parallel
degree of emphasis:

\[
\nabla\times\mathbf{B} = 0 \text{ everywhere} \;\longrightarrow\; \mathbf{B} \text{ is irrotational.} \tag{3.61}
\]

Exercises

3.5.1 If $S(x, y, z) = (x^2 + y^2 + z^2)^{-3/2}$, find

(a) $\nabla S$ at the point $(1, 2, 3)$,
(b) the magnitude of the gradient of $S$, $|\nabla S|$ at $(1, 2, 3)$, and
(c) the direction cosines of $\nabla S$ at $(1, 2, 3)$.


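3.5.2 (a) Find a unit vector perpendicular to the surface

\[
x^2 + y^2 + z^2 = 3
\]

at the point $(1, 1, 1)$.
(b) Derive the equation of the plane tangent to the surface at $(1, 1, 1)$.

ANS. (a) $(\hat{\mathbf{e}}_x + \hat{\mathbf{e}}_y + \hat{\mathbf{e}}_z)/\sqrt{3}$, (b) $x + y + z = 3$.

3.5.3 Given a vector $\mathbf{r}_{12} = \hat{\mathbf{e}}_x(x_1 - x_2) + \hat{\mathbf{e}}_y(y_1 - y_2) + \hat{\mathbf{e}}_z(z_1 - z_2)$, show that $\nabla_1 r_{12}$ (gradient
with respect to $x_1$, $y_1$, and $z_1$ of the magnitude $r_{12}$) is a unit vector in the direction of
$\mathbf{r}_{12}$.

3.5.4 If a vector function $\mathbf{F}$ depends on both space coordinates $(x, y, z)$ and time $t$, show that

\[
d\mathbf{F} = (d\mathbf{r}\cdot\nabla)\mathbf{F} + \frac{\partial\mathbf{F}}{\partial t}\,dt.
\]

3.5.5 Show that $\nabla(uv) = v\nabla u + u\nabla v$, where $u$ and $v$ are differentiable scalar functions of
$x$, $y$, and $z$.

3.5.6 For a particle moving in a circular orbit $\mathbf{r} = \hat{\mathbf{e}}_x r\cos\omega t + \hat{\mathbf{e}}_y r\sin\omega t$:

(a) Evaluate $\mathbf{r}\times\dot{\mathbf{r}}$, with $\dot{\mathbf{r}} = d\mathbf{r}/dt = \mathbf{v}$.
(b) Show that $\ddot{\mathbf{r}} + \omega^2\mathbf{r} = 0$ with $\ddot{\mathbf{r}} = d\mathbf{v}/dt$.

Hint. The radius $r$ and the angular velocity $\omega$ are constant.
ANS. (a) $\hat{\mathbf{e}}_z\,\omega r^2$.

3.5.7 Vector $\mathbf{A}$ satisfies the vector transformation law, Eq. (3.26). Show directly that its time
derivative $d\mathbf{A}/dt$ also satisfies Eq. (3.26) and is therefore a vector.

3.5.8 Show, by differentiating components, that

(a) $\dfrac{d}{dt}(\mathbf{A}\cdot\mathbf{B}) = \dfrac{d\mathbf{A}}{dt}\cdot\mathbf{B} + \mathbf{A}\cdot\dfrac{d\mathbf{B}}{dt}$,
(b) $\dfrac{d}{dt}(\mathbf{A}\times\mathbf{B}) = \dfrac{d\mathbf{A}}{dt}\times\mathbf{B} + \mathbf{A}\times\dfrac{d\mathbf{B}}{dt}$,

just like the derivative of the product of two algebraic functions.

3.5.9 Prove $\nabla\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\nabla\times\mathbf{a}) - \mathbf{a}\cdot(\nabla\times\mathbf{b})$.
Hint. Treat as a scalar triple product.

3.5.10 Classically, orbital angular momentum is given by $\mathbf{L} = \mathbf{r}\times\mathbf{p}$, where $\mathbf{p}$ is the
linear momentum. To go from classical mechanics to quantum mechanics, $\mathbf{p}$ is replaced
(in units with $\hbar = 1$) by the operator $-i\nabla$. Show that the quantum mechanical angular
momentum operator has Cartesian components

\[
L_x = -i\left(y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}\right), \qquad
L_y = -i\left(z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z}\right), \qquad
L_z = -i\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right).
\]

3.5.11 Using the angular momentum operators previously given, show that they satisfy
commutation relations of the form

\[
[L_x, L_y] \equiv L_x L_y - L_y L_x = iL_z, \qquad [L_y, L_z] = iL_x, \qquad [L_z, L_x] = iL_y,
\]

and hence

\[
\mathbf{L}\times\mathbf{L} = i\mathbf{L}.
\]

These commutation relations will be taken later as the defining relations of an angular
momentum operator.

3.5.12 With the aid of the results of Exercise 3.5.11, show that if two vectors $\mathbf{a}$ and $\mathbf{b}$ commute
with each other and with $\mathbf{L}$, that is, $[\mathbf{a}, \mathbf{b}] = [\mathbf{a}, \mathbf{L}] = [\mathbf{b}, \mathbf{L}] = 0$, then

\[
[\mathbf{a}\cdot\mathbf{L}, \mathbf{b}\cdot\mathbf{L}] = i(\mathbf{a}\times\mathbf{b})\cdot\mathbf{L}.
\]

3.5.13 Prove that the stream lines of $\mathbf{b}$ of Example 3.5.7 are counterclockwise circles.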


3.6 DIFFERENTIAL VECTOR OPERATORS: FURTHER PROPERTIES


Successive Applications of $\nabla$

Interesting results are obtained when we operate with $\nabla$ on the differential vector operator
forms we have already introduced. The possible results include the following:

(a) $\nabla\cdot\nabla\varphi$   (b) $\nabla\times\nabla\varphi$   (c) $\nabla(\nabla\cdot\mathbf{V})$
(d) $\nabla\cdot(\nabla\times\mathbf{V})$   (e) $\nabla\times(\nabla\times\mathbf{V})$.

All five of these expressions involve second derivatives, and all five appear in the
second-order differential equations of mathematical physics, particularly in
electromagnetic theory.


Laplacian 


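The first of these expressions, $\nabla\cdot\nabla\varphi$, the divergence of the gradient, is named the
Laplacian of $\varphi$. We have

\[
\nabla\cdot\nabla\varphi = \left(\hat{\mathbf{e}}_x\frac{\partial}{\partial x} + \hat{\mathbf{e}}_y\frac{\partial}{\partial y} + \hat{\mathbf{e}}_z\frac{\partial}{\partial z}\right)\cdot\left(\hat{\mathbf{e}}_x\frac{\partial\varphi}{\partial x} + \hat{\mathbf{e}}_y\frac{\partial\varphi}{\partial y} + \hat{\mathbf{e}}_z\frac{\partial\varphi}{\partial z}\right)
= \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2}. \tag{3.62}
\]

When $\varphi$ is the electrostatic potential, we have

\[
\nabla\cdot\nabla\varphi = 0 \tag{3.63}
\]

at points where the charge density vanishes, which is Laplace’s equation of electrostatics.
Often the combination $\nabla\cdot\nabla$ is written $\nabla^2$, or $\Delta$ in the older European literature.


Example 3.6.1 LAPLACIAN OF A CENTRAL FIELD POTENTIAL

Calculate $\nabla^2\varphi(r)$. Using Eq. (3.51) to evaluate $\nabla\varphi$ and then Eq. (3.53) for the divergence,
we have

\[
\nabla^2\varphi(r) = \nabla\cdot\nabla\varphi(r) = \nabla\cdot\left(\frac{d\varphi(r)}{dr}\,\hat{\mathbf{e}}_r\right) = \frac{2}{r}\frac{d\varphi(r)}{dr} + \frac{d^2\varphi(r)}{dr^2}.
\]

We get a term in addition to $d^2\varphi/dr^2$ because $\hat{\mathbf{e}}_r$ has a direction that depends on $\mathbf{r}$.

In the special case $\varphi(r) = r^n$, this reduces to

\[
\nabla^2 r^n = n(n + 1)\,r^{n-2}.
\]

This vanishes for $n = 0$ ($\varphi$ = constant) and for $n = -1$ (Coulomb potential). For $n = -1$,
our derivation fails for $r = 0$, where the derivatives are undefined. ■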
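The formula $\nabla^2 r^n = n(n+1)r^{n-2}$ of Example 3.6.1 can be tested with a second-difference approximation to the Laplacian. The sketch below is plain Python; the function names, the step size `h`, the exponents tried, and the sample point are choices made here for illustration.

```python
import math

def laplacian(f, p, h=1e-3):
    """Second-difference estimate of the Laplacian of a scalar function f at p."""
    center = f(p)
    total = 0.0
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        total += (f(plus) - 2.0 * center + f(minus)) / h ** 2
    return total

def power_of_r(n):
    """Return the scalar function r^n."""
    return lambda p: math.sqrt(sum(c * c for c in p)) ** n

point = [1.0, 2.0, 2.0]   # r = 3, away from the origin
r = 3.0

# Map each exponent n to (numerical Laplacian, n (n + 1) r^(n - 2)).
checks = {n: (laplacian(power_of_r(n), point), n * (n + 1) * r ** (n - 2))
          for n in (2, 3, -1)}

for n, (numeric, analytic) in checks.items():
    print(n, numeric, analytic)
```

The case $n = -1$ should give a value near zero, consistent with the Coulomb potential satisfying Laplace's equation away from $r = 0$.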





Irrotational and Solenoidal Vector Fields 


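Expression (b), the second of our five forms involving two $\nabla$ operators, may be written as
a determinant:

\[
\nabla\times\nabla\varphi = \begin{vmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial\varphi/\partial x & \partial\varphi/\partial y & \partial\varphi/\partial z \end{vmatrix}
= \begin{vmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \end{vmatrix}\varphi = 0.
\]

Because the determinant is to be evaluated from the top down, it is meaningful to move
$\varphi$ outside and to its right, leaving a determinant with two identical rows and yielding the
indicated value of zero. We are thereby actually assuming that the order of the partial
differentiations can be reversed, which is true so long as these second derivatives of $\varphi$ are
continuous.
Expression (d) is a scalar triple product that may be written

\[
\nabla\cdot(\nabla\times\mathbf{V}) = \begin{vmatrix} \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ V_x & V_y & V_z \end{vmatrix} = 0.
\]

This determinant also has two identical rows and yields zero if $\mathbf{V}$ has sufficient continuity.

These two vanishing results tell us that any gradient has a vanishing curl and is therefore
irrotational, and that any curl has a vanishing divergence, and is therefore solenoidal.
These properties are of such importance that we set them out here in display form:

\[
\nabla\times\nabla\varphi = 0, \quad \text{all } \varphi, \tag{3.64}
\]
\[
\nabla\cdot(\nabla\times\mathbf{V}) = 0, \quad \text{all } \mathbf{V}. \tag{3.65}
\]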
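Equations (3.64) and (3.65) hold for any sufficiently smooth $\varphi$ and $\mathbf{V}$, so they can be tried on arbitrary test functions. The following Python sketch composes central-difference operators to form $\nabla\times\nabla\varphi$ and $\nabla\cdot(\nabla\times\mathbf{V})$. The particular $\varphi$ and $\mathbf{V}$, the step size, and the helper names are assumptions made for this illustration.

```python
import math

def partial(f, i, h=1e-3):
    """Return a function approximating the partial derivative of f along coordinate i."""
    def df(p):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        return (f(plus) - f(minus)) / (2 * h)
    return df

def grad(phi):
    """Gradient of a scalar function, as a list of three component functions."""
    return [partial(phi, i) for i in range(3)]

def curl(V):
    """Curl of a vector field given as a list of three component functions."""
    return [
        lambda p: partial(V[2], 1)(p) - partial(V[1], 2)(p),
        lambda p: partial(V[0], 2)(p) - partial(V[2], 0)(p),
        lambda p: partial(V[1], 0)(p) - partial(V[0], 1)(p),
    ]

def div(V):
    """Divergence of a vector field given as a list of three component functions."""
    return lambda p: sum(partial(V[i], i)(p) for i in range(3))

# Arbitrary smooth test functions (illustrative choices).
phi = lambda p: math.sin(p[0]) * p[1] ** 2 + math.exp(p[2]) * p[0]
V = [lambda p: p[1] * p[2] ** 2,
     lambda p: math.sin(p[0] * p[2]),
     lambda p: p[0] ** 2 * p[1]]

point = [0.3, -0.7, 1.1]
curl_grad = [component(point) for component in curl(grad(phi))]
div_curl = div(curl(V))(point)

print(curl_grad)   # each component close to 0
print(div_curl)    # close to 0
```

The central-difference operators used here commute with one another, mirroring the interchange of partial derivatives assumed in the text, so the residuals reflect rounding error only.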


Maxwell’s Equations 
The unification of electric and magnetic phenomena that is encapsulated in Maxwell’s 


equations provides an excellent example of the use of differential vector operators. In SI 
units, these equations take the form 


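\[
\nabla\cdot\mathbf{B} = 0, \tag{3.66}
\]
\[
\nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}, \tag{3.67}
\]
\[
\nabla\times\mathbf{B} = \varepsilon_0\mu_0\frac{\partial\mathbf{E}}{\partial t} + \mu_0\mathbf{J}, \tag{3.68}
\]
\[
\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}. \tag{3.69}
\]

Here $\mathbf{E}$ is the electric field, $\mathbf{B}$ is the magnetic induction field, $\rho$ is the charge density, $\mathbf{J}$ is
the current density, $\varepsilon_0$ is the electric permittivity, and $\mu_0$ is the magnetic permeability, so
$\varepsilon_0\mu_0 = 1/c^2$, where $c$ is the velocity of light.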


Vector Laplacian 


Expressions (c) and (e) in the list at the beginning of this section satisfy the relation

\[
\nabla\times(\nabla\times\mathbf{V}) = \nabla(\nabla\cdot\mathbf{V}) - \nabla\cdot\nabla\mathbf{V}. \tag{3.70}
\]

The term $\nabla\cdot\nabla\mathbf{V}$, which is called the vector Laplacian and sometimes written $\nabla^2\mathbf{V}$, has
prior to this point not been defined; Eq. (3.70) (solved for $\nabla^2\mathbf{V}$) can be taken to be its
definition. In Cartesian coordinates, $\nabla^2\mathbf{V}$ is a vector whose $i$ component is $\nabla^2 V_i$, and that
fact can be confirmed either by direct component expansion or by applying the BAC-CAB
rule, Eq. (3.18), with care always to place $\mathbf{V}$ so that the differential operators act on it.
While Eq. (3.70) is general, $\nabla^2\mathbf{V}$ separates into Laplacians for the components of $\mathbf{V}$ only
in Cartesian coordinates.

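Example 3.6.2 ELECTROMAGNETIC WAVE EQUATION

Even in vacuum, Maxwell’s equations can describe electromagnetic waves. To derive an
electromagnetic wave equation, we start by taking the time derivative of Eq. (3.68) for the
case $\mathbf{J} = 0$, and the curl of Eq. (3.69). We then have

\[
\frac{\partial}{\partial t}\nabla\times\mathbf{B} = \varepsilon_0\mu_0\frac{\partial^2\mathbf{E}}{\partial t^2},
\]
\[
\nabla\times(\nabla\times\mathbf{E}) = -\frac{\partial}{\partial t}\nabla\times\mathbf{B} = -\varepsilon_0\mu_0\frac{\partial^2\mathbf{E}}{\partial t^2}.
\]

We now have an equation that involves only $\mathbf{E}$; it can be brought to a more convenient
form by applying Eq. (3.70), dropping the first term on the right of that equation because,
in vacuum, $\nabla\cdot\mathbf{E} = 0$. The result is the vector electromagnetic wave equation for $\mathbf{E}$,

\[
\nabla^2\mathbf{E} = \varepsilon_0\mu_0\frac{\partial^2\mathbf{E}}{\partial t^2} = \frac{1}{c^2}\frac{\partial^2\mathbf{E}}{\partial t^2}. \tag{3.71}
\]

Equation (3.71) separates into three scalar wave equations, each involving the (scalar)
Laplacian. There is a separate equation for each Cartesian component of $\mathbf{E}$. ■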


Miscellaneous Vector Identities 
Our introduction of differential vector operators is now formally complete, but we present 


two further examples to illustrate how the relationships between these operators can be 
manipulated to obtain useful vector identities. 


Example 3.6.3 DIVERGENCE AND CURL OF A PRODUCT 


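First, simplify $\nabla\cdot(f\mathbf{V})$, where $f$ and $\mathbf{V}$ are, respectively, scalar and vector functions.
Working with the components,

\[
\begin{aligned}
\nabla\cdot(f\mathbf{V}) &= \frac{\partial}{\partial x}(fV_x) + \frac{\partial}{\partial y}(fV_y) + \frac{\partial}{\partial z}(fV_z) \\
&= \frac{\partial f}{\partial x}V_x + f\frac{\partial V_x}{\partial x} + \frac{\partial f}{\partial y}V_y + f\frac{\partial V_y}{\partial y} + \frac{\partial f}{\partial z}V_z + f\frac{\partial V_z}{\partial z} \\
&= (\nabla f)\cdot\mathbf{V} + f\,\nabla\cdot\mathbf{V}.
\end{aligned} \tag{3.72}
\]

Now simplify $\nabla\times(f\mathbf{V})$. Consider the $x$-component:

\[
\frac{\partial}{\partial y}(fV_z) - \frac{\partial}{\partial z}(fV_y) = f\left[\frac{\partial V_z}{\partial y} - \frac{\partial V_y}{\partial z}\right] + \left[\frac{\partial f}{\partial y}V_z - \frac{\partial f}{\partial z}V_y\right].
\]

This is the $x$-component of $f(\nabla\times\mathbf{V}) + (\nabla f)\times\mathbf{V}$, so we have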
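\[
\nabla\times(f\mathbf{V}) = f(\nabla\times\mathbf{V}) + (\nabla f)\times\mathbf{V}. \tag{3.73}
\]
■

Example 3.6.4 GRADIENT OF A DOT PRODUCT

Verify that

\[
\nabla(\mathbf{A}\cdot\mathbf{B}) = (\mathbf{B}\cdot\nabla)\mathbf{A} + (\mathbf{A}\cdot\nabla)\mathbf{B} + \mathbf{B}\times(\nabla\times\mathbf{A}) + \mathbf{A}\times(\nabla\times\mathbf{B}). \tag{3.74}
\]

This problem is easier to solve if we recognize that $\nabla(\mathbf{A}\cdot\mathbf{B})$ is a type of term that appears
in the BAC-CAB expansion of a vector triple product, Eq. (3.18). From that equation,
we have

\[
\mathbf{A}\times(\nabla\times\mathbf{B}) = \nabla_B(\mathbf{A}\cdot\mathbf{B}) - (\mathbf{A}\cdot\nabla)\mathbf{B},
\]

where we placed $\mathbf{B}$ at the end of the final term because $\nabla$ must act on it. We write $\nabla_B$ to
indicate an operation our notation is not really equipped to handle. In this term, $\nabla$ acts only
on $\mathbf{B}$, because $\mathbf{A}$ appeared to its left on the left-hand side of the equation. Interchanging
the roles of $\mathbf{A}$ and $\mathbf{B}$, we also have

\[
\mathbf{B}\times(\nabla\times\mathbf{A}) = \nabla_A(\mathbf{A}\cdot\mathbf{B}) - (\mathbf{B}\cdot\nabla)\mathbf{A},
\]

where $\nabla_A$ acts only on $\mathbf{A}$. Adding these two equations together, noting that $\nabla_B + \nabla_A$ is
simply an unrestricted $\nabla$, we recover Eq. (3.74). ■
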
Exercises 
3.6.1 Show that u x v is solenoidal if u and v are each irrotational. 
3.6.2 If A is irrotational, show that A x r is solenoidal. 
3.6.3 A rigid body is rotating with constant angular velocity w. Show that the linear velocity 
v is solenoidal. 
3.6.4 If a vector function V(x, y, z) is not irrotational, show that if there exists a scalar func- 
tion g(x, y, z) such that gV is irrotational, then 
V-VxV=0. 
3.6.5 Verify the vector identity 
V x (Ax B) = (B- V)A— (A: V)B— B(V - A) + A(V -B). 
3.6.6 As an alternative to the vector identity of Example 3.6.4 show that 
V(A-B)=(A x V) x B+ (Bx V) x A+ A(V -B) + B(V - A). 
3.6.7 Verify the identity 
1 
Ax(VxA)= sv) —(A-V)A. 
3.6.8 If $\mathbf{A}$ and $\mathbf{B}$ are constant vectors, show that
\[ \nabla(\mathbf{A}\cdot\mathbf{B}\times\mathbf{r}) = \mathbf{A}\times\mathbf{B}. \]
3.6.9 Verify Eq. (3.70),
\[ \nabla\times(\nabla\times\mathbf{V}) = \nabla(\nabla\cdot\mathbf{V}) - \nabla\cdot\nabla\mathbf{V}, \]
by direct expansion in Cartesian coordinates.
3.6.10 Prove that $\nabla\times(\varphi\nabla\varphi) = 0$.
3.6.11 You are given that the curl of $\mathbf{F}$ equals the curl of $\mathbf{G}$. Show that $\mathbf{F}$ and $\mathbf{G}$ may differ by
(a) a constant and (b) a gradient of a scalar function.
3.6.12 The Navier-Stokes equation of hydrodynamics contains a nonlinear term of the form
$(\mathbf{v}\cdot\nabla)\mathbf{v}$. Show that the curl of this term may be written as $-\nabla\times[\mathbf{v}\times(\nabla\times\mathbf{v})]$.
3.6.13 Prove that $(\nabla u)\times(\nabla v)$ is solenoidal, where $u$ and $v$ are differentiable scalar functions.
3.6.14 The function $\varphi$ is a scalar satisfying Laplace’s equation, $\nabla^2\varphi = 0$. Show that $\nabla\varphi$ is
both solenoidal and irrotational.
3.6.15 Show that any solution of the equation
\[ \nabla\times(\nabla\times\mathbf{A}) - k^2\mathbf{A} = 0 \]
automatically satisfies the vector Helmholtz equation
\[ \nabla^2\mathbf{A} + k^2\mathbf{A} = 0 \]
and the solenoidal condition
\[ \nabla\cdot\mathbf{A} = 0. \]
Hint. Let $\nabla\cdot$ operate on the first equation.
3.6.16 The theory of heat conduction leads to an equation
\[ \nabla^2\Psi = k\,|\nabla\Phi|^2, \]
where $\Phi$ is a potential satisfying Laplace’s equation: $\nabla^2\Phi = 0$. Show that a
solution of this equation is $\Psi = k\Phi^2/2$.
3.6.17 Given the three matrices
\[ \mathsf{M}_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad
   \mathsf{M}_y = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \]
and
\[ \mathsf{M}_z = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \]
show that the matrix-vector equation
\[ \left( \mathsf{M}\cdot\nabla + \mathsf{1}_3\,\frac{1}{c}\frac{\partial}{\partial t} \right)\psi = 0 \]
reproduces Maxwell’s equations in vacuum. Here $\psi$ is a column vector with components
$\psi_j = B_j - iE_j/c$, $j = x, y, z$. Note that $\varepsilon_0\mu_0 = 1/c^2$ and that $\mathsf{1}_3$ is the $3\times 3$ unit
matrix.


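3.6.18 Using the Pauli matrices $\sigma_i$ of Eq. (2.28), show that
\[ (\boldsymbol{\sigma}\cdot\mathbf{a})(\boldsymbol{\sigma}\cdot\mathbf{b}) = (\mathbf{a}\cdot\mathbf{b})\,\mathsf{1}_2 + i\,\boldsymbol{\sigma}\cdot(\mathbf{a}\times\mathbf{b}). \]
Here
\[ \boldsymbol{\sigma} = \hat{\mathbf{e}}_x\sigma_1 + \hat{\mathbf{e}}_y\sigma_2 + \hat{\mathbf{e}}_z\sigma_3, \]
$\mathbf{a}$ and $\mathbf{b}$ are ordinary vectors, and $\mathsf{1}_2$ is the $2\times 2$ unit matrix.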


3.7 VECTOR INTEGRATION 


In physics, vectors occur in line, surface, and volume integrals. At least in principle, these 
integrals can be decomposed into scalar integrals involving the vector components; there 
are some useful general observations to make at this time. 


Line Integrals 


Possible forms for line integrals include the following: 


\[ \int_C \varphi\,d\mathbf{r}, \qquad \int_C \mathbf{F}\cdot d\mathbf{r}, \qquad \int_C \mathbf{V}\times d\mathbf{r}. \tag{3.75} \]


In each of these the integral is over some path C that may be open (with starting and 
endpoints distinct) or closed (forming a loop). Inserting the form of dr, the first of these 
integrals reduces immediately to 


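\[ \int_C \varphi\,d\mathbf{r} = \hat{\mathbf{e}}_x\int_C \varphi(x,y,z)\,dx + \hat{\mathbf{e}}_y\int_C \varphi(x,y,z)\,dy + \hat{\mathbf{e}}_z\int_C \varphi(x,y,z)\,dz. \tag{3.76} \]

The unit vectors need not remain within the integral because they are constant in both
magnitude and direction.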

The integrals in Eq. (3.76) are one-dimensional scalar integrals. Note, however, that 
the integral over x cannot be evaluated unless y and z are known in terms of x; similar 
observations apply for the integrals over y and z. This means that the path C must be 
specified. Unless $\varphi$ has special properties, the value of the integral will depend on the path.

The other integrals in Eq. (3.75) can be handled similarly. For the second integral, which 
is of common occurrence, being that which evaluates the work associated with displace- 
ment on the path C, we have: 


\[ W = \int_C \mathbf{F}\cdot d\mathbf{r} = \int_C F_x(x,y,z)\,dx + \int_C F_y(x,y,z)\,dy + \int_C F_z(x,y,z)\,dz. \tag{3.77} \]

Example 3.7.1 LINE INTEGRALS


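We consider two integrals in 2-D space:

\[ I_C = \int_C \varphi(x,y)\,d\mathbf{r}, \quad \text{with } \varphi(x,y) = 1, \]
\[ J_C = \int_C \mathbf{F}(x,y)\cdot d\mathbf{r}, \quad \text{with } \mathbf{F}(x,y) = -y\,\hat{\mathbf{e}}_x + x\,\hat{\mathbf{e}}_y. \]

We perform integrations in the xy-plane from (0,0) to (1,1) by the two different paths
shown in Fig. 3.12:

Path $C_1$ is $(0,0) \to (1,0) \to (1,1)$,
Path $C_2$ is the straight line $(0,0) \to (1,1)$.

For the first segment of $C_1$, $x$ ranges from 0 to 1 while $y$ is fixed at zero. For the second
segment, $y$ ranges from 0 to 1 while $x = 1$. Thus,

\[ I_{C_1} = \hat{\mathbf{e}}_x\int_0^1 dx\,\varphi(x,0) + \hat{\mathbf{e}}_y\int_0^1 dy\,\varphi(1,y) = \hat{\mathbf{e}}_x\int_0^1 dx + \hat{\mathbf{e}}_y\int_0^1 dy = \hat{\mathbf{e}}_x + \hat{\mathbf{e}}_y, \]
\[ J_{C_1} = \int_0^1 dx\,F_x(x,0) + \int_0^1 dy\,F_y(1,y) = \int_0^1 dx\,(0) + \int_0^1 dy\,(1) = 1. \]

On Path 2, both $x$ and $y$ range from 0 to 1, with $x = y$ at all points of the path. Thus,

\[ I_{C_2} = \hat{\mathbf{e}}_x\int_0^1 dx\,\varphi(x,x) + \hat{\mathbf{e}}_y\int_0^1 dy\,\varphi(y,y) = \hat{\mathbf{e}}_x + \hat{\mathbf{e}}_y, \]
\[ J_{C_2} = \int_0^1 dx\,F_x(x,x) + \int_0^1 dy\,F_y(y,y) = \int_0^1 dx\,(-x) + \int_0^1 dy\,(y) = -\frac{1}{2} + \frac{1}{2} = 0. \]

We see that integral $I$ is independent of the path from (0,0) to (1,1), a nearly trivial special
case, while the integral $J$ is not.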
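
The path dependence of $J$ can also be checked numerically. The following is a minimal sketch, not part of the text: the helper names (`line_integral`, `path_c1`, `path_c2`) are illustrative, and the integral is approximated by a midpoint rule on short straight segments, which happens to be exact for this linear field.

```python
def line_integral(F, path, n=2000):
    """Approximate the integral of F . dr along path(t), 0 <= t <= 1,
    by the midpoint rule on n straight sub-segments."""
    total = 0.0
    x0, y0 = path(0.0)
    for k in range(1, n + 1):
        x1, y1 = path(k / n)
        # Evaluate the field at the midpoint of the current segment.
        Fx, Fy = F(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += Fx * (x1 - x0) + Fy * (y1 - y0)
        x0, y0 = x1, y1
    return total

def F(x, y):
    """The field of Example 3.7.1: F = -y e_x + x e_y."""
    return (-y, x)

def path_c1(t):
    """(0,0) -> (1,0) for t <= 1/2, then (1,0) -> (1,1)."""
    return (2 * t, 0.0) if t <= 0.5 else (1.0, 2 * t - 1.0)

def path_c2(t):
    """Straight line (0,0) -> (1,1)."""
    return (t, t)

J_c1 = line_integral(F, path_c1)
J_c2 = line_integral(F, path_c2)
print(J_c1, J_c2)
```

The two printed values should agree with the results $J_{C_1} = 1$ and $J_{C_2} = 0$ obtained above.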








FIGURE 3.12 Line integration paths.
FIGURE 3.13 Positive normal directions: left, disk; right, spherical surface with hole.


Surface Integrals 


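Surface integrals appear in the same forms as line integrals, the element of area being a
vector, $d\boldsymbol{\sigma}$, normal to the surface:

\[ \int \varphi\,d\boldsymbol{\sigma}, \qquad \int \mathbf{V}\cdot d\boldsymbol{\sigma}, \qquad \int \mathbf{V}\times d\boldsymbol{\sigma}. \]

Often $d\boldsymbol{\sigma}$ is written $\hat{\mathbf{n}}\,dA$, where $\hat{\mathbf{n}}$ is a unit vector indicating the normal direction. There are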
two conventions for choosing the positive direction. First, if the surface is closed (has no 
boundary), we agree to take the outward normal as positive. Second, for an open surface, 
the positive normal depends on the direction in which the perimeter of the surface is
traversed. Starting from an arbitrary point on the perimeter, we define a vector $\mathbf{u}$ to be in the
direction of travel along the perimeter, and define a second vector $\mathbf{v}$ at our perimeter point
but tangent to and lying on the surface. We then take $\mathbf{u}\times\mathbf{v}$ as the positive normal direction.
This corresponds to a right-hand rule, and is illustrated in Fig. 3.13. It is necessary to define 
the orientation carefully so as to deal with cases such as that of Fig. 3.13, right. 

The dot-product form is by far the most commonly encountered surface integral, as it 
corresponds to a flow or flux through the given surface. 


Example 3.7.2 A SURFACE INTEGRAL 


Consider a surface integral of the form $I = \oint_S \mathbf{B}\cdot d\boldsymbol{\sigma}$ over the surface of a tetrahedron
whose vertices are at the origin and at the points (1,0,0), (0,1,0), and (0,0,1), with
$\mathbf{B} = (x+1)\hat{\mathbf{e}}_x + y\,\hat{\mathbf{e}}_y - z\,\hat{\mathbf{e}}_z$. See Fig. 3.14.

The surface consists of four triangles, which can be identified and their contributions 
evaluated, as follows: 


1. On the xy-plane ($z = 0$), vertices at $(x, y) = (0,0)$, $(1,0)$, and $(0,1)$; direction of
outward normal is $-\hat{\mathbf{e}}_z$, so $d\boldsymbol{\sigma} = -\hat{\mathbf{e}}_z\,dA$ ($dA$ = element of area on this triangle). Here,
$\mathbf{B} = (x+1)\hat{\mathbf{e}}_x + y\,\hat{\mathbf{e}}_y$, and $\mathbf{B}\cdot d\boldsymbol{\sigma} = 0$. So there is no contribution to $I$.

2. On the xz-plane ($y = 0$), vertices at $(x, z) = (0,0)$, $(1,0)$, and $(0,1)$; direction of
outward normal is $-\hat{\mathbf{e}}_y$, so $d\boldsymbol{\sigma} = -\hat{\mathbf{e}}_y\,dA$. On this triangle, $\mathbf{B} = (x+1)\hat{\mathbf{e}}_x - z\,\hat{\mathbf{e}}_z$. Again,
$\mathbf{B}\cdot d\boldsymbol{\sigma} = 0$. There is no contribution to $I$.

3. On the yz-plane ($x = 0$), vertices at $(y, z) = (0,0)$, $(1,0)$, and $(0,1)$; direction
of outward normal is $-\hat{\mathbf{e}}_x$, so $d\boldsymbol{\sigma} = -\hat{\mathbf{e}}_x\,dA$. Here, $\mathbf{B} = \hat{\mathbf{e}}_x + y\,\hat{\mathbf{e}}_y - z\,\hat{\mathbf{e}}_z$, and
$\mathbf{B}\cdot d\boldsymbol{\sigma} = (-1)\,dA$; the contribution to $I$ is $-1$ times the area of the triangle ($= 1/2$),
or $I_3 = -1/2$.

4. Obliquely oriented, vertices at $(x, y, z) = (1,0,0)$, $(0,1,0)$, $(0,0,1)$; direction of
outward normal is $\hat{\mathbf{n}} = (\hat{\mathbf{e}}_x + \hat{\mathbf{e}}_y + \hat{\mathbf{e}}_z)/\sqrt{3}$, and $d\boldsymbol{\sigma} = \hat{\mathbf{n}}\,dA$. Using also
$\mathbf{B} = (x+1)\hat{\mathbf{e}}_x + y\,\hat{\mathbf{e}}_y - z\,\hat{\mathbf{e}}_z$, this contribution to $I$ becomes

\[ I_4 = \int_{A_4} \frac{x + 1 + y - z}{\sqrt{3}}\,dA = \int_{A_4} \frac{2 - 2z}{\sqrt{3}}\,dA, \]

where we have used the fact that on this triangle, $x + y + z = 1$.
To complete the evaluation, we note that the geometry of the triangle is as shown
in Fig. 3.14, that the width of the triangle at height $z$ is $\sqrt{2}\,(1 - z)$, and a change $dz$ in
$z$ produces a displacement $\sqrt{3/2}\,dz$ on the triangle. $I_4$ therefore can be written

\[ I_4 = \int_0^1 2(1 - z)^2\,dz = \frac{2}{3}. \]

Combining the nonzero contributions $I_3$ and $I_4$, we obtain the final result

\[ I = -\frac{1}{2} + \frac{2}{3} = \frac{1}{6}. \]

FIGURE 3.14 Tetrahedron, and detail of the oblique face.
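
As a cross-check of the four face contributions, the flux can be evaluated numerically. The sketch below is illustrative only: the function names and the vertex ordering used to fix the outward normals are choices made here, not taken from the text. Each triangle is parametrized as $\mathbf{r} = \mathbf{p}_0 + u\,\mathbf{e}_1 + v\,\mathbf{e}_2$, so that $d\boldsymbol{\sigma} = (\mathbf{e}_1\times\mathbf{e}_2)\,du\,dv$.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def B(x, y, z):
    """The field of Example 3.7.2."""
    return (x + 1.0, y, -z)

def flux_through_triangle(field, p0, p1, p2, n=200):
    """Flux of `field` through the triangle p0, p1, p2.
    The normal points along (p1 - p0) x (p2 - p0)."""
    e1 = tuple(p1[i] - p0[i] for i in range(3))
    e2 = tuple(p2[i] - p0[i] for i in range(3))
    normal = cross(e1, e2)               # magnitude = 2 * (triangle area)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            t = (j + 0.5) * h
            # Map the unit square onto u, v >= 0, u + v <= 1; Jacobian is (1 - s).
            u, v = s, t * (1.0 - s)
            r = tuple(p0[k] + u * e1[k] + v * e2[k] for k in range(3))
            f = field(*r)
            total += sum(f[k] * normal[k] for k in range(3)) * (1.0 - s) * h * h
    return total

O, X, Y, Z = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
# Vertex order chosen so that each normal points out of the tetrahedron.
faces = [(O, Y, X), (O, X, Z), (O, Z, Y), (X, Y, Z)]
fluxes = [flux_through_triangle(B, *face) for face in faces]
total_flux = sum(fluxes)
print(fluxes, total_flux)
```

The four entries of `fluxes` correspond to items 1 through 4 above, and their sum should be close to $1/6$.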


Volume Integrals 


Volume integrals are somewhat simpler, because the volume element $d\tau$ is a scalar
quantity. Sometimes $d\tau$ is written $d^3r$, or $d^3x$ when the coordinates were designated
$(x_1, x_2, x_3)$. In the literature, the form $d\mathbf{r}$ is frequently encountered, but in contexts that
usually reveal that it is a synonym for $d\tau$, and not a vector quantity. The volume integrals
under consideration here are of the form

\[ \int_V \mathbf{V}\,d\tau = \hat{\mathbf{e}}_x\int_V V_x\,d\tau + \hat{\mathbf{e}}_y\int_V V_y\,d\tau + \hat{\mathbf{e}}_z\int_V V_z\,d\tau. \]

The integral reduces to a vector sum of scalar integrals.

Some volume integrals contain vector quantities in combinations that are actually scalar.
Often these can be rearranged by applying techniques such as integration by parts.


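Example 3.7.3 INTEGRATION BY PARTS

Consider an integral over all space of the form $\int \mathbf{A}(\mathbf{r})\cdot\nabla f(\mathbf{r})\,d^3r$ in the frequently occurring
special case in which either $f$ or $\mathbf{A}$ vanishes sufficiently strongly at infinity. Expanding
the integrand into components,

\[ \int \mathbf{A}(\mathbf{r})\cdot\nabla f(\mathbf{r})\,d^3r = \iint dy\,dz\left[ A_x f\,\Big|_{x=-\infty}^{\infty} - \int f\,\frac{\partial A_x}{\partial x}\,dx \right] + \cdots \]
\[ = -\iiint f\,\frac{\partial A_x}{\partial x}\,dx\,dy\,dz - \iiint f\,\frac{\partial A_y}{\partial y}\,dx\,dy\,dz - \iiint f\,\frac{\partial A_z}{\partial z}\,dx\,dy\,dz \]
\[ = -\int f(\mathbf{r})\,\nabla\cdot\mathbf{A}(\mathbf{r})\,d^3r. \tag{3.78} \]

For example, if $\mathbf{A} = e^{ikz}\hat{\mathbf{p}}$ describes a photon with a constant polarization vector in the
direction $\hat{\mathbf{p}}$ and $\psi(\mathbf{r})$ is a bound-state wave function (so it vanishes at infinity), then

\[ \int e^{ikz}\,\hat{\mathbf{p}}\cdot\nabla\psi(\mathbf{r})\,d^3r = -(\hat{\mathbf{p}}\cdot\hat{\mathbf{e}}_z)\int \psi(\mathbf{r})\,\frac{d e^{ikz}}{dz}\,d^3r = -ik\,(\hat{\mathbf{p}}\cdot\hat{\mathbf{e}}_z)\int \psi(\mathbf{r})\,e^{ikz}\,d^3r. \]

Only the z-component of the gradient contributes to the integral.
Analogous rearrangements (assuming the integrated terms vanish at infinity) include

\[ \int f(\mathbf{r})\,\nabla\cdot\mathbf{A}(\mathbf{r})\,d^3r = -\int \mathbf{A}(\mathbf{r})\cdot\nabla f(\mathbf{r})\,d^3r, \tag{3.79} \]
\[ \int \mathbf{C}(\mathbf{r})\cdot\big(\nabla\times\mathbf{A}(\mathbf{r})\big)\,d^3r = \int \mathbf{A}(\mathbf{r})\cdot\big(\nabla\times\mathbf{C}(\mathbf{r})\big)\,d^3r. \tag{3.80} \]

In the cross-product example, the sign change from the integration by parts combines with
the signs from the cross product to give the result shown.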


Exercises 


3.7.1 The origin and the three vectors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ (all of which start at the origin) define a
tetrahedron. Taking the outward direction as positive, calculate the total vector area of
the four tetrahedral surfaces.

3.7.2 Find the work $\oint \mathbf{F}\cdot d\mathbf{r}$ done moving on a unit circle in the xy-plane, doing work against
a force field given by

\[ \mathbf{F} = -\frac{\hat{\mathbf{e}}_x\,y}{x^2 + y^2} + \frac{\hat{\mathbf{e}}_y\,x}{x^2 + y^2}: \]

(a) Counterclockwise from 0 to $\pi$,
(b) Clockwise from 0 to $-\pi$.

Note that the work done depends on the path.

3.7.3 Calculate the work you do in going from point (1, 1) to point (3, 3). The force you exert
is given by

\[ \mathbf{F} = \hat{\mathbf{e}}_x(x - y) + \hat{\mathbf{e}}_y(x + y). \]

Specify clearly the path you choose. Note that this force field is nonconservative.

3.7.4 Evaluate $\oint \mathbf{r}\cdot d\mathbf{r}$ for a closed path of your choosing.

3.7.5 Evaluate

\[ \frac{1}{3}\int_S \mathbf{r}\cdot d\boldsymbol{\sigma} \]

over the unit cube defined by the point (0, 0, 0) and the unit intercepts on the positive
x-, y-, and z-axes. Note that $\mathbf{r}\cdot d\boldsymbol{\sigma}$ is zero for three of the surfaces and that each of the
three remaining surfaces contributes the same amount to the integral.


3.8 INTEGRAL THEOREMS 


The formulas in this section relate a volume integration to a surface integral on its boundary 
(Gauss’ theorem), or relate a surface integral to the line defining its perimeter (Stokes’ 
theorem). These formulas are important tools in vector analysis, particularly when the 
functions involved are known to vanish on the boundary surface or perimeter. 


Gauss’ Theorem 


Here we derive a useful relation between a surface integral of a vector and the volume 
integral of the divergence of that vector. Let us assume that a vector $\mathbf{A}$ and its first derivatives
are continuous over a simply connected region of $\mathbb{R}^3$ (regions that contain holes,
like a donut, are not simply connected). Then Gauss’ theorem states that

\[ \oint_{\partial V} \mathbf{A}\cdot d\boldsymbol{\sigma} = \int_V \nabla\cdot\mathbf{A}\,d\tau. \tag{3.81} \]

Here the notations $V$ and $\partial V$ respectively denote a volume of interest and the closed surface
that bounds it. The circle on the surface integral is an additional indication that the
surface is closed.

To prove the theorem, consider the volume $V$ to be subdivided into an arbitrarily large
number of tiny (differential) parallelepipeds, and look at the behavior of $\nabla\cdot\mathbf{A}$ for each. See
Fig. 3.15. For any given parallelepiped, this quantity is a measure of the net outward flow
(of whatever $\mathbf{A}$ describes) through its boundary. If that boundary is interior (i.e., is shared
by another parallelepiped), outflow from one parallelepiped is inflow to its neighbor; in a
summation of all the outflows, all the contributions of interior boundaries cancel. Thus, the
sum of all the outflows in the volume will just be the sum of those through the exterior
boundary. In the limit of infinite subdivision, these sums become integrals: The left-hand
side of Eq. (3.81) becomes the total outflow to the exterior, while its right-hand side is the
sum of the outflows of the differential elements (the parallelepipeds).

FIGURE 3.15 Subdivision for Gauss’ theorem.


A simple alternate explanation of Gauss’ theorem is that the volume integral sums the
outflows $\nabla\cdot\mathbf{A}$ from all elements of the volume; the surface integral computes the same
thing, by directly summing the flow through all elements of the boundary.

If the region of interest is the complete $\mathbb{R}^3$, and the volume integral converges, the
surface integral in Eq. (3.81) must vanish, giving the useful result

\[ \int \nabla\cdot\mathbf{A}\,d\tau = 0, \quad \text{integration over } \mathbb{R}^3 \text{ and convergent}. \tag{3.82} \]


Example 3.8.1 TETRAHEDRON 


We check Gauss’ theorem for a vector $\mathbf{B} = (x+1)\hat{\mathbf{e}}_x + y\,\hat{\mathbf{e}}_y - z\,\hat{\mathbf{e}}_z$, comparing

\[ \int_V \nabla\cdot\mathbf{B}\,d\tau \quad \text{vs.} \quad \oint_{\partial V} \mathbf{B}\cdot d\boldsymbol{\sigma}, \]

where $V$ is the tetrahedron of Example 3.7.2. In that example we computed the surface
integral needed here, obtaining the value 1/6. For the integral over $V$, we take the divergence,
obtaining $\nabla\cdot\mathbf{B} = 1 + 1 - 1 = 1$. The volume integral therefore reduces to the volume of the
tetrahedron that, with base of area 1/2 and height 1, has volume $1/3 \times 1/2 \times 1 = 1/6$.
This instance of Gauss’ theorem is confirmed.
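
The volume side of this comparison can also be sketched numerically. The code below is an illustration under stated assumptions, not the text’s method: the divergence is estimated by central differences, and the tetrahedron is covered by mapping the unit cube onto it (the mapping and all function names are choices made here).

```python
def B(x, y, z):
    """The field of Examples 3.7.2 and 3.8.1."""
    return (x + 1.0, y, -z)

def divergence(field, x, y, z, h=1e-5):
    """Central-difference estimate of div(field) at (x, y, z)."""
    dFx = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2 * h)
    dFy = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2 * h)
    dFz = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2 * h)
    return dFx + dFy + dFz

def integrate_over_tetrahedron(g, n=40):
    """Midpoint-rule integral of g over x, y, z >= 0, x + y + z <= 1.

    The unit cube (a, b, c) is mapped onto the tetrahedron by
    x = a, y = b(1 - a), z = c(1 - a)(1 - b); the Jacobian is (1 - a)^2 (1 - b).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * h
        for j in range(n):
            b = (j + 0.5) * h
            for k in range(n):
                c = (k + 0.5) * h
                x, y, z = a, b * (1 - a), c * (1 - a) * (1 - b)
                total += g(x, y, z) * (1 - a) ** 2 * (1 - b) * h ** 3
    return total

volume_integral = integrate_over_tetrahedron(lambda x, y, z: divergence(B, x, y, z))
print(volume_integral)
```

The printed value should be close to $1/6$, the surface-integral result of Example 3.7.2.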


Green’s Theorem 


A frequently useful corollary of Gauss’ theorem is a relation known as Green’s theorem.
If $u$ and $v$ are two scalar functions, we have the identities

\[ \nabla\cdot(u\nabla v) = u\nabla^2 v + (\nabla u)\cdot(\nabla v), \tag{3.83} \]
\[ \nabla\cdot(v\nabla u) = v\nabla^2 u + (\nabla v)\cdot(\nabla u). \tag{3.84} \]

Subtracting Eq. (3.84) from Eq. (3.83), integrating over a volume $V$ on which $u$, $v$, and
their derivatives are continuous, and applying Gauss’ theorem, Eq. (3.81), we obtain

\[ \int_V (u\nabla^2 v - v\nabla^2 u)\,d\tau = \oint_{\partial V} (u\nabla v - v\nabla u)\cdot d\boldsymbol{\sigma}. \tag{3.85} \]

This is Green’s theorem. An alternate form of Green’s theorem, obtained from Eq. (3.83)
alone, is

\[ \oint_{\partial V} u\nabla v\cdot d\boldsymbol{\sigma} = \int_V u\nabla^2 v\,d\tau + \int_V \nabla u\cdot\nabla v\,d\tau. \tag{3.86} \]


While the results already obtained are by far the most important forms of Gauss’ theorem,
volume integrals involving the gradient or the curl may also appear. To derive these,
we consider a vector of the form

\[ \mathbf{B}(x, y, z) = B(x, y, z)\,\mathbf{a}, \tag{3.87} \]

in which $\mathbf{a}$ is a vector with constant magnitude and constant but arbitrary direction. Then
Eq. (3.81) becomes, applying Eq. (3.72),

\[ \mathbf{a}\cdot\oint_{\partial V} B\,d\boldsymbol{\sigma} = \int_V \nabla\cdot(B\mathbf{a})\,d\tau = \mathbf{a}\cdot\int_V \nabla B\,d\tau. \]

This may be rewritten

\[ \mathbf{a}\cdot\left[ \oint_{\partial V} B\,d\boldsymbol{\sigma} - \int_V \nabla B\,d\tau \right] = 0. \tag{3.88} \]

Since the direction of $\mathbf{a}$ is arbitrary, Eq. (3.88) cannot always be satisfied unless the quantity
in the square brackets evaluates to zero.⁶ The result is

\[ \oint_{\partial V} B\,d\boldsymbol{\sigma} = \int_V \nabla B\,d\tau. \tag{3.89} \]

In a similar manner, using $\mathbf{B} = \mathbf{a}\times\mathbf{P}$ in which $\mathbf{a}$ is a constant vector, we may show

\[ \oint_{\partial V} d\boldsymbol{\sigma}\times\mathbf{P} = \int_V \nabla\times\mathbf{P}\,d\tau. \tag{3.90} \]

These last two forms of Gauss’ theorem are used in the vector form of Kirchhoff diffraction
theory.

⁶This exploitation of the arbitrary nature of a part of a problem is a valuable and widely used technique.

Stokes’ Theorem


Stokes’ theorem is the analog of Gauss’ theorem that relates a surface integral of a deriva- 
tive of a function to the line integral of the function, with the path of integration being the 
perimeter bounding the surface. 

Let us take the surface and subdivide it into a network of arbitrarily small rectangles.
In Eq. (3.60) we saw that the circulation of a vector $\mathbf{B}$ about such a differential rectangle
(in the xy-plane) is $(\nabla\times\mathbf{B})_z\,dx\,dy$. Identifying $dx\,dy\,\hat{\mathbf{e}}_z$ as the element of area $d\boldsymbol{\sigma}$,
Eq. (3.60) generalizes to

\[ \sum_{\text{four sides}} \mathbf{B}\cdot d\mathbf{r} = \nabla\times\mathbf{B}\cdot d\boldsymbol{\sigma}. \tag{3.91} \]


We now sum over all the little rectangles; the surface contributions, from the right-hand 
side of Eq. (3.91), are added together. The line integrals (left-hand side) of all interior 
line segments cancel identically. See Fig. 3.16. Only the line integral around the perimeter 
survives. Taking the limit as the number of rectangles approaches infinity, we have 


\[ \oint_{\partial S} \mathbf{B}\cdot d\mathbf{r} = \int_S \nabla\times\mathbf{B}\cdot d\boldsymbol{\sigma}. \tag{3.92} \]

Here $\partial S$ is the perimeter of $S$. This is Stokes’ theorem. Note that both the sign of the
line integral and the direction of $d\boldsymbol{\sigma}$ depend on the direction the perimeter is traversed,
so consistent results will always be obtained. For the area and the line-integral direction
shown in Fig. 3.16, the direction of $d\boldsymbol{\sigma}$ for the shaded rectangle will be out of the plane of
the paper.
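
A numerical illustration of Eq. (3.92) may help fix the orientation conventions. The sketch below assumes a sample planar field $\mathbf{B} = -y^3\hat{\mathbf{e}}_x + x^3\hat{\mathbf{e}}_y$ and the unit disk as the surface; both are choices made here for illustration, as are the function names. The perimeter is traversed counterclockwise, so the positive normal is along $+z$.

```python
import math

def B(x, y):
    """Sample planar field (an assumption for illustration): B = (-y^3, x^3)."""
    return (-y ** 3, x ** 3)

def curl_z(field, x, y, h=1e-5):
    """Central-difference estimate of (curl B)_z = dBy/dx - dBx/dy."""
    dBy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dBx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dBy_dx - dBx_dy

def circulation_unit_circle(field, n=400):
    """Line integral of B . dr counterclockwise around the unit circle."""
    dtheta = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dtheta
        Bx, By = field(math.cos(t), math.sin(t))
        # dr = (-sin t, cos t) dtheta on the unit circle
        total += (-Bx * math.sin(t) + By * math.cos(t)) * dtheta
    return total

def curl_flux_unit_disk(field, nr=200, nt=200):
    """Surface integral of (curl B)_z over the unit disk, normal along +z."""
    dr, dtheta = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dtheta
            total += curl_z(field, r * math.cos(t), r * math.sin(t)) * r * dr * dtheta
    return total

circ = circulation_unit_circle(B)
flux = curl_flux_unit_disk(B)
print(circ, flux)
```

For this field $(\nabla\times\mathbf{B})_z = 3(x^2 + y^2)$, so both numbers should be close to $3\pi/2$. Reversing the direction of travel changes the sign of both sides together.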

Finally, consider what happens if we apply Stokes’ theorem to a closed surface. Since it 
has no perimeter, the line integral vanishes, so 


\[ \oint_S \nabla\times\mathbf{B}\cdot d\boldsymbol{\sigma} = 0, \quad \text{for } S \text{ a closed surface}. \tag{3.93} \]

As with Gauss’ theorem, we can derive additional relations connecting surface integrals
with line integrals on their perimeter. Using the arbitrary-vector technique employed to
reach Eqs. (3.89) and (3.90), we can obtain

\[ \int_S d\boldsymbol{\sigma}\times\nabla\varphi = \oint_{\partial S} \varphi\,d\mathbf{r}, \tag{3.94} \]
\[ \int_S (d\boldsymbol{\sigma}\times\nabla)\times\mathbf{P} = \oint_{\partial S} d\mathbf{r}\times\mathbf{P}. \tag{3.95} \]

FIGURE 3.16 Direction of normal for the shaded rectangle when perimeter of the surface
is traversed as indicated.


Example 3.8.2 OERSTED’S AND FARADAY’S LAWS

Consider the magnetic field generated by a long wire that carries a time-independent current
$I$ (meaning that $\partial\mathbf{E}/\partial t = \partial\mathbf{B}/\partial t = 0$). The relevant Maxwell equation, Eq. (3.68),
then takes the form $\nabla\times\mathbf{B} = \mu_0\mathbf{J}$. Integrating this equation over a disk $S$ perpendicular to
and surrounding the wire (see Fig. 3.17), we have

\[ I = \int_S \mathbf{J}\cdot d\boldsymbol{\sigma} = \frac{1}{\mu_0}\int_S (\nabla\times\mathbf{B})\cdot d\boldsymbol{\sigma}. \]

Now we apply Stokes’ theorem, obtaining the result $I = (1/\mu_0)\oint_{\partial S} \mathbf{B}\cdot d\mathbf{r}$, which is
Oersted’s law.

Similarly, we can integrate Maxwell’s equation for $\nabla\times\mathbf{E}$, Eq. (3.69). Imagine moving
a closed loop ($\partial S$) of wire (of area $S$) across a magnetic induction field $\mathbf{B}$. We have

\[ \int_S (\nabla\times\mathbf{E})\cdot d\boldsymbol{\sigma} = -\frac{d}{dt}\int_S \mathbf{B}\cdot d\boldsymbol{\sigma} = -\frac{d\Phi}{dt}, \]

where $\Phi$ is the magnetic flux through the area $S$. By Stokes’ theorem, we have

\[ \oint_{\partial S} \mathbf{E}\cdot d\mathbf{r} = -\frac{d\Phi}{dt}. \]

This is Faraday’s law. The line integral represents the voltage induced in the wire loop; it is
equal in magnitude to the rate of change of the magnetic flux through the loop. There is no
sign ambiguity; if the direction of $\partial S$ is reversed, that causes a reversal of the direction of
$d\boldsymbol{\sigma}$ and thereby of $\Phi$.

FIGURE 3.17 Direction of B given by Oersted’s law.








Exercises 


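3.8.1 Using Gauss’ theorem, prove that

\[ \oint_S d\boldsymbol{\sigma} = 0 \]

if $S = \partial V$ is a closed surface.

3.8.2 Show that

\[ \frac{1}{3}\oint_S \mathbf{r}\cdot d\boldsymbol{\sigma} = V, \]

where $V$ is the volume enclosed by the closed surface $S = \partial V$.
Note. This is a generalization of Exercise 3.7.5.

3.8.3 If $\mathbf{B} = \nabla\times\mathbf{A}$, show that

\[ \oint_S \mathbf{B}\cdot d\boldsymbol{\sigma} = 0 \]

for any closed surface $S$.

3.8.4 From Eq. (3.72), with $\mathbf{V}$ the electric field $\mathbf{E}$ and $f$ the electrostatic potential $\varphi$, show
that, for integration over all space,

\[ \int \rho\varphi\,d\tau = \varepsilon_0\int E^2\,d\tau. \]

This corresponds to a 3-D integration by parts.
Hint. $\mathbf{E} = -\nabla\varphi$, $\nabla\cdot\mathbf{E} = \rho/\varepsilon_0$. You may assume that $\varphi$ vanishes at large $r$ at least as
fast as $r^{-1}$.

3.8.5 A particular steady-state electric current distribution is localized in space. Choosing a
bounding surface far enough out so that the current density $\mathbf{J}$ is zero everywhere on the
surface, show that

\[ \int \mathbf{J}\,d\tau = 0. \]

Hint. Take one component of $\mathbf{J}$ at a time. With $\nabla\cdot\mathbf{J} = 0$, show that $J_i = \nabla\cdot(x_i\mathbf{J})$ and
apply Gauss’ theorem.

3.8.6 Given a vector $\mathbf{t} = -\hat{\mathbf{e}}_x y + \hat{\mathbf{e}}_y x$, show, with the help of Stokes’ theorem, that the integral
of $\mathbf{t}$ around a continuous closed curve in the xy-plane satisfies

\[ \frac{1}{2}\oint \mathbf{t}\cdot d\boldsymbol{\lambda} = \frac{1}{2}\oint (x\,dy - y\,dx) = A, \]

where $A$ is the area enclosed by the curve.

3.8.7 The calculation of the magnetic moment of a current loop leads to the line integral

\[ \oint \mathbf{r}\times d\mathbf{r}. \]

(a) Integrate around the perimeter of a current loop (in the xy-plane) and show that
the scalar magnitude of this line integral is twice the area of the enclosed surface.

(b) The perimeter of an ellipse is described by $\mathbf{r} = \hat{\mathbf{e}}_x a\cos\theta + \hat{\mathbf{e}}_y b\sin\theta$. From part (a)
show that the area of the ellipse is $\pi ab$.

3.8.8 Evaluate $\oint \mathbf{r}\times d\mathbf{r}$ by using the alternate form of Stokes’ theorem given by Eq. (3.95):

\[ \int_S (d\boldsymbol{\sigma}\times\nabla)\times\mathbf{P} = \oint d\mathbf{r}\times\mathbf{P}. \]

Take the loop to be entirely in the xy-plane.

3.8.9 Prove that

\[ \oint u\nabla v\cdot d\boldsymbol{\lambda} = -\oint v\nabla u\cdot d\boldsymbol{\lambda}. \]

3.8.10 Prove that

\[ \oint u\nabla v\cdot d\boldsymbol{\lambda} = \int_S (\nabla u)\times(\nabla v)\cdot d\boldsymbol{\sigma}. \]

3.8.11 Prove that

\[ \oint_{\partial V} d\boldsymbol{\sigma}\times\mathbf{P} = \int_V \nabla\times\mathbf{P}\,d\tau. \]

3.8.12 Prove that

\[ \int_S d\boldsymbol{\sigma}\times\nabla\varphi = \oint_{\partial S} \varphi\,d\mathbf{r}. \]

3.8.13 Prove that

\[ \int_S (d\boldsymbol{\sigma}\times\nabla)\times\mathbf{P} = \oint_{\partial S} d\mathbf{r}\times\mathbf{P}. \]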

3.9 POTENTIAL THEORY 


Much of physics, particularly electromagnetic theory, can be treated more simply by intro- 
ducing potentials from which forces can be derived. This section deals with the definition 
and use of such potentials.

Scalar Potential

If, over a given simply connected region of space (one with no holes), a force can be
expressed as the negative gradient of a scalar function $\varphi$,

\[ \mathbf{F} = -\nabla\varphi, \tag{3.96} \]

we call $\varphi$ a scalar potential, and we benefit from the feature that the force can be described
in terms of one function instead of three. Since the force is a derivative of the scalar poten- 
tial, the potential is only determined up to an additive constant, which can be used to adjust 
its value at infinity (usually zero) or at some other reference point. We want to know what 
conditions F must satisfy in order for a scalar potential to exist. 

First, consider the result of computing the work done against a force given by $-\nabla\varphi$
when an object subject to the force is moved from a point $A$ to a point $B$. This is a line
integral of the form

\[ -\int_A^B \mathbf{F}\cdot d\mathbf{r} = \int_A^B \nabla\varphi\cdot d\mathbf{r}. \tag{3.97} \]

But, as pointed out in Eq. (3.41), $\nabla\varphi\cdot d\mathbf{r} = d\varphi$, so the integral is in fact independent of the
path, depending only on the endpoints $A$ and $B$. So we have

\[ -\int_A^B \mathbf{F}\cdot d\mathbf{r} = \varphi(\mathbf{r}_B) - \varphi(\mathbf{r}_A), \tag{3.98} \]

which also means that if $A$ and $B$ are the same point, forming a closed loop,

\[ \oint \mathbf{F}\cdot d\mathbf{r} = 0. \tag{3.99} \]


We conclude that a force (on an object) described by a scalar potential is a conservative
force, meaning that the work needed to move the object between any two points is independent
of the path taken, and that $\varphi(\mathbf{r})$ is the work needed to move to the point $\mathbf{r}$ from a
reference point where the potential has been assigned the value zero.

Another property of a force given by a scalar potential is that

\[ \nabla\times\mathbf{F} = -\nabla\times\nabla\varphi = 0 \tag{3.100} \]


as prescribed by Eq. (3.64). This observation is consistent with the notion that the lines of 
force of a conservative F cannot form closed loops. 

The three conditions, Eqs. (3.96), (3.99), and (3.100), are all equivalent. If we take 
Eq. (3.99) for a differential loop, its left side and that of Eq. (3.100) must, according 
to Stokes’ theorem, be equal. We already showed both these equations followed from 
Eq. (3.96). To complete the establishment of full equivalence, we need only to derive 
Eq. (3.96) from Eq. (3.99). Going backward to Eq. (3.97), we rewrite it as 


\[ \int_A^B (\mathbf{F} + \nabla\varphi)\cdot d\mathbf{r} = 0, \]

which must be satisfied for all $A$ and $B$. This means its integrand must be identically zero,
thereby recovering Eq. (3.96).
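
The content of Eq. (3.98) can be illustrated numerically: for a force derived from a potential, the work done against it depends only on the endpoints. The sketch below uses a sample 2-D potential and two sample paths, all of which are assumptions made here for illustration (as are the function names).

```python
import math

def phi(x, y):
    """Sample potential (an assumption for illustration)."""
    return x * x * y + math.sin(y)

def F(x, y):
    """Force F = -grad(phi), differentiated by hand."""
    return (-2 * x * y, -(x * x + math.cos(y)))

def work_against(force, path, n=4000):
    """Approximate -(integral of F . dr) along path(t), 0 <= t <= 1 (midpoint rule)."""
    total = 0.0
    x0, y0 = path(0.0)
    for k in range(1, n + 1):
        x1, y1 = path(k / n)
        Fx, Fy = force(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total -= Fx * (x1 - x0) + Fy * (y1 - y0)
        x0, y0 = x1, y1
    return total

def straight(t):
    """Straight line from (0, 0) to (1, 2)."""
    return (t, 2 * t)

def curved(t):
    """A different route between the same endpoints."""
    return (t * t, 2 * t + math.sin(math.pi * t))

w_straight = work_against(F, straight)
w_curved = work_against(F, curved)
expected = phi(1.0, 2.0) - phi(0.0, 0.0)
print(w_straight, w_curved, expected)
```

All three printed numbers should agree to within the quadrature error, in line with Eq. (3.98). The same routine applied to the nonconservative field of Example 3.7.1 would give path-dependent results.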


Example 3.9.1 GRAVITATIONAL POTENTIAL

We have previously, in Example 3.5.2, illustrated the generation of a force from a scalar
potential. To perform the reverse process, we must integrate. Let us find the scalar potential
for the gravitational force

\[ \mathbf{F}_G = -\frac{G m_1 m_2\,\hat{\mathbf{r}}}{r^2} = -\frac{k\,\hat{\mathbf{r}}}{r^2}, \]

radially inward. Setting the zero of scalar potential at infinity, we obtain by integrating
(radially) from infinity to position $\mathbf{r}$,

\[ \varphi_G(r) - \varphi_G(\infty) = -\int_\infty^r \mathbf{F}_G\cdot d\mathbf{r} = +\int_r^\infty \mathbf{F}_G\cdot d\mathbf{r}. \]

The minus sign in the central member of this equation arises because we are calculating
the work done against the gravitational force. Evaluating the integral,

\[ \varphi_G(r) = -\int_r^\infty \frac{k\,dr}{r^2} = -\frac{k}{r} = -\frac{G m_1 m_2}{r}. \]

The final negative sign corresponds to the fact that gravity is an attractive force.


Vector Potential 


In some branches of physics, especially electrodynamics, it is convenient to introduce a 
vector potential A such that a (force) field B is given by 


\[ \mathbf{B} = \nabla\times\mathbf{A}. \tag{3.101} \]


An obvious reason for introducing A is that it causes B to be solenoidal; if B is the mag- 
netic induction field, this property is required by Maxwell’s equations. Here we want to 
develop a converse, namely to show that when B is solenoidal, a vector potential A exists. 
We demonstrate the existence of A by actually writing it. 

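Our construction is

\[ \mathbf{A} = \hat{\mathbf{e}}_y\int_{x_0}^{x} B_z(x, y, z)\,dx + \hat{\mathbf{e}}_z\left[ \int_{y_0}^{y} B_x(x_0, y, z)\,dy - \int_{x_0}^{x} B_y(x, y, z)\,dx \right]. \tag{3.102} \]

Checking the y- and z-components of $\nabla\times\mathbf{A}$ first, noting that $A_x = 0$,

\[ (\nabla\times\mathbf{A})_y = -\frac{\partial A_z}{\partial x} = +\frac{\partial}{\partial x}\int_{x_0}^{x} B_y(x, y, z)\,dx = B_y, \]
\[ (\nabla\times\mathbf{A})_z = +\frac{\partial A_y}{\partial x} = \frac{\partial}{\partial x}\int_{x_0}^{x} B_z(x, y, z)\,dx = B_z. \]

The x-component of $\nabla\times\mathbf{A}$ is a bit more complicated. We have

\[ (\nabla\times\mathbf{A})_x = \frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z} \]
\[ = \frac{\partial}{\partial y}\left[ \int_{y_0}^{y} B_x(x_0, y, z)\,dy - \int_{x_0}^{x} B_y(x, y, z)\,dx \right] - \frac{\partial}{\partial z}\int_{x_0}^{x} B_z(x, y, z)\,dx \]
\[ = B_x(x_0, y, z) - \int_{x_0}^{x}\left[ \frac{\partial B_y(x, y, z)}{\partial y} + \frac{\partial B_z(x, y, z)}{\partial z} \right] dx. \]

To go further, we must use the fact that $\mathbf{B}$ is solenoidal, which means $\nabla\cdot\mathbf{B} = 0$. We can
therefore make the replacement

\[ \frac{\partial B_y(x, y, z)}{\partial y} + \frac{\partial B_z(x, y, z)}{\partial z} = -\frac{\partial B_x(x, y, z)}{\partial x}, \]

after which the x integration becomes trivial, yielding

\[ +\int_{x_0}^{x} \frac{\partial B_x(x, y, z)}{\partial x}\,dx = B_x(x, y, z) - B_x(x_0, y, z), \]

leading to the desired final result $(\nabla\times\mathbf{A})_x = B_x$.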
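
The construction of Eq. (3.102) can be tried out numerically. In the sketch below the solenoidal test field, the quadrature (composite Simpson rule), the finite-difference curl, and all function names are assumptions made for illustration; only the formula for $\mathbf{A}$ comes from Eq. (3.102).

```python
import math

def B(x, y, z):
    """Sample solenoidal field (div B = y - y + 0 = 0), chosen for illustration."""
    return (x * y, -0.5 * y * y, math.sin(x) * math.cos(y))

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0

def A(x, y, z, x0=0.0, y0=0.0):
    """Vector potential built from B according to Eq. (3.102)."""
    Ay = simpson(lambda t: B(t, y, z)[2], x0, x)
    Az = (simpson(lambda t: B(x0, t, z)[0], y0, y)
          - simpson(lambda t: B(t, y, z)[1], x0, x))
    return (0.0, Ay, Az)

def curl(field, x, y, z, h=1e-4):
    """Central-difference estimate of curl(field) at (x, y, z)."""
    def d(i, j):
        # Partial derivative of component i with respect to coordinate j.
        plus, minus = [x, y, z], [x, y, z]
        plus[j] += h
        minus[j] -= h
        return (field(*plus)[i] - field(*minus)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

point = (0.7, -0.3, 1.2)
curl_A = curl(A, *point)
B_here = B(*point)
print(curl_A, B_here)
```

The two printed triples should agree closely. Changing `x0` or `y0` alters $\mathbf{A}$ but should leave its curl unchanged.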

While we have shown that there exists a vector potential $\mathbf{A}$ such that $\nabla\times\mathbf{A} = \mathbf{B}$ subject
only to the condition that $\mathbf{B}$ be solenoidal, we have in no way established that $\mathbf{A}$ is unique.
In fact, $\mathbf{A}$ is far from unique, as we can add to it not only an arbitrary constant, but also the
gradient of any scalar function, $\nabla\varphi$, without affecting $\mathbf{B}$ at all. Moreover, our verification
of $\mathbf{A}$ was independent of the values of $x_0$ and $y_0$, so these can be assigned arbitrarily
without affecting $\mathbf{B}$. In addition, we can derive another formula for $\mathbf{A}$ in which the roles of
$x$ and $y$ are interchanged:

\[ \mathbf{A} = -\hat{\mathbf{e}}_x\int_{y_0}^{y} B_z(x, y, z)\,dy - \hat{\mathbf{e}}_z\left[ \int_{x_0}^{x} B_y(x, y_0, z)\,dx - \int_{y_0}^{y} B_x(x, y, z)\,dy \right]. \tag{3.103} \]

Example 3.9.2 MAGNETIC VECTOR POTENTIAL


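We consider the construction of the vector potential for a constant magnetic induction field

\[ \mathbf{B} = B_z\hat{\mathbf{e}}_z. \tag{3.104} \]

Using Eq. (3.102), we have (choosing the arbitrary value of $x_0$ to be zero)

\[ \mathbf{A} = \hat{\mathbf{e}}_y\int_0^x B_z\,dx = \hat{\mathbf{e}}_y\,x B_z. \tag{3.105} \]

Alternatively, we could use Eq. (3.103) for $\mathbf{A}$, leading to

\[ \mathbf{A}' = -\hat{\mathbf{e}}_x\,y B_z. \tag{3.106} \]

Neither of these is the form for $\mathbf{A}$ found in many elementary texts, which for $\mathbf{B}$ from
Eq. (3.104) is

\[ \mathbf{A}'' = \frac{1}{2}(\mathbf{B}\times\mathbf{r}) = \frac{B_z}{2}\,(x\,\hat{\mathbf{e}}_y - y\,\hat{\mathbf{e}}_x). \tag{3.107} \]

These disparate forms can be reconciled if we use the freedom to add to $\mathbf{A}$ any expression
of the form $\nabla\varphi$. Taking $\varphi = Cxy$, the quantity that can be added to $\mathbf{A}$ will be of the form

\[ \nabla\varphi = C\,(y\,\hat{\mathbf{e}}_x + x\,\hat{\mathbf{e}}_y). \]

We now see that

\[ \mathbf{A} - \frac{B_z}{2}\,(y\,\hat{\mathbf{e}}_x + x\,\hat{\mathbf{e}}_y) = \mathbf{A}' + \frac{B_z}{2}\,(y\,\hat{\mathbf{e}}_x + x\,\hat{\mathbf{e}}_y) = \mathbf{A}'', \]

showing that all these formulas predict the same value of $\mathbf{B}$.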
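
As a direct check (a short worked step added here, using only Eqs. (3.105) to (3.107) and the fact that $B_z$ is constant), the only nonvanishing component of each curl is $(\nabla\times\mathbf{A})_z = \partial A_y/\partial x - \partial A_x/\partial y$:

\[ (\nabla\times\mathbf{A})_z = \frac{\partial (x B_z)}{\partial x} - 0 = B_z, \]
\[ (\nabla\times\mathbf{A}')_z = 0 - \frac{\partial (-y B_z)}{\partial y} = B_z, \]
\[ (\nabla\times\mathbf{A}'')_z = \frac{\partial}{\partial x}\left(\frac{B_z x}{2}\right) - \frac{\partial}{\partial y}\left(-\frac{B_z y}{2}\right) = \frac{B_z}{2} + \frac{B_z}{2} = B_z. \]

The x- and y-components vanish in all three cases because none of the potentials depends on $z$ and none has a z-component.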


Example 3.9.3 POTENTIALS IN ELECTROMAGNETISM

If we introduce suitably defined scalar and vector potentials $\varphi$ and $\mathbf{A}$ into Maxwell’s
equations, we can obtain equations giving these potentials in terms of the sources of the
electromagnetic field (charges and currents). We start with $\mathbf{B} = \nabla\times\mathbf{A}$, thereby assuring
satisfaction of the Maxwell’s equation $\nabla\cdot\mathbf{B} = 0$. Substitution into the equation for $\nabla\times\mathbf{E}$
yields

\[ \nabla\times\mathbf{E} = -\nabla\times\frac{\partial\mathbf{A}}{\partial t} \quad\longrightarrow\quad \nabla\times\left( \mathbf{E} + \frac{\partial\mathbf{A}}{\partial t} \right) = 0, \]

showing that $\mathbf{E} + \partial\mathbf{A}/\partial t$ is a gradient and can be written as $-\nabla\varphi$, thereby defining $\varphi$. This
preserves the notion of an electrostatic potential in the absence of time dependence, and
means that $\mathbf{A}$ and $\varphi$ have now been defined to give

\[ \mathbf{B} = \nabla\times\mathbf{A}, \qquad \mathbf{E} = -\nabla\varphi - \frac{\partial\mathbf{A}}{\partial t}. \tag{3.108} \]

At this point $\mathbf{A}$ is still arbitrary to the extent of adding any gradient, which is equivalent to
making an arbitrary choice of $\nabla\cdot\mathbf{A}$. A convenient choice is to require

\[ \frac{1}{c^2}\frac{\partial\varphi}{\partial t} + \nabla\cdot\mathbf{A} = 0. \tag{3.109} \]

This gauge condition is called the Lorentz gauge, and transformations of $\mathbf{A}$ and $\varphi$ to
satisfy it or any other legitimate gauge condition are called gauge transformations. The
invariance of electromagnetic theory under gauge transformation is an important precursor
of contemporary directions in fundamental physical theory.
From Maxwell’s equation for $\nabla\cdot\mathbf{E}$ and the Lorentz gauge condition, we get

\[ \frac{\rho}{\varepsilon_0} = \nabla\cdot\mathbf{E} = -\nabla^2\varphi - \frac{\partial}{\partial t}\nabla\cdot\mathbf{A} = -\nabla^2\varphi + \frac{1}{c^2}\frac{\partial^2\varphi}{\partial t^2}, \tag{3.110} \]

showing that the Lorentz gauge permitted us to decouple $\mathbf{A}$ and $\varphi$ to the extent that we
have an equation for $\varphi$ in terms only of the charge density $\rho$; neither $\mathbf{A}$ nor the current
density $\mathbf{J}$ enters this equation.
Finally, from the equation for $\nabla\times\mathbf{B}$, we obtain

\[ \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla^2\mathbf{A} = \mu_0\mathbf{J}. \tag{3.111} \]

Proof of this formula is the subject of Exercise 3.9.11.


Gauss’ Law 


Consider a point charge q at the origin of our coordinate system. It produces an electric 
field E, given by 


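\[ \mathbf{E} = \frac{q\,\hat{\mathbf{r}}}{4\pi\varepsilon_0 r^2}. \tag{3.112} \]

Gauss’ law states that for an arbitrary volume $V$,

\[ \oint_{\partial V} \mathbf{E}\cdot d\boldsymbol{\sigma} = \begin{cases} q/\varepsilon_0 & \text{if } \partial V \text{ encloses } q, \\ 0 & \text{if } \partial V \text{ does not enclose } q. \end{cases} \tag{3.113} \]

The case that $\partial V$ does not enclose $q$ is easily handled. From Eq. (3.54), the $r^{-2}$ central
force $\mathbf{E}$ is divergenceless everywhere except at $r = 0$, and for this case, throughout the
entire volume $V$. Thus, we have, invoking Gauss’ theorem, Eq. (3.81),

\[ \int_V \nabla\cdot\mathbf{E}\,d\tau = 0 \quad\longrightarrow\quad \oint_{\partial V} \mathbf{E}\cdot d\boldsymbol{\sigma} = 0. \]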
Vv 


If $q$ is within the volume $V$, we must be more devious. We surround $r = 0$ by a small
spherical hole (of radius $\delta$), with a surface we designate $S'$, and connect the hole with the
boundary of $V$ via a small tube, thereby creating a simply connected region $V'$ to which
Gauss’ theorem will apply. See Fig. 3.18. We now consider $\oint \mathbf{E}\cdot d\boldsymbol{\sigma}$ on the surface of
this modified volume. The contribution from the connecting tube will become negligible
in the limit that it shrinks toward zero cross section, as $\mathbf{E}$ is finite everywhere on the
tube’s surface. The integral over the modified $\partial V'$ will thus be that of the original $\partial V$ (over
the outer boundary, which we designate $S$), plus that of the inner spherical surface ($S'$).

FIGURE 3.18 Making a multiply connected region simply connected.

But note that the “outward” direction for $S'$ is toward smaller $r$, so $d\boldsymbol{\sigma}' = -\hat{\mathbf{r}}\,dA$. Because
the modified volume contains no charge, we have

\[ \oint_{\partial V'} \mathbf{E}\cdot d\boldsymbol{\sigma} = \oint_S \mathbf{E}\cdot d\boldsymbol{\sigma} + \frac{q}{4\pi\varepsilon_0}\oint_{S'} \frac{\hat{\mathbf{r}}\cdot d\boldsymbol{\sigma}'}{\delta^2} = 0, \tag{3.114} \]

where we have inserted the explicit form of $\mathbf{E}$ in the $S'$ integral. Because $S'$ is a sphere of
radius $\delta$, this integral can be evaluated. Writing $d\Omega$ as the element of solid angle, so $dA = \delta^2 d\Omega$,

\[ \oint_{S'} \frac{\hat{\mathbf{r}}\cdot d\boldsymbol{\sigma}'}{\delta^2} = \int \frac{\hat{\mathbf{r}}}{\delta^2}\cdot(-\hat{\mathbf{r}}\,\delta^2 d\Omega) = -\int d\Omega = -4\pi, \]

independent of the value of $\delta$. Returning now to Eq. (3.114), it can be rearranged into

\[ \oint_S \mathbf{E}\cdot d\boldsymbol{\sigma} = -\frac{q}{4\pi\varepsilon_0}(-4\pi) = +\frac{q}{\varepsilon_0}, \]

the result needed to confirm the second case of Gauss’ law, Eq. (3.113).
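
Both cases of Eq. (3.113) can be illustrated numerically. The sketch below is an illustration only: it works in units where $q/(4\pi\varepsilon_0) = 1$ (so the enclosed-charge flux is $4\pi$), uses a cube rather than a general surface, and the function names are choices made here.

```python
import math

def E(x, y, z):
    """Field of a point charge at the origin, in units where q/(4*pi*eps0) = 1."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (x / r3, y / r3, z / r3)

def flux_through_cube(field, center, half, n=150):
    """Outward flux through the surface of an axis-aligned cube
    with the given center and half side length (midpoint rule)."""
    h = 2.0 * half / n
    total = 0.0
    for axis in range(3):
        u_axis, v_axis = (axis + 1) % 3, (axis + 2) % 3
        for sign in (-1.0, 1.0):
            p = [0.0, 0.0, 0.0]
            p[axis] = center[axis] + sign * half   # the face at +/- half along `axis`
            for i in range(n):
                p[u_axis] = center[u_axis] - half + (i + 0.5) * h
                for j in range(n):
                    p[v_axis] = center[v_axis] - half + (j + 0.5) * h
                    # The outward normal is sign * e_axis.
                    total += sign * field(*p)[axis] * h * h
    return total

flux_enclosing = flux_through_cube(E, (0.0, 0.0, 0.0), 1.0)
flux_outside = flux_through_cube(E, (3.0, 0.0, 0.0), 1.0)
print(flux_enclosing, 4 * math.pi, flux_outside)
```

The first value should be close to $4\pi$, that is, $q/\varepsilon_0$ in these units, and the last close to zero. Moving the enclosing cube off-center (while keeping the charge inside) should not change the result.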

Because the equations of electrostatics are linear, Gauss’ law can be extended to collections
of charges, or even to continuous charge distributions. In that case, $q$ can be replaced
by $\int_V \rho\,d\tau$, and Gauss’ law becomes

\[ \oint_{\partial V} \mathbf{E}\cdot d\boldsymbol{\sigma} = \int_V \frac{\rho}{\varepsilon_0}\,d\tau. \tag{3.115} \]

If we apply Gauss’ theorem to the left side of Eq. (3.115), we have

\[ \int_V \nabla\cdot\mathbf{E}\,d\tau = \int_V \frac{\rho}{\varepsilon_0}\,d\tau. \]

Since our volume is completely arbitrary, the integrands of this equation must be equal, so

\[ \nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}. \tag{3.116} \]

We thus see that Gauss’ law is the integral form of one of Maxwell’s equations.

Poisson’s Equation


If we return to Eq. (3.116) and, assuming a situation independent of time, write $\mathbf{E} = -\nabla\varphi$,
we obtain

\[ \nabla^2\varphi = -\frac{\rho}{\varepsilon_0}. \tag{3.117} \]

This equation, applicable to electrostatics,⁷ is called Poisson’s equation. If, in addition,
$\rho = 0$, we have an even more famous equation,

\[ \nabla^2\varphi = 0, \tag{3.118} \]

Laplace’s equation.

To make Poisson’s equation apply to a point charge $q$, we need to replace $\rho$ by a concentration
of charge that is localized at a point and adds up to $q$. The Dirac delta function
is what we need for this purpose. Thus, for a point charge $q$ at the origin, we write

\[ \nabla^2\varphi = -\frac{q}{\varepsilon_0}\,\delta(\mathbf{r}), \quad (\text{charge } q \text{ at } \mathbf{r} = 0). \tag{3.119} \]

If we rewrite this equation, inserting the point-charge potential for $\varphi$, we have

\[ \frac{q}{4\pi\varepsilon_0}\,\nabla^2\left(\frac{1}{r}\right) = -\frac{q}{\varepsilon_0}\,\delta(\mathbf{r}), \]

which reduces to

\[ \nabla^2\left(\frac{1}{r}\right) = -4\pi\,\delta(\mathbf{r}). \tag{3.120} \]

This equation circumvents the problem that the derivatives of $1/r$ do not exist at $r = 0$,
and gives appropriate and correct results for systems containing point charges. Like the
definition of the delta function itself, Eq. (3.120) is only meaningful when inserted into an
integral. It is an important result that is used repeatedly in physics, often in the form

\[ \nabla_1^2\left(\frac{1}{r_{12}}\right) = -4\pi\,\delta(\mathbf{r}_1 - \mathbf{r}_2). \tag{3.121} \]

Here $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$, and the subscript in $\nabla_1$ indicates that the derivatives apply to $\mathbf{r}_1$.
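
One way to make Eq. (3.120) plausible is to smooth out the singularity. The following worked step is a sketch under an assumption not made in the text: $1/r$ is replaced by the regularized function $1/\sqrt{r^2 + a^2}$, with $a$ a small positive length. Using the radial form of the Laplacian,

\[ \nabla^2\frac{1}{\sqrt{r^2 + a^2}} = \frac{1}{r^2}\frac{d}{dr}\left( r^2\,\frac{d}{dr}\frac{1}{\sqrt{r^2 + a^2}} \right) = \frac{1}{r^2}\frac{d}{dr}\left( -\frac{r^3}{(r^2 + a^2)^{3/2}} \right) = -\frac{3a^2}{(r^2 + a^2)^{5/2}}. \]

This is finite everywhere, is concentrated within a distance of order $a$ of the origin, and integrates over all space to

\[ \int \nabla^2\frac{1}{\sqrt{r^2 + a^2}}\,d^3r = 4\pi\left[ -\frac{r^3}{(r^2 + a^2)^{3/2}} \right]_0^\infty = -4\pi, \]

independent of $a$. As $a \to 0$ the function becomes ever more sharply peaked while its integral stays fixed at $-4\pi$, which is the behavior expressed by $-4\pi\,\delta(\mathbf{r})$.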


Helmholtz’s Theorem 


We now turn to two theorems that are of great formal importance, in that they establish 
conditions for the existence and uniqueness of solutions to time-independent problems in 
electromagnetic theory. The first of these theorems is: 


A vector field is uniquely specified by giving its divergence and its curl within a simply 
connected region and its normal component on the boundary. 





⁷ For general time dependence, see Eq. (3.110).

Note that both for this theorem and the next (Helmholtz’s theorem), even if there are points
in the simply connected region where the divergence or the curl is only defined in terms of 
delta functions, these points are not to be removed from the region. 

Let $\mathbf{P}$ be a vector field satisfying the conditions

$$\nabla \cdot \mathbf{P} = s, \qquad \nabla \times \mathbf{P} = \mathbf{c}, \tag{3.122}$$

where $s$ may be interpreted as a given source (charge) density and $\mathbf{c}$ as a given circulation (current) density. Assuming that the normal component $P_n$ on the boundary is also given, we want to show that $\mathbf{P}$ is unique.

We proceed by assuming the existence of a second vector, $\mathbf{P}'$, which satisfies Eq. (3.122) and has the same value of $P_n$. We form $\mathbf{Q} = \mathbf{P} - \mathbf{P}'$, which must have $\nabla \cdot \mathbf{Q}$, $\nabla \times \mathbf{Q}$, and $Q_n$ all identically zero. Because $\mathbf{Q}$ is irrotational, there must exist a potential $\varphi$ such that $\mathbf{Q} = -\nabla\varphi$, and because $\nabla \cdot \mathbf{Q} = 0$, we also have

$$\nabla^2 \varphi = 0.$$

Now we draw on Green’s theorem in the form given in Eq. (3.86), letting $u$ and $v$ each equal $\varphi$. Because $Q_n = 0$ on the boundary, Green’s theorem reduces to

$$\int_V (\nabla\varphi)\cdot(\nabla\varphi)\,d\tau = \int_V \mathbf{Q}\cdot\mathbf{Q}\,d\tau = 0.$$

This equation can only be satisfied if $\mathbf{Q}$ is identically zero, showing that $\mathbf{P}' = \mathbf{P}$, thereby proving the theorem.
The second theorem we shall prove, Helmholtz’s theorem, is 


A vector $\mathbf{P}$ with both source and circulation densities vanishing at infinity may be written as the sum of two parts, one of which is irrotational, the other of which is solenoidal.


Helmholtz’s theorem will clearly be satisfied if $\mathbf{P}$ can be written in the form

$$\mathbf{P} = -\nabla\varphi + \nabla\times\mathbf{A}, \tag{3.123}$$

since $-\nabla\varphi$ is irrotational, while $\nabla\times\mathbf{A}$ is solenoidal. Because $\mathbf{P}$ is known, so are $s$ and $\mathbf{c}$, defined as

$$s = \nabla\cdot\mathbf{P}, \qquad \mathbf{c} = \nabla\times\mathbf{P}.$$

We proceed by exhibiting expressions for $\varphi$ and $\mathbf{A}$ that enable the recovery of $s$ and $\mathbf{c}$. Because the region here under study is simply connected and the vector involved vanishes at infinity (so that the first theorem of this subsection applies), having the correct $s$ and $\mathbf{c}$ guarantees that we have properly reproduced $\mathbf{P}$.

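The formulas proposed for $\varphi$ and $\mathbf{A}$ are the following, written in terms of the spatial variable $\mathbf{r}_1$:

$$\varphi(\mathbf{r}_1) = \frac{1}{4\pi}\int \frac{s(\mathbf{r}_2)}{r_{12}}\,d\tau_2, \tag{3.124}$$

$$\mathbf{A}(\mathbf{r}_1) = \frac{1}{4\pi}\int \frac{\mathbf{c}(\mathbf{r}_2)}{r_{12}}\,d\tau_2. \tag{3.125}$$

Here $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$.

If Eq. (3.123) is to be satisfied with the proposed values of $\varphi$ and $\mathbf{A}$, it is necessary that

$$\nabla\cdot\mathbf{P} = -\nabla\cdot\nabla\varphi + \nabla\cdot(\nabla\times\mathbf{A}) = -\nabla^2\varphi = s,$$

$$\nabla\times\mathbf{P} = -\nabla\times\nabla\varphi + \nabla\times(\nabla\times\mathbf{A}) = \nabla\times(\nabla\times\mathbf{A}) = \mathbf{c}.$$

To check that $-\nabla^2\varphi = s$, we examine

$$-\nabla_1^2\varphi(\mathbf{r}_1) = -\frac{1}{4\pi}\int \nabla_1^2\!\left(\frac{1}{r_{12}}\right) s(\mathbf{r}_2)\,d\tau_2 = -\frac{1}{4\pi}\int \left[-4\pi\,\delta(\mathbf{r}_1 - \mathbf{r}_2)\right] s(\mathbf{r}_2)\,d\tau_2 = s(\mathbf{r}_1). \tag{3.126}$$

We have written $\nabla_1$ to make clear that it operates on $\mathbf{r}_1$ and not $\mathbf{r}_2$, and we have used the delta-function property given in Eq. (3.121). So $s$ has been recovered.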

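We now check that $\nabla\times(\nabla\times\mathbf{A}) = \mathbf{c}$. We start by using Eq. (3.70) to convert this condition to a more easily utilized form:

$$\nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A} = \mathbf{c}.$$

Taking $\mathbf{r}_1$ as the free variable, we look first at

$$\begin{aligned}
\nabla_1\left(\nabla_1\cdot\mathbf{A}(\mathbf{r}_1)\right) &= \frac{1}{4\pi}\,\nabla_1\int \nabla_1\cdot\left(\frac{\mathbf{c}(\mathbf{r}_2)}{r_{12}}\right)d\tau_2 \\
&= \frac{1}{4\pi}\,\nabla_1\int \mathbf{c}(\mathbf{r}_2)\cdot\nabla_1\!\left(\frac{1}{r_{12}}\right)d\tau_2 \\
&= \frac{1}{4\pi}\,\nabla_1\int \mathbf{c}(\mathbf{r}_2)\cdot\left[-\nabla_2\!\left(\frac{1}{r_{12}}\right)\right]d\tau_2 .
\end{aligned}$$

To reach the second line of this equation, we used Eq. (3.72) for the special case that the vector in that equation is not a function of the variable being differentiated. Then, to obtain the third line, we note that because the $\nabla_1$ within the integral acts on a function of $\mathbf{r}_1 - \mathbf{r}_2$, we can change $\nabla_1$ into $\nabla_2$ and introduce a sign change.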

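Now we integrate by parts, as in Example 3.7.3, reaching

$$\nabla_1\left[\nabla_1\cdot\mathbf{A}(\mathbf{r}_1)\right] = \frac{1}{4\pi}\,\nabla_1\int \left(\nabla_2\cdot\mathbf{c}(\mathbf{r}_2)\right)\left(\frac{1}{r_{12}}\right)d\tau_2 .$$

At last we have the result we need: $\nabla_2\cdot\mathbf{c}(\mathbf{r}_2)$ vanishes, because $\mathbf{c}$ is a curl, so the entire $\nabla(\nabla\cdot\mathbf{A})$ term is zero and may be dropped. This reduces the condition we are checking to $-\nabla^2\mathbf{A} = \mathbf{c}$.

The quantity $-\nabla^2\mathbf{A}$ is a vector Laplacian and we may individually evaluate its Cartesian components. For component $j$,

$$-\nabla_1^2 A_j(\mathbf{r}_1) = -\frac{1}{4\pi}\int c_j(\mathbf{r}_2)\,\nabla_1^2\!\left(\frac{1}{r_{12}}\right)d\tau_2 = -\frac{1}{4\pi}\int c_j(\mathbf{r}_2)\left[-4\pi\,\delta(\mathbf{r}_1 - \mathbf{r}_2)\right]d\tau_2 = c_j(\mathbf{r}_1).$$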


This completes the proof of Helmholtz’s theorem. 
Helmholtz’s theorem legitimizes the division of the quantities appearing in electromagnetic theory into an irrotational vector field $\mathbf{E}$ and a solenoidal vector field $\mathbf{B}$, together with their respective representations using scalar and vector potentials. As we have seen in numerous examples, the source $s$ is identified as the charge density (divided by $\varepsilon_0$) and the circulation $\mathbf{c}$ is the current density (multiplied by $\mu_0$).


Exercises 


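3.9.1 If a force $\mathbf{F}$ is given by

$$\mathbf{F} = (x^2 + y^2 + z^2)^n\,(\hat{\mathbf{e}}_x x + \hat{\mathbf{e}}_y y + \hat{\mathbf{e}}_z z),$$

find
(a) $\nabla\cdot\mathbf{F}$.
(b) $\nabla\times\mathbf{F}$.
(c) A scalar potential $\varphi(x, y, z)$ so that $\mathbf{F} = -\nabla\varphi$.
(d) For what value of the exponent $n$ does the scalar potential diverge at both the origin and infinity?

ANS. (a) $(2n+3)\,r^{2n}$ (b) 0
(c) $-r^{2n+2}/(2n+2)$, $n \neq -1$ (d) $n = -1$, $\varphi = -\ln r$.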
3.9.2 A sphere of radius $a$ is uniformly charged (throughout its volume). Construct the electrostatic potential $\varphi(r)$ for $0 \le r < \infty$.


3.9.3 The origin of the Cartesian coordinates is at the Earth’s center. The moon is on the $z$-axis, a fixed distance $R$ away (center-to-center distance). The tidal force exerted by the moon on a particle at the Earth’s surface (point $x, y, z$) is given by

$$\mathbf{F} = -GMm\,\frac{x}{R^3}\,\hat{\mathbf{e}}_x - GMm\,\frac{y}{R^3}\,\hat{\mathbf{e}}_y + 2\,GMm\,\frac{z}{R^3}\,\hat{\mathbf{e}}_z .$$

Find the potential that yields this tidal force.

ANS. $-\dfrac{GMm}{R^3}\left(z^2 - \dfrac{1}{2}x^2 - \dfrac{1}{2}y^2\right)$.


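3.9.4 A long, straight wire carrying a current $I$ produces a magnetic induction $\mathbf{B}$ with components

$$\mathbf{B} = \frac{\mu_0 I}{2\pi}\left(-\frac{y}{x^2 + y^2},\ \frac{x}{x^2 + y^2},\ 0\right).$$

Find a magnetic vector potential $\mathbf{A}$.

ANS. $\mathbf{A} = -\hat{\mathbf{e}}_z\,(\mu_0 I/4\pi)\,\ln(x^2 + y^2)$. (This solution is not unique.)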


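3.9.5 If

$$\mathbf{B} = \frac{\hat{\mathbf{r}}}{r^2} = \left(\frac{x}{r^3},\ \frac{y}{r^3},\ \frac{z}{r^3}\right),$$

find a vector $\mathbf{A}$ such that $\nabla\times\mathbf{A} = \mathbf{B}$.

ANS. One possible solution is $\mathbf{A} = \dfrac{\hat{\mathbf{e}}_x\,yz}{r(x^2 + y^2)} - \dfrac{\hat{\mathbf{e}}_y\,xz}{r(x^2 + y^2)}$.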








3.9.6 Show that the pair of equations

$$\mathbf{A} = \tfrac{1}{2}(\mathbf{B}\times\mathbf{r}), \qquad \mathbf{B} = \nabla\times\mathbf{A},$$

is satisfied by any constant magnetic induction $\mathbf{B}$.

3.9.7 Vector $\mathbf{B}$ is formed by the product of two gradients

$$\mathbf{B} = (\nabla u)\times(\nabla v),$$

where $u$ and $v$ are scalar functions.

(a) Show that $\mathbf{B}$ is solenoidal.
(b) Show that

$$\mathbf{A} = \tfrac{1}{2}(u\,\nabla v - v\,\nabla u)$$

is a vector potential for $\mathbf{B}$, in that $\mathbf{B} = \nabla\times\mathbf{A}$.

3.9.8 The magnetic induction $\mathbf{B}$ is related to the magnetic vector potential $\mathbf{A}$ by $\mathbf{B} = \nabla\times\mathbf{A}$. By Stokes’ theorem

$$\int \mathbf{B}\cdot d\boldsymbol{\sigma} = \oint \mathbf{A}\cdot d\mathbf{r}.$$

Show that each side of this equation is invariant under the gauge transformation, $\mathbf{A} \to \mathbf{A} + \nabla\varphi$.
Note. Take the function $\varphi$ to be single-valued.

3.9.9 Show that the value of the electrostatic potential $\varphi$ at any point $P$ is equal to the average of the potential over any spherical surface centered on $P$, provided that there are no electric charges on or within the sphere.

Hint. Use Green’s theorem, Eq. (3.85), with $u = r^{-1}$, where $r$ is the distance from $P$, and $v = \varphi$. Equation (3.120) will also be useful.

3.9.10 Using Maxwell’s equations, show that for a system (steady current) the magnetic vector potential $\mathbf{A}$ satisfies a vector Poisson equation,

$$\nabla^2\mathbf{A} = -\mu_0\,\mathbf{J},$$

provided we require $\nabla\cdot\mathbf{A} = 0$.

3.9.11 Derive, assuming the Lorentz gauge, Eq. (3.109):

$$\nabla^2\mathbf{A} - \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = -\mu_0\,\mathbf{J}.$$

Hint. Eq. (3.70) will be helpful.

3.9.12 Prove that an arbitrary solenoidal vector $\mathbf{B}$ can be described as $\mathbf{B} = \nabla\times\mathbf{A}$, with

$$\mathbf{A} = -\hat{\mathbf{e}}_x\int_{y_0}^{y} B_z(x, y, z)\,dy - \hat{\mathbf{e}}_z\left[\int_{x_0}^{x} B_y(x, y_0, z)\,dx - \int_{y_0}^{y} B_x(x, y, z)\,dy\right].$$


3.10 CURVILINEAR COORDINATES


Up to this point we have treated vectors essentially entirely in Cartesian coordinates; when 
$\mathbf{r}$ or a function of it was encountered, we wrote $r$ as $\sqrt{x^2 + y^2 + z^2}$, so that Cartesian
coordinates could continue to be used. Such an approach ignores the simplifications that 
can result if one uses a coordinate system that is appropriate to the symmetry of a problem. 
Central force problems are frequently easiest to deal with in spherical polar coordinates. 
Problems involving geometrical elements such as straight wires may be best handled in 
cylindrical coordinates. Yet other coordinate systems (of use too infrequent to be described 
here) may be appropriate for other problems. 

Naturally, there is a price that must be paid for the use of a non-Cartesian coordinate sys- 
tem. Vector operators become different in form, and their specific forms may be position- 
dependent. We proceed here to examine these questions and derive the necessary formulas. 


Orthogonal Coordinates in $\mathbb{R}^3$


In Cartesian coordinates the point $(x_0, y_0, z_0)$ can be identified as the intersection of three planes: (1) the plane $x = x_0$ (a surface of constant $x$), (2) the plane $y = y_0$ (constant $y$), and (3) the plane $z = z_0$ (constant $z$). A change in $x$ corresponds to a displacement normal to the surface of constant $x$; similar remarks apply to changes in $y$ or $z$. The planes of constant coordinate value are mutually perpendicular, and have the obvious feature that the normal to any given one of them is in the same direction, no matter where on the plane it is constructed (a plane of constant $x$ has a normal that is, of course, everywhere in the direction of $\hat{\mathbf{e}}_x$).

Consider now, as an example of a curvilinear coordinate system, spherical polar coordinates (see Fig. 3.19). A point $\mathbf{r}$ is identified by $r$ (distance from the origin), $\theta$ (angle of $\mathbf{r}$ relative to the polar axis, which is conventionally in the $z$ direction), and $\varphi$ (dihedral angle between the $zx$ plane and the plane containing $\hat{\mathbf{e}}_z$ and $\mathbf{r}$). The point $\mathbf{r}$ is therefore at the intersection of (1) a sphere of radius $r$, (2) a cone of opening angle $\theta$, and (3) a half-plane through equatorial angle $\varphi$. This example provides several observations: (1) general coordinates need not be lengths, (2) a surface of constant coordinate value may have a normal whose direction depends on position, (3) surfaces with different constant values of the same coordinate need not be parallel, and therefore also (4) changes in the value of a coordinate may move $\mathbf{r}$ in both an amount and a direction that depends on position.

FIGURE 3.19 Spherical polar coordinates.

FIGURE 3.20 Effect of a “large” displacement in the direction $\hat{\mathbf{e}}_\theta$. Note that $r' \neq r$.

It is convenient to define unit vectors $\hat{\mathbf{e}}_r$, $\hat{\mathbf{e}}_\theta$, $\hat{\mathbf{e}}_\varphi$ in the directions of the normals to the surfaces, respectively, of constant $r$, $\theta$, and $\varphi$. The spherical polar coordinate system has the feature that these unit vectors are mutually perpendicular, meaning that, for example, $\hat{\mathbf{e}}_\theta$ will be tangent to both the constant-$r$ and constant-$\varphi$ surfaces, so that a small displacement in the $\hat{\mathbf{e}}_\theta$ direction will not change the values of either the $r$ or the $\varphi$ coordinate. The reason for the restriction to “small” displacements is that the directions of the normals are position-dependent; a “large” displacement in the $\hat{\mathbf{e}}_\theta$ direction would change $r$ (see Fig. 3.20). If the coordinate unit vectors are mutually perpendicular, the coordinate system is said to be orthogonal.

If we have a vector field $\mathbf{V}$ (so we associate a value of $\mathbf{V}$ with each point in a region of $\mathbb{R}^3$), we can write $\mathbf{V}(\mathbf{r})$ in terms of the orthogonal set of unit vectors that are defined for the point $\mathbf{r}$; symbolically, the result is

$$\mathbf{V}(\mathbf{r}) = V_r\,\hat{\mathbf{e}}_r + V_\theta\,\hat{\mathbf{e}}_\theta + V_\varphi\,\hat{\mathbf{e}}_\varphi .$$

It is important to realize that the unit vectors $\hat{\mathbf{e}}_i$ have directions that depend on the value of $\mathbf{r}$. If we have another vector field $\mathbf{W}(\mathbf{r})$ for the same point $\mathbf{r}$, we can perform algebraic processes⁸ on $\mathbf{V}$ and $\mathbf{W}$ by the same rules as for Cartesian coordinates. For example, at the point $\mathbf{r}$,

$$\mathbf{V}\cdot\mathbf{W} = V_r W_r + V_\theta W_\theta + V_\varphi W_\varphi .$$

However, if $\mathbf{V}$ and $\mathbf{W}$ are not associated with the same $\mathbf{r}$, we cannot carry out such operations in this way, and it is important to realize that

$$\mathbf{r} \neq r\,\hat{\mathbf{e}}_r + \theta\,\hat{\mathbf{e}}_\theta + \varphi\,\hat{\mathbf{e}}_\varphi .$$

Summarizing, the component formulas for $\mathbf{V}$ or $\mathbf{W}$ describe component decompositions applicable to the point at which the vector is specified; an attempt to decompose $\mathbf{r}$ as illustrated above is incorrect because it uses fixed unit-vector orientations where they do not apply.

Dealing for the moment with an arbitrary curvilinear system, with coordinates labeled $(q_1, q_2, q_3)$, we consider how changes in the $q_i$ are related to changes in the Cartesian coordinates. Since $x$ can be thought of as a function of the $q_i$, namely $x(q_1, q_2, q_3)$, we have

$$dx = \frac{\partial x}{\partial q_1}\,dq_1 + \frac{\partial x}{\partial q_2}\,dq_2 + \frac{\partial x}{\partial q_3}\,dq_3, \tag{3.127}$$

with similar formulas for $dy$ and $dz$.

⁸ Addition, multiplication by a scalar, dot and cross products (but not application of differential or integral operators).

We next form a measure of the differential displacement, $d\mathbf{r}$, associated with changes $dq_i$. We actually examine

$$(dr)^2 = (dx)^2 + (dy)^2 + (dz)^2 .$$

Taking the square of Eq. (3.127), we get

$$(dx)^2 = \sum_{ij} \frac{\partial x}{\partial q_i}\frac{\partial x}{\partial q_j}\,dq_i\,dq_j$$

and similar expressions for $(dy)^2$ and $(dz)^2$. Combining these and collecting terms with the same $dq_i\,dq_j$, we reach the result

$$(dr)^2 = \sum_{ij} g_{ij}\,dq_i\,dq_j, \tag{3.128}$$

where

$$g_{ij}(q_1, q_2, q_3) = \frac{\partial x}{\partial q_i}\frac{\partial x}{\partial q_j} + \frac{\partial y}{\partial q_i}\frac{\partial y}{\partial q_j} + \frac{\partial z}{\partial q_i}\frac{\partial z}{\partial q_j}. \tag{3.129}$$

Spaces with a measure of distance given by Eq. (3.128) are called metric or Riemannian.

Equation (3.129) can be interpreted as the dot product of a vector in the $dq_i$ direction, of components $(\partial x/\partial q_i,\ \partial y/\partial q_i,\ \partial z/\partial q_i)$, with a similar vector in the $dq_j$ direction. If the $q_i$ coordinates are perpendicular, the coefficients $g_{ij}$ will vanish when $i \neq j$.

Since it is our objective to discuss orthogonal coordinate systems, we specialize Eqs. (3.128) and (3.129) to

$$(dr)^2 = (h_1\,dq_1)^2 + (h_2\,dq_2)^2 + (h_3\,dq_3)^2, \tag{3.130}$$

$$h_i^2 = \left(\frac{\partial x}{\partial q_i}\right)^2 + \left(\frac{\partial y}{\partial q_i}\right)^2 + \left(\frac{\partial z}{\partial q_i}\right)^2 . \tag{3.131}$$

If we consider Eq. (3.130) for a case $dq_2 = dq_3 = 0$, we see that we can identify $h_1\,dq_1$ as $dr_1$, meaning that the element of displacement in the $q_1$ direction is $h_1\,dq_1$. Thus, in general,

$$dr_i = h_i\,dq_i, \quad \text{or} \quad \frac{\partial \mathbf{r}}{\partial q_i} = h_i\,\hat{\mathbf{e}}_i . \tag{3.132}$$

Here $\hat{\mathbf{e}}_i$ is a unit vector in the $q_i$ direction, and the overall $d\mathbf{r}$ takes the form

$$d\mathbf{r} = h_1\,dq_1\,\hat{\mathbf{e}}_1 + h_2\,dq_2\,\hat{\mathbf{e}}_2 + h_3\,dq_3\,\hat{\mathbf{e}}_3 . \tag{3.133}$$

Note that $h_i$ may be position-dependent and must have the dimension needed to cause $h_i\,dq_i$ to be a length.
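As an illustration of Eqs. (3.129) and (3.131), the following sketch estimates the metric coefficients $g_{ij}$ and the scale factors $h_i$ by central finite differences of the map from $(q_1, q_2, q_3)$ to $(x, y, z)$. The function names and the step size are assumptions made here for illustration; the spherical polar map is used as the test case, for which one expects $h = (1, r, r\sin\theta)$ and vanishing off-diagonal $g_{ij}$.

```python
import math


def spherical_to_cartesian(q):
    """Map (r, theta, phi) to (x, y, z)."""
    r, theta, phi = q
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def metric(to_cartesian, q, eps=1e-6):
    """Estimate g_ij of Eq. (3.129) from central differences."""
    tangents = []
    for i in range(3):
        q_plus, q_minus = list(q), list(q)
        q_plus[i] += eps
        q_minus[i] -= eps
        x_plus, x_minus = to_cartesian(q_plus), to_cartesian(q_minus)
        # Components (dx/dq_i, dy/dq_i, dz/dq_i)
        tangents.append([(a - b) / (2.0 * eps) for a, b in zip(x_plus, x_minus)])
    return [[sum(ti[k] * tj[k] for k in range(3)) for tj in tangents]
            for ti in tangents]


def scale_factors(to_cartesian, q, eps=1e-6):
    """Scale factors h_i = sqrt(g_ii), as in Eq. (3.131)."""
    g = metric(to_cartesian, q, eps)
    return [math.sqrt(g[i][i]) for i in range(3)]
```

The same two functions can be applied to any other coordinate map (for example a cylindrical one) simply by passing a different `to_cartesian` function; nonzero off-diagonal entries of `metric` would signal a non-orthogonal system.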


Integrals in Curvilinear Coordinates 


Given the scale factors h; for a set of coordinates, either because they have been tabulated 
or because we have evaluated them via Eq. (3.131), we can use them to set up formulas for 
integration in the curvilinear coordinates. Line integrals will take the form 


$$\int_C \mathbf{V}\cdot d\mathbf{r} = \sum_i \int_C V_i\,h_i\,dq_i . \tag{3.134}$$

Surface integrals take the same form as in Cartesian coordinates, with the exception that instead of expressions like $dx\,dy$ we have $(h_1\,dq_1)(h_2\,dq_2) = h_1 h_2\,dq_1\,dq_2$, etc. This means that

$$\int_S \mathbf{V}\cdot d\boldsymbol{\sigma} = \int_S V_1\,h_2 h_3\,dq_2\,dq_3 + \int_S V_2\,h_3 h_1\,dq_3\,dq_1 + \int_S V_3\,h_1 h_2\,dq_1\,dq_2 . \tag{3.135}$$

The element of volume in orthogonal curvilinear coordinates is

$$d\tau = h_1 h_2 h_3\,dq_1\,dq_2\,dq_3, \tag{3.136}$$

so volume integrals take the form

$$\int_V \varphi(q_1, q_2, q_3)\,h_1 h_2 h_3\,dq_1\,dq_2\,dq_3, \tag{3.137}$$

or the analogous expression with $\varphi$ replaced by a vector $\mathbf{V}(q_1, q_2, q_3)$.
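A volume integral of the form of Eq. (3.137) can be approximated directly on a grid in $(q_1, q_2, q_3)$. The sketch below uses a simple midpoint rule; the function names, the grid size, and the choice of spherical polar coordinates (with $h_1 h_2 h_3 = r^2\sin\theta$) are assumptions made for illustration.

```python
import math


def volume_integral(f, scale_product, bounds, n=40):
    """Midpoint-rule estimate of the integral of f * h1*h2*h3 over a
    rectangular box in (q1, q2, q3), following Eq. (3.137)."""
    (a1, b1), (a2, b2), (a3, b3) = bounds
    d1, d2, d3 = (b1 - a1) / n, (b2 - a2) / n, (b3 - a3) / n
    total = 0.0
    for i in range(n):
        q1 = a1 + (i + 0.5) * d1
        for j in range(n):
            q2 = a2 + (j + 0.5) * d2
            for k in range(n):
                q3 = a3 + (k + 0.5) * d3
                total += f(q1, q2, q3) * scale_product(q1, q2, q3)
    return total * d1 * d2 * d3


def spherical_scale_product(r, theta, phi):
    """h_r * h_theta * h_phi for spherical polar coordinates."""
    return r * r * math.sin(theta)
```

With `f` identically 1 and bounds $0 \le r \le 1$, $0 \le \theta \le \pi$, $0 \le \varphi \le 2\pi$, the result should be close to the volume of the unit ball, $4\pi/3$.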


Differential Operators in Curvilinear Coordinates 

We continue with a restriction to orthogonal coordinate systems. 

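Gradient. Because our curvilinear coordinates are orthogonal, the gradient takes the same form as for Cartesian coordinates, provided we use the differential displacements $dr_i = h_i\,dq_i$ in the formula. Thus, we have

$$\nabla\varphi(q_1, q_2, q_3) = \hat{\mathbf{e}}_1\,\frac{1}{h_1}\frac{\partial\varphi}{\partial q_1} + \hat{\mathbf{e}}_2\,\frac{1}{h_2}\frac{\partial\varphi}{\partial q_2} + \hat{\mathbf{e}}_3\,\frac{1}{h_3}\frac{\partial\varphi}{\partial q_3}; \tag{3.138}$$

this corresponds to writing $\nabla$ as

$$\nabla = \hat{\mathbf{e}}_1\,\frac{1}{h_1}\frac{\partial}{\partial q_1} + \hat{\mathbf{e}}_2\,\frac{1}{h_2}\frac{\partial}{\partial q_2} + \hat{\mathbf{e}}_3\,\frac{1}{h_3}\frac{\partial}{\partial q_3}. \tag{3.139}$$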





Divergence. This operator must have the same meaning as in Cartesian coordinates, so $\nabla\cdot\mathbf{V}$ must give the net outward flux of $\mathbf{V}$ per unit volume at the point of evaluation. The key difference from the Cartesian case is that an element of volume will no longer be a parallelepiped, as the scale factors $h_i$ are in general functions of position. See Fig. 3.21. To compute the net outflow of $\mathbf{V}$ in the $q_1$ direction from a volume element defined by $dq_1$, $dq_2$, $dq_3$ and centered at $(q_1, q_2, q_3)$, we must form

$$\text{Net } q_1 \text{ outflow} = -V_1 h_2 h_3\,dq_2\,dq_3\Big|_{q_1 - dq_1/2,\,q_2,\,q_3} + V_1 h_2 h_3\,dq_2\,dq_3\Big|_{q_1 + dq_1/2,\,q_2,\,q_3}. \tag{3.140}$$

FIGURE 3.21 Outflow of $B_1$ in the $q_1$ direction from a curvilinear volume element.

Note that not only $V_1$, but also $h_2 h_3$ must be evaluated at the displaced values of $q_1$; this product may have different values at $q_1 + dq_1/2$ and $q_1 - dq_1/2$. Rewriting Eq. (3.140) in terms of a derivative with respect to $q_1$, we have

$$\text{Net } q_1 \text{ outflow} = \frac{\partial}{\partial q_1}\left(V_1 h_2 h_3\right)dq_1\,dq_2\,dq_3 .$$

Combining this with the $q_2$ and $q_3$ outflows and dividing by the differential volume $h_1 h_2 h_3\,dq_1\,dq_2\,dq_3$, we get the formula

$$\nabla\cdot\mathbf{V}(q_1, q_2, q_3) = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}(V_1 h_2 h_3) + \frac{\partial}{\partial q_2}(V_2 h_3 h_1) + \frac{\partial}{\partial q_3}(V_3 h_1 h_2)\right]. \tag{3.141}$$


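Laplacian. From the formulas for the gradient and divergence, we can form the Laplacian in curvilinear coordinates:

$$\nabla^2\varphi(q_1, q_2, q_3) = \nabla\cdot\nabla\varphi = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2 h_3}{h_1}\frac{\partial\varphi}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_3 h_1}{h_2}\frac{\partial\varphi}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1 h_2}{h_3}\frac{\partial\varphi}{\partial q_3}\right)\right]. \tag{3.142}$$

Note that the Laplacian contains no cross derivatives, such as $\partial^2/\partial q_1\,\partial q_2$. They do not appear because the coordinate system is orthogonal.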
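Equation (3.142) lends itself to a direct numerical check. The sketch below evaluates it with nested central differences for any set of scale factors supplied as a function; the helper names, the step size, and the use of spherical polar scale factors as the test case are assumptions made here, not part of the original text.

```python
import math


def spherical_h(q):
    """Scale factors (h_r, h_theta, h_phi) at q = (r, theta, phi)."""
    r, theta, _phi = q
    return (1.0, r, r * math.sin(theta))


def shifted(q, i, delta):
    """Copy of q with coordinate i displaced by delta."""
    out = list(q)
    out[i] += delta
    return tuple(out)


def flux_coefficient(h, q, i):
    """The factor h1*h2*h3 / h_i**2 multiplying d(psi)/dq_i in Eq. (3.142)."""
    h1, h2, h3 = h(q)
    return h1 * h2 * h3 / h(q)[i] ** 2


def laplacian(psi, h, q, eps=1e-3):
    """Finite-difference evaluation of Eq. (3.142) at the point q."""
    total = 0.0
    for i in range(3):
        forward = (flux_coefficient(h, shifted(q, i, eps / 2), i)
                   * (psi(shifted(q, i, eps)) - psi(q)) / eps)
        backward = (flux_coefficient(h, shifted(q, i, -eps / 2), i)
                    * (psi(q) - psi(shifted(q, i, -eps))) / eps)
        total += (forward - backward) / eps
    h1, h2, h3 = h(q)
    return total / (h1 * h2 * h3)
```

For $\psi = r^3$ the result should be close to $12r$, and for $\psi = r\cos\theta$ (which is just $z$) it should be close to zero, since $z$ is harmonic.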





Curl. In the same spirit as our treatment of the divergence, we calculate the circulation around an element of area in the $q_1 q_2$ plane, and therefore associated with a vector in the $q_3$ direction. Referring to Fig. 3.22, the line integral $\oint \mathbf{B}\cdot d\mathbf{r}$ consists of four segment contributions, which to first order are

$$\begin{aligned}
\text{Segment 1} &= (h_1 B_1)\Big|_{q_1,\,q_2 - dq_2/2,\,q_3}\,dq_1, \\
\text{Segment 2} &= (h_2 B_2)\Big|_{q_1 + dq_1/2,\,q_2,\,q_3}\,dq_2, \\
\text{Segment 3} &= -(h_1 B_1)\Big|_{q_1,\,q_2 + dq_2/2,\,q_3}\,dq_1, \\
\text{Segment 4} &= -(h_2 B_2)\Big|_{q_1 - dq_1/2,\,q_2,\,q_3}\,dq_2 .
\end{aligned}$$

FIGURE 3.22 Circulation $\oint \mathbf{B}\cdot d\mathbf{r}$ around curvilinear element of area on a surface of constant $q_3$.

Keeping in mind that the $h_i$ are functions of position, and that the loop has area $h_1 h_2\,dq_1\,dq_2$, these contributions combine into a circulation per unit area

$$(\nabla\times\mathbf{B})_3 = \frac{1}{h_1 h_2}\left[-\frac{\partial}{\partial q_2}(h_1 B_1) + \frac{\partial}{\partial q_1}(h_2 B_2)\right].$$

The generalization of this result to arbitrary orientation of the circulation loop can be brought to the determinantal form

$$\nabla\times\mathbf{B} = \frac{1}{h_1 h_2 h_3}\begin{vmatrix} \hat{\mathbf{e}}_1 h_1 & \hat{\mathbf{e}}_2 h_2 & \hat{\mathbf{e}}_3 h_3 \\[4pt] \dfrac{\partial}{\partial q_1} & \dfrac{\partial}{\partial q_2} & \dfrac{\partial}{\partial q_3} \\[4pt] h_1 B_1 & h_2 B_2 & h_3 B_3 \end{vmatrix}. \tag{3.143}$$


Just as for Cartesian coordinates, this determinant is to be evaluated from the top down, so 
that the derivatives will act on its bottom row. 


Circular Cylindrical Coordinates 


Although there are at least 11 coordinate systems that are appropriate for use in solving 
physics problems, the evolution of computers and efficient programming techniques have 
greatly reduced the need for most of these coordinate systems, with the result that the dis- 
cussion in this book is limited to (1) Cartesian coordinates, (2) spherical polar coordinates 
(treated in the next subsection), and (3) circular cylindrical coordinates, which we discuss 
here. Specifications and details of other coordinate systems will be found in the first two 
editions of this work and in Additional Readings at the end of this chapter (Morse and 
Feshbach, Margenau and Murphy). 

In the circular cylindrical coordinate system the three curvilinear coordinates are labeled $(\rho, \varphi, z)$. We use $\rho$ for the perpendicular distance from the $z$-axis because we reserve $r$ for the distance from the origin. The ranges of $\rho$, $\varphi$, and $z$ are

$$0 \le \rho < \infty, \qquad 0 \le \varphi < 2\pi, \qquad -\infty < z < \infty .$$

For $\rho = 0$, $\varphi$ is not well defined. The coordinate surfaces, shown in Fig. 3.23, follow:

1. Right circular cylinders having the $z$-axis as a common axis,

$$\rho = (x^2 + y^2)^{1/2} = \text{constant}.$$

2. Half-planes through the $z$-axis, at an angle $\varphi$ measured from the $x$ direction,

$$\varphi = \tan^{-1}\left(\frac{y}{x}\right) = \text{constant}.$$

The arctangent is double valued on the range of $\varphi$, and the correct value of $\varphi$ must be determined by the individual signs of $x$ and $y$.

3. Planes parallel to the $xy$-plane, as in the Cartesian system,

$$z = \text{constant}.$$

FIGURE 3.23 Cylindrical coordinates $\rho$, $\varphi$, $z$.

Inverting the preceding equations, we can obtain

$$x = \rho\cos\varphi, \qquad y = \rho\sin\varphi, \qquad z = z. \tag{3.144}$$

This is essentially a 2-D curvilinear system with a Cartesian $z$-axis added on to form a 3-D system.
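The remark that the arctangent is double valued has a practical consequence when the conversion is coded. The sketch below (function names chosen here for illustration) uses the two-argument arctangent `math.atan2`, which inspects the individual signs of $x$ and $y$, and then shifts the result into the range $0 \le \varphi < 2\pi$ used in this section.

```python
import math


def cartesian_to_cylindrical(x, y, z):
    """Return (rho, phi, z) with phi chosen by the signs of x and y."""
    rho = math.hypot(x, y)
    # atan2 returns a value in (-pi, pi]; the modulo moves it to [0, 2*pi).
    phi = math.atan2(y, x) % (2.0 * math.pi)
    return rho, phi, z


def cylindrical_to_cartesian(rho, phi, z):
    """Inverse map, Eq. (3.144)."""
    return rho * math.cos(phi), rho * math.sin(phi), z
```

For the point $(-1, -1, 0)$ the single-argument form `math.atan(y / x)` would give $\pi/4$, which lies in the wrong quadrant, whereas the function above gives $5\pi/4$. At $\rho = 0$ the angle is not well defined; `math.atan2(0, 0)` returns 0 by convention.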


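The coordinate vector $\mathbf{r}$ and a general vector $\mathbf{V}$ are expressed as

$$\mathbf{r} = \rho\,\hat{\mathbf{e}}_\rho + z\,\hat{\mathbf{e}}_z, \qquad \mathbf{V} = V_\rho\,\hat{\mathbf{e}}_\rho + V_\varphi\,\hat{\mathbf{e}}_\varphi + V_z\,\hat{\mathbf{e}}_z .$$

From Eq. (3.131), the scale factors for these coordinates are

$$h_\rho = 1, \qquad h_\varphi = \rho, \qquad h_z = 1, \tag{3.145}$$

so the elements of displacement, area, and volume are

$$\begin{aligned}
d\mathbf{r} &= \hat{\mathbf{e}}_\rho\,d\rho + \rho\,\hat{\mathbf{e}}_\varphi\,d\varphi + \hat{\mathbf{e}}_z\,dz, \\
d\boldsymbol{\sigma} &= \rho\,\hat{\mathbf{e}}_\rho\,d\varphi\,dz + \hat{\mathbf{e}}_\varphi\,d\rho\,dz + \rho\,\hat{\mathbf{e}}_z\,d\rho\,d\varphi, \\
d\tau &= \rho\,d\rho\,d\varphi\,dz .
\end{aligned} \tag{3.146}$$

It is perhaps worth emphasizing that the unit vectors $\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\varphi$ have directions that vary with $\varphi$; if expressions containing these unit vectors are differentiated with respect to $\varphi$, the derivatives of these unit vectors must be included in the computations.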


Example 3.10.1 KEPLER’S AREA LAW FOR PLANETARY MOTION 


One of Kepler’s laws states that the radius vector of a planet, relative to an origin at the 
sun, sweeps out equal areas in equal time. It is instructive to derive this relationship using 
cylindrical coordinates. For simplicity we consider a planet of unit mass and motion in the 
plane z = 0. 

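The gravitational force $\mathbf{F}$ is of the form $f(r)\,\hat{\mathbf{e}}_r$, and hence the torque about the origin, $\mathbf{r}\times\mathbf{F}$, vanishes, so angular momentum $\mathbf{L} = \mathbf{r}\times d\mathbf{r}/dt$ is conserved. To evaluate $d\mathbf{r}/dt$, we start from $d\mathbf{r}$ as given in Eq. (3.146), writing

$$\frac{d\mathbf{r}}{dt} = \hat{\mathbf{e}}_\rho\,\dot{\rho} + \hat{\mathbf{e}}_\varphi\,\rho\,\dot{\varphi},$$

where we have used the dot notation (invented by Newton) to indicate time derivatives. We now form

$$\mathbf{L} = \rho\,\hat{\mathbf{e}}_\rho \times \left(\hat{\mathbf{e}}_\rho\,\dot{\rho} + \hat{\mathbf{e}}_\varphi\,\rho\,\dot{\varphi}\right) = \rho^2\dot{\varphi}\,\hat{\mathbf{e}}_z .$$

We conclude that $\rho^2\dot{\varphi}$ is constant. Making the identification $\rho^2\dot{\varphi} = 2\,dA/dt$, where $A$ is the area swept out, we confirm Kepler’s law. ■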
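The area law can also be observed in a small numerical experiment. The sketch below is an illustration added here, not the method of the example: it integrates the planar motion of a unit-mass planet under an inverse-square attraction (force constant set to 1, an assumption) with the velocity-Verlet scheme in Cartesian components, and records at every step the value of $\rho^2\dot{\varphi} = x v_y - y v_x$ together with the area of the thin triangle swept out during that step.

```python
import math


def simulate_orbit(x, y, vx, vy, dt=1e-3, steps=5000):
    """Velocity-Verlet integration under F = -r_vec / r**3 (unit mass).

    Returns a list of (angular_momentum, swept_area) pairs, one per step.
    """
    def accel(px, py):
        r3 = math.hypot(px, py) ** 3
        return -px / r3, -py / r3

    ax, ay = accel(x, y)
    records = []
    for _ in range(steps):
        # Half kick, then drift.
        vx_half = vx + 0.5 * dt * ax
        vy_half = vy + 0.5 * dt * ay
        x_new = x + dt * vx_half
        y_new = y + dt * vy_half
        # Area of the triangle (origin, old position, new position).
        area = 0.5 * (x * y_new - y * x_new)
        # Second half kick with the force at the new position.
        ax, ay = accel(x_new, y_new)
        vx = vx_half + 0.5 * dt * ax
        vy = vy_half + 0.5 * dt * ay
        x, y = x_new, y_new
        records.append((x * vy - y * vx, area))
    return records
```

Starting from $(x, y) = (1, 0)$ with velocity $(0, 0.8)$ gives a bound elliptical orbit with $\rho^2\dot{\varphi} = 0.8$. Because each kick is parallel to $\mathbf{r}$ and each drift is parallel to the velocity, this particular scheme preserves $x v_y - y v_x$ up to rounding error, so every recorded area should equal $\tfrac{1}{2}(0.8)\,dt$.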


Continuing now to the vector differential operators, using Eqs. (3.138), (3.141), (3.142), 
and (3.143), we have 


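$$\nabla\psi(\rho, \varphi, z) = \hat{\mathbf{e}}_\rho\,\frac{\partial\psi}{\partial\rho} + \hat{\mathbf{e}}_\varphi\,\frac{1}{\rho}\frac{\partial\psi}{\partial\varphi} + \hat{\mathbf{e}}_z\,\frac{\partial\psi}{\partial z}, \tag{3.147}$$

$$\nabla\cdot\mathbf{V} = \frac{1}{\rho}\frac{\partial}{\partial\rho}(\rho V_\rho) + \frac{1}{\rho}\frac{\partial V_\varphi}{\partial\varphi} + \frac{\partial V_z}{\partial z}, \tag{3.148}$$

$$\nabla^2\psi = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\,\frac{\partial\psi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\psi}{\partial\varphi^2} + \frac{\partial^2\psi}{\partial z^2}, \tag{3.149}$$

$$\nabla\times\mathbf{V} = \frac{1}{\rho}\begin{vmatrix} \hat{\mathbf{e}}_\rho & \rho\,\hat{\mathbf{e}}_\varphi & \hat{\mathbf{e}}_z \\[4pt] \dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\varphi} & \dfrac{\partial}{\partial z} \\[4pt] V_\rho & \rho V_\varphi & V_z \end{vmatrix}. \tag{3.150}$$

Finally, for problems such as circular wave guides and cylindrical cavity resonators, one needs the vector Laplacian $\nabla^2\mathbf{V}$. From Eq. (3.70), its components in cylindrical coordinates can be shown to be

$$\begin{aligned}
\nabla^2\mathbf{V}\big|_\rho &= \nabla^2 V_\rho - \frac{1}{\rho^2}V_\rho - \frac{2}{\rho^2}\frac{\partial V_\varphi}{\partial\varphi}, \\
\nabla^2\mathbf{V}\big|_\varphi &= \nabla^2 V_\varphi - \frac{1}{\rho^2}V_\varphi + \frac{2}{\rho^2}\frac{\partial V_\rho}{\partial\varphi}, \\
\nabla^2\mathbf{V}\big|_z &= \nabla^2 V_z .
\end{aligned} \tag{3.151}$$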





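Example 3.10.2 A NAVIER-STOKES TERM


The Navier-Stokes equations of hydrodynamics contain a nonlinear term

$$\nabla\times\left[\mathbf{v}\times(\nabla\times\mathbf{v})\right],$$

where $\mathbf{v}$ is the fluid velocity. For fluid flowing through a cylindrical pipe in the $z$ direction,

$$\mathbf{v} = \hat{\mathbf{e}}_z\,v(\rho).$$

From Eq. (3.150),

$$\nabla\times\mathbf{v} = \frac{1}{\rho}\begin{vmatrix} \hat{\mathbf{e}}_\rho & \rho\,\hat{\mathbf{e}}_\varphi & \hat{\mathbf{e}}_z \\[4pt] \dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\varphi} & \dfrac{\partial}{\partial z} \\[4pt] 0 & 0 & v(\rho) \end{vmatrix} = -\hat{\mathbf{e}}_\varphi\,\frac{\partial v}{\partial\rho},$$

$$\mathbf{v}\times(\nabla\times\mathbf{v}) = \begin{vmatrix} \hat{\mathbf{e}}_\rho & \hat{\mathbf{e}}_\varphi & \hat{\mathbf{e}}_z \\[4pt] 0 & 0 & v \\[4pt] 0 & -\dfrac{\partial v}{\partial\rho} & 0 \end{vmatrix} = \hat{\mathbf{e}}_\rho\,v(\rho)\,\frac{\partial v}{\partial\rho}.$$

Finally,

$$\nabla\times\left(\mathbf{v}\times(\nabla\times\mathbf{v})\right) = \frac{1}{\rho}\begin{vmatrix} \hat{\mathbf{e}}_\rho & \rho\,\hat{\mathbf{e}}_\varphi & \hat{\mathbf{e}}_z \\[4pt] \dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\varphi} & \dfrac{\partial}{\partial z} \\[4pt] v\,\dfrac{\partial v}{\partial\rho} & 0 & 0 \end{vmatrix} = 0.$$

For this particular case, the nonlinear term vanishes. ■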


Spherical Polar Coordinates 


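Spherical polar coordinates were introduced as an initial example of a curvilinear coordinate system, and were illustrated in Fig. 3.19. We reiterate: The coordinates are labeled $(r, \theta, \varphi)$. Their ranges are

$$0 \le r < \infty, \qquad 0 \le \theta \le \pi, \qquad 0 \le \varphi < 2\pi .$$

For $r = 0$, neither $\theta$ nor $\varphi$ is well defined. Additionally, $\varphi$ is ill-defined for $\theta = 0$ and $\theta = \pi$. The coordinate surfaces follow:

1. Concentric spheres centered at the origin,

$$r = (x^2 + y^2 + z^2)^{1/2} = \text{constant}.$$

2. Right circular cones centered on the $z$ (polar) axis with vertices at the origin,

$$\theta = \arccos\frac{z}{r} = \text{constant}.$$

3. Half-planes through the $z$ (polar) axis, at an angle $\varphi$ measured from the $x$ direction,

$$\varphi = \arctan\frac{y}{x} = \text{constant}.$$

The arctangent is double valued on the range of $\varphi$, and the correct value of $\varphi$ must be determined by the individual signs of $x$ and $y$.

Inverting the preceding equations, we can obtain

$$x = r\sin\theta\cos\varphi, \qquad y = r\sin\theta\sin\varphi, \qquad z = r\cos\theta . \tag{3.152}$$

The coordinate vector $\mathbf{r}$ and a general vector $\mathbf{V}$ are expressed as

$$\mathbf{r} = r\,\hat{\mathbf{e}}_r, \qquad \mathbf{V} = V_r\,\hat{\mathbf{e}}_r + V_\theta\,\hat{\mathbf{e}}_\theta + V_\varphi\,\hat{\mathbf{e}}_\varphi .$$

From Eq. (3.131), the scale factors for these coordinates are

$$h_r = 1, \qquad h_\theta = r, \qquad h_\varphi = r\sin\theta, \tag{3.153}$$

so the elements of displacement, area, and volume are

$$\begin{aligned}
d\mathbf{r} &= \hat{\mathbf{e}}_r\,dr + r\,\hat{\mathbf{e}}_\theta\,d\theta + r\sin\theta\,\hat{\mathbf{e}}_\varphi\,d\varphi, \\
d\boldsymbol{\sigma} &= r^2\sin\theta\,\hat{\mathbf{e}}_r\,d\theta\,d\varphi + r\sin\theta\,\hat{\mathbf{e}}_\theta\,dr\,d\varphi + r\,\hat{\mathbf{e}}_\varphi\,dr\,d\theta, \\
d\tau &= r^2\sin\theta\,dr\,d\theta\,d\varphi .
\end{aligned} \tag{3.154}$$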


Frequently one encounters a need to perform a surface integration over the angles, in which case the angular dependence of $d\boldsymbol{\sigma}$ reduces to

$$d\Omega = \sin\theta\,d\theta\,d\varphi, \tag{3.155}$$

where $d\Omega$ is called an element of solid angle, and has the property that its integral over all angles has the value

$$\int d\Omega = 4\pi .$$

Note that for spherical polar coordinates, all three of the unit vectors have directions that depend on position, and this fact must be taken into account when expressions containing the unit vectors are differentiated.
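The position dependence of the spherical unit vectors can be made concrete by writing them in Cartesian components and differentiating numerically. The sketch below (helper names and step size are assumptions made for illustration) checks the standard relations $\partial\hat{\mathbf{e}}_r/\partial\theta = \hat{\mathbf{e}}_\theta$, $\partial\hat{\mathbf{e}}_r/\partial\varphi = \sin\theta\,\hat{\mathbf{e}}_\varphi$, $\partial\hat{\mathbf{e}}_\theta/\partial\theta = -\hat{\mathbf{e}}_r$, and $\partial\hat{\mathbf{e}}_\varphi/\partial\varphi = -\sin\theta\,\hat{\mathbf{e}}_r - \cos\theta\,\hat{\mathbf{e}}_\theta$.

```python
import math


def unit_vectors(theta, phi):
    """Cartesian components of (e_r, e_theta, e_phi) at angles theta, phi."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    e_r = (st * cp, st * sp, ct)
    e_theta = (ct * cp, ct * sp, -st)
    e_phi = (-sp, cp, 0.0)
    return e_r, e_theta, e_phi


def d_dtheta(index, theta, phi, eps=1e-6):
    """Central-difference derivative of unit vector `index` w.r.t. theta."""
    plus = unit_vectors(theta + eps, phi)[index]
    minus = unit_vectors(theta - eps, phi)[index]
    return tuple((p - m) / (2.0 * eps) for p, m in zip(plus, minus))


def d_dphi(index, theta, phi, eps=1e-6):
    """Central-difference derivative of unit vector `index` w.r.t. phi."""
    plus = unit_vectors(theta, phi + eps)[index]
    minus = unit_vectors(theta, phi - eps)[index]
    return tuple((p - m) / (2.0 * eps) for p, m in zip(plus, minus))


def close(u, v, tol=1e-8):
    """True if two 3-vectors agree componentwise within tol."""
    return all(abs(a - b) < tol for a, b in zip(u, v))
```

Here `index` 0, 1, 2 selects $\hat{\mathbf{e}}_r$, $\hat{\mathbf{e}}_\theta$, $\hat{\mathbf{e}}_\varphi$ respectively. None of the unit vectors depends on $r$, so only the angular derivatives are nonzero.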
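The vector differential operators may now be evaluated, using Eqs. (3.138), (3.141), (3.142), and (3.143):

$$\nabla\psi(r, \theta, \varphi) = \hat{\mathbf{e}}_r\,\frac{\partial\psi}{\partial r} + \hat{\mathbf{e}}_\theta\,\frac{1}{r}\frac{\partial\psi}{\partial\theta} + \hat{\mathbf{e}}_\varphi\,\frac{1}{r\sin\theta}\frac{\partial\psi}{\partial\varphi}, \tag{3.156}$$

$$\nabla\cdot\mathbf{V} = \frac{1}{r^2\sin\theta}\left[\sin\theta\,\frac{\partial}{\partial r}(r^2 V_r) + r\,\frac{\partial}{\partial\theta}(\sin\theta\,V_\theta) + r\,\frac{\partial V_\varphi}{\partial\varphi}\right], \tag{3.157}$$

$$\nabla^2\psi = \frac{1}{r^2\sin\theta}\left[\sin\theta\,\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi}{\partial r}\right) + \frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{\sin\theta}\frac{\partial^2\psi}{\partial\varphi^2}\right], \tag{3.158}$$

$$\nabla\times\mathbf{V} = \frac{1}{r^2\sin\theta}\begin{vmatrix} \hat{\mathbf{e}}_r & r\,\hat{\mathbf{e}}_\theta & r\sin\theta\,\hat{\mathbf{e}}_\varphi \\[4pt] \dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\varphi} \\[4pt] V_r & r V_\theta & r\sin\theta\,V_\varphi \end{vmatrix}. \tag{3.159}$$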


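Finally, again using Eq. (3.70), the components of the vector Laplacian $\nabla^2\mathbf{V}$ in spherical polar coordinates can be shown to be

$$\begin{aligned}
\nabla^2\mathbf{V}\big|_r &= \nabla^2 V_r - \frac{2}{r^2}V_r - \frac{2}{r^2}\cot\theta\,V_\theta - \frac{2}{r^2}\frac{\partial V_\theta}{\partial\theta} - \frac{2}{r^2\sin\theta}\frac{\partial V_\varphi}{\partial\varphi}, \\
\nabla^2\mathbf{V}\big|_\theta &= \nabla^2 V_\theta - \frac{1}{r^2\sin^2\theta}V_\theta + \frac{2}{r^2}\frac{\partial V_r}{\partial\theta} - \frac{2\cos\theta}{r^2\sin^2\theta}\frac{\partial V_\varphi}{\partial\varphi}, \\
\nabla^2\mathbf{V}\big|_\varphi &= \nabla^2 V_\varphi - \frac{1}{r^2\sin^2\theta}V_\varphi + \frac{2}{r^2\sin\theta}\frac{\partial V_r}{\partial\varphi} + \frac{2\cos\theta}{r^2\sin^2\theta}\frac{\partial V_\theta}{\partial\varphi}.
\end{aligned} \tag{3.160}$$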


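Example 3.10.3 $\nabla$, $\nabla\cdot$, $\nabla\times$ FOR A CENTRAL FORCE


We can now easily derive some of the results previously obtained more laboriously in Cartesian coordinates:

From Eq. (3.156),

$$\nabla f(r) = \hat{\mathbf{e}}_r\,\frac{df}{dr}, \qquad \nabla r^n = \hat{\mathbf{e}}_r\,n\,r^{n-1}. \tag{3.161}$$

Specializing to the Coulomb potential of a point charge at the origin, $V = Ze/(4\pi\varepsilon_0 r)$, so the electric field has the expected value $\mathbf{E} = -\nabla V = (Ze/4\pi\varepsilon_0 r^2)\,\hat{\mathbf{e}}_r$.

Taking next the divergence of a radial function, we have from Eq. (3.157),

$$\nabla\cdot\left(\hat{\mathbf{e}}_r f(r)\right) = \frac{2}{r}f(r) + \frac{df}{dr}, \qquad \nabla\cdot\left(\hat{\mathbf{e}}_r\,r^n\right) = (n+2)\,r^{n-1}. \tag{3.162}$$

Specializing the above to the Coulomb force ($n = -2$), we have (except for $r = 0$) $\nabla\cdot(\hat{\mathbf{e}}_r\,r^{-2}) = 0$, which is consistent with Gauss’ law.

Continuing now to the Laplacian, from Eq. (3.158) we have

$$\nabla^2 f(r) = \frac{2}{r}\frac{df}{dr} + \frac{d^2 f}{dr^2}, \qquad \nabla^2 r^n = n(n+1)\,r^{n-2}, \tag{3.163}$$

in contrast to the ordinary second derivative of $r^n$ involving $n - 1$.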





Finally, from Eq. (3.159),

$$\nabla\times\left(\hat{\mathbf{e}}_r f(r)\right) = 0, \tag{3.164}$$

which confirms that central forces are irrotational. ■


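Example 3.10.4 MAGNETIC VECTOR POTENTIAL


A single current loop in the $xy$-plane has a vector potential $\mathbf{A}$ that is a function only of $r$ and $\theta$, is entirely in the $\hat{\mathbf{e}}_\varphi$ direction, and is related to the current density $\mathbf{J}$ by the equation

$$\mu_0\mathbf{J} = \nabla\times\mathbf{B} = \nabla\times\left[\nabla\times\hat{\mathbf{e}}_\varphi A_\varphi(r, \theta)\right].$$

In spherical polar coordinates this reduces to

$$\begin{aligned}
\mu_0\mathbf{J} &= \nabla\times\frac{1}{r^2\sin\theta}\begin{vmatrix} \hat{\mathbf{e}}_r & r\,\hat{\mathbf{e}}_\theta & r\sin\theta\,\hat{\mathbf{e}}_\varphi \\[4pt] \dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\varphi} \\[4pt] 0 & 0 & r\sin\theta\,A_\varphi \end{vmatrix} \\
&= \nabla\times\frac{1}{r^2\sin\theta}\left[\hat{\mathbf{e}}_r\,\frac{\partial}{\partial\theta}(r\sin\theta\,A_\varphi) - r\,\hat{\mathbf{e}}_\theta\,\frac{\partial}{\partial r}(r\sin\theta\,A_\varphi)\right].
\end{aligned}$$

Taking the curl a second time, we obtain

$$\mu_0\mathbf{J} = \frac{1}{r^2\sin\theta}\begin{vmatrix} \hat{\mathbf{e}}_r & r\,\hat{\mathbf{e}}_\theta & r\sin\theta\,\hat{\mathbf{e}}_\varphi \\[4pt] \dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\varphi} \\[4pt] \dfrac{1}{r\sin\theta}\dfrac{\partial}{\partial\theta}(\sin\theta\,A_\varphi) & -\dfrac{\partial}{\partial r}(r A_\varphi) & 0 \end{vmatrix}.$$

Expanding this determinant from the top down, we reach

$$\mu_0\mathbf{J} = -\hat{\mathbf{e}}_\varphi\left[\frac{\partial^2 A_\varphi}{\partial r^2} + \frac{2}{r}\frac{\partial A_\varphi}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial A_\varphi}{\partial\theta}\right) - \frac{1}{r^2\sin^2\theta}A_\varphi\right]. \tag{3.165}$$

Note that we get, in addition to $\nabla^2 A_\varphi$, one more term: $-A_\varphi/(r^2\sin^2\theta)$. ■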


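Example 3.10.5 STOKES’ THEOREM


As a final example, let’s compute $\oint \mathbf{B}\cdot d\mathbf{r}$ for a closed loop, comparing the result with integrals $\int (\nabla\times\mathbf{B})\cdot d\boldsymbol{\sigma}$ for two different surfaces having the same perimeter. We use spherical polar coordinates, taking $\mathbf{B} = e^{-r}\,\hat{\mathbf{e}}_\varphi$.

The loop will be a unit circle about the origin in the $xy$-plane; the line integral about it will be taken in a counterclockwise sense as viewed from positive $z$, so the normal to the surfaces it bounds will pass through the $xy$-plane in the direction of positive $z$. The surfaces we consider are (1) a circular disk bounded by the loop, and (2) a hemisphere bounded by the loop, with its surface in the region $z < 0$. See Fig. 3.24.

FIGURE 3.24 Surfaces for Example 3.10.5: (left) $S_1$, disk; (right) $S_2$, hemisphere.

For the line integral, $d\mathbf{r} = r\sin\theta\,\hat{\mathbf{e}}_\varphi\,d\varphi$, which reduces to $d\mathbf{r} = \hat{\mathbf{e}}_\varphi\,d\varphi$ since $\theta = \pi/2$ and $r = 1$ on the entire loop. We then have

$$\oint \mathbf{B}\cdot d\mathbf{r} = \int_0^{2\pi} e^{-1}\,d\varphi = \frac{2\pi}{e}.$$

For the surface integrals, we need $\nabla\times\mathbf{B}$:

$$\begin{aligned}
\nabla\times\mathbf{B} &= \frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial\theta}\left(r\sin\theta\,e^{-r}\right)\hat{\mathbf{e}}_r - r\,\frac{\partial}{\partial r}\left(r\sin\theta\,e^{-r}\right)\hat{\mathbf{e}}_\theta\right] \\
&= \frac{e^{-r}\cos\theta}{r\sin\theta}\,\hat{\mathbf{e}}_r - \frac{(1 - r)\,e^{-r}}{r}\,\hat{\mathbf{e}}_\theta .
\end{aligned}$$

Taking first the disk, at all points of which $\theta = \pi/2$, with integration range $0 \le r \le 1$ and $0 \le \varphi < 2\pi$, we note that $d\boldsymbol{\sigma} = -\hat{\mathbf{e}}_\theta\,r\sin\theta\,dr\,d\varphi = -\hat{\mathbf{e}}_\theta\,r\,dr\,d\varphi$. The minus sign arises because the positive normal is in the direction of decreasing $\theta$. Then,

$$\int_{S_1} -(\nabla\times\mathbf{B})\cdot\hat{\mathbf{e}}_\theta\,r\,dr\,d\varphi = \int_0^{2\pi} d\varphi\int_0^1 dr\,(1 - r)\,e^{-r} = \frac{2\pi}{e}.$$

For the hemisphere, defined by $r = 1$, $\pi/2 \le \theta \le \pi$, and $0 \le \varphi < 2\pi$, we have $d\boldsymbol{\sigma} = -\hat{\mathbf{e}}_r\,r^2\sin\theta\,d\theta\,d\varphi = -\hat{\mathbf{e}}_r\,\sin\theta\,d\theta\,d\varphi$ (the normal is in the direction of decreasing $r$), and

$$\int_{S_2} -(\nabla\times\mathbf{B})\cdot\hat{\mathbf{e}}_r\,\sin\theta\,d\theta\,d\varphi = -\int_{\pi/2}^{\pi} d\theta\,e^{-1}\cos\theta\int_0^{2\pi} d\varphi = \frac{2\pi}{e}.$$

The results for both surfaces agree with that from the line integral of their common perimeter. Because $\nabla\times\mathbf{B}$ is solenoidal, all the flux that passes through the disk in the $xy$-plane must continue through the hemispherical surface, and for that matter, through any surface with the same perimeter. That is why Stokes’ theorem is indifferent to features of the surface other than its perimeter. ■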
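The three integrals of this example can be reproduced numerically without using the spherical formulas at all, which gives an independent check. The sketch below is an illustration under stated assumptions: the field $\mathbf{B} = e^{-r}\,\hat{\mathbf{e}}_\varphi$ is written in Cartesian components, its curl is estimated by central differences, and midpoint sums are used for the loop, the disk (normal $+\hat{\mathbf{e}}_z$), and the lower hemisphere (normal $-\hat{\mathbf{e}}_r$). All function names and grid sizes are choices made here.

```python
import math


def B(x, y, z):
    """B = exp(-r) e_phi in Cartesian components (undefined on the z-axis)."""
    r = math.sqrt(x * x + y * y + z * z)
    rho = math.hypot(x, y)
    scale = math.exp(-r) / rho
    return (-y * scale, x * scale, 0.0)


def curl(F, p, eps=1e-5):
    """Central-difference estimate of curl F at the point p = (x, y, z)."""
    def d(i, j):  # dF_i / dx_j
        plus, minus = list(p), list(p)
        plus[j] += eps
        minus[j] -= eps
        return (F(*plus)[i] - F(*minus)[i]) / (2.0 * eps)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


def line_integral(n=400):
    """Counterclockwise integral of B . dr around the unit circle, z = 0."""
    dphi = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * dphi
        bx, by, _ = B(math.cos(phi), math.sin(phi), 0.0)
        total += (-bx * math.sin(phi) + by * math.cos(phi)) * dphi
    return total


def disk_flux(n_rho=200, n_phi=16):
    """Flux of curl B through the unit disk, normal along +z."""
    drho, dphi = 1.0 / n_rho, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * drho
        for k in range(n_phi):
            phi = (k + 0.5) * dphi
            c = curl(B, (rho * math.cos(phi), rho * math.sin(phi), 0.0))
            total += c[2] * rho * drho * dphi
    return total


def hemisphere_flux(n_theta=200, n_phi=16):
    """Flux of curl B through the lower unit hemisphere, normal along -e_r."""
    dtheta, dphi = 0.5 * math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = 0.5 * math.pi + (i + 0.5) * dtheta
        for k in range(n_phi):
            phi = (k + 0.5) * dphi
            x = math.sin(theta) * math.cos(phi)
            y = math.sin(theta) * math.sin(phi)
            z = math.cos(theta)
            c = curl(B, (x, y, z))
            radial = c[0] * x + c[1] * y + c[2] * z
            total += -radial * math.sin(theta) * dtheta * dphi
    return total
```

All three functions should return values close to $2\pi/e \approx 2.3115$. The midpoint grids never place a sample on the $z$-axis, where $\hat{\mathbf{e}}_\varphi$ is undefined.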
Rotation and Reflection in Spherical Coordinates


It is infrequent that rotational coordinate transformations need be applied in curvilinear 
coordinate systems, and they usually arise only in contexts that are compatible with the 
symmetry of the coordinate system. We limit the current discussion to rotations (and 
reflections) in spherical polar coordinates. 


Rotation—Suppose a coordinate rotation identified by Euler angles (a, 8, y) converts the 
coordinates of a point from (r, 0, @) to (7, 6’, g’). It is obvious that r retains its original 
value. Two questions arise: (1) How are 0’ and g’ related to 6 and gy? and (2) How do the 
components of a vector A, namely (A,, Ag, Ay), transform? 

It is simplest to proceed, as we did for Cartesian coordinates, by analyzing the three 
consecutive rotations implied by the Euler angles. The first rotation, by an angle a about 
the z-axis, leaves 6 unchanged, and converts g into g — a. However, it causes no change 
in any of the components of A. 

The second rotation, which inclines the polar direction by an angle 6 toward the (new) 
x-axis, does change the values of both 6 and g and, in addition, changes the directions 
of @ and éy. Referring to Fig. 3.25, we see that these two unit vectors are subjected to 
a rotation x in the plane tangent to the sphere of constant r, thereby yielding new unit 
vectors @, and é/, such that 


€g =cos x & —sinxe,, €, =sin x & + cos x €,,. 
This transformation corresponds to 
( cosx sin *) 
So = ; F 
—sinx cosx 
Carrying out the spherical trigonometry corresponding to Fig. 3.25, we have the new 
coordinates 


cos B cos 0’ — cos@ 


cos6’ = cosB cos@ + sinB sin@ cos(g—a), cosg’ = , (3.166) 





sin f sin 6’ 





FiGURE 3.25 Rotation and unit vectors in spherical polar coordinates, shown on a sphere 
of radius r. The original polar direction is marked z; it is moved to the direction z’, at an 
inclination given by the Euler angle 6. The unit vectors ég and é, at the point P are 
thereby rotated through the angle x. 
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and 


cos B — cos @ cos 6’ 





cos X = (3.167) 


sin 6 sin 6’ 
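The third rotation, by an angle $\gamma$ about the new $z$-axis, leaves the components of $\mathbf{A}$
unchanged but requires the replacement of $\varphi'$ by $\varphi' - \gamma$.

Summarizing,

$$\begin{pmatrix} A'_r \\ A'_\theta \\ A'_\varphi \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\chi & \sin\chi \\ 0 & -\sin\chi & \cos\chi \end{pmatrix} \begin{pmatrix} A_r \\ A_\theta \\ A_\varphi \end{pmatrix}. \tag{3.168}$$

This equation specifies the components of $\mathbf{A}$ in the rotated coordinates at the point
$(r, \theta', \varphi' - \gamma)$ in terms of the original components at the same physical point, $(r, \theta, \varphi)$.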
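
As a numerical sanity check on Eqs. (3.166) and (3.167), the following sketch compares them with angles obtained directly from Cartesian projections. It is only an illustration: the test angles are arbitrary, all variable names are invented here, and it assumes (consistent with the description above) that the second rotation tilts the polar axis by $\beta$ toward the new $x$-axis, that is, a rotation about the new $y$-axis.

```python
import math

theta, phi = 0.7, 1.1          # original angles of the point (arbitrary test values)
alpha, beta = 0.3, 0.5         # first two Euler angles (arbitrary test values)

# Unit position vector after the first rotation, which only shifts phi by alpha.
r_hat = (math.sin(theta) * math.cos(phi - alpha),
         math.sin(theta) * math.sin(phi - alpha),
         math.cos(theta))

# Axes after the second rotation, expressed in the once-rotated frame
# (assumption: the polar axis is inclined by beta toward the x-axis).
x_new = (math.cos(beta), 0.0, -math.sin(beta))
y_new = (0.0, 1.0, 0.0)
z_new = (math.sin(beta), 0.0, math.cos(beta))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# New angles obtained directly from the Cartesian projections.
theta_p = math.acos(dot(r_hat, z_new))
phi_p = math.atan2(dot(r_hat, y_new), dot(r_hat, x_new))

# The same quantities from Eq. (3.166).
cos_theta_p = (math.cos(beta) * math.cos(theta)
               + math.sin(beta) * math.sin(theta) * math.cos(phi - alpha))
cos_phi_p = ((math.cos(beta) * math.cos(theta_p) - math.cos(theta))
             / (math.sin(beta) * math.sin(theta_p)))

# Eq. (3.167), compared with the overlap of the old and new theta unit vectors,
# using e_theta = (cos(theta) r_hat - z_hat) / sin(theta) in each frame.
cos_chi = ((math.cos(beta) - math.cos(theta) * math.cos(theta_p))
           / (math.sin(theta) * math.sin(theta_p)))
e_theta = tuple((math.cos(theta) * r - z) / math.sin(theta)
                for r, z in zip(r_hat, (0.0, 0.0, 1.0)))
e_theta_p = tuple((math.cos(theta_p) * r - z) / math.sin(theta_p)
                  for r, z in zip(r_hat, z_new))

print(abs(math.cos(theta_p) - cos_theta_p) < 1e-12)
print(abs(math.cos(phi_p) - cos_phi_p) < 1e-12)
print(abs(dot(e_theta, e_theta_p) - cos_chi) < 1e-12)
```

The last comparison uses the fact that, by the relation between the primed and unprimed unit vectors given above, the overlap $\hat{\mathbf{e}}'_\theta \cdot \hat{\mathbf{e}}_\theta$ equals $\cos\chi$.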


Reflection: Inversion of the coordinate system reverses the sign of each Cartesian coordinate.
Taking the angle $\varphi$ as that which moves the new $+x$ coordinate toward the new $+y$
coordinate, the system (which was originally right-handed) now becomes left-handed. The
coordinates $(r, \theta, \varphi)$ of a (fixed) point become, in the new system, $(r, \pi - \theta, \pi + \varphi)$. The
unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_\varphi$ are invariant under inversion, but $\hat{\mathbf{e}}_\theta$ changes sign, so

$$\begin{pmatrix} A'_r \\ A'_\theta \\ A'_\varphi \end{pmatrix} = \begin{pmatrix} A_r \\ -A_\theta \\ A_\varphi \end{pmatrix}, \quad \text{coordinate inversion.} \tag{3.169}$$


Exercises


3.10.1 The $u$-, $v$-, $z$-coordinate system frequently used in electrostatics and in hydrodynamics
is defined by

$$xy = u, \qquad x^2 - y^2 = v, \qquad z = z.$$

This $u$-, $v$-, $z$-system is orthogonal.

(a) In words, describe briefly the nature of each of the three families of coordinate
surfaces.

(b) Sketch the system in the $xy$-plane showing the intersections of surfaces of constant
$u$ and surfaces of constant $v$ with the $xy$-plane.

(c) Indicate the directions of the unit vectors $\hat{\mathbf{e}}_u$ and $\hat{\mathbf{e}}_v$ in all four quadrants.

(d) Finally, is this $u$-, $v$-, $z$-system right-handed ($\hat{\mathbf{e}}_u \times \hat{\mathbf{e}}_v = +\hat{\mathbf{e}}_z$) or left-handed
($\hat{\mathbf{e}}_u \times \hat{\mathbf{e}}_v = -\hat{\mathbf{e}}_z$)?


3.10.2 The elliptic cylindrical coordinate system consists of three families of surfaces:

$$(1)\ \frac{x^2}{a^2\cosh^2 u} + \frac{y^2}{a^2\sinh^2 u} = 1; \qquad (2)\ \frac{x^2}{a^2\cos^2 v} - \frac{y^2}{a^2\sin^2 v} = 1; \qquad (3)\ z = z.$$

Sketch the coordinate surfaces $u$ = constant and $v$ = constant as they intersect the first
quadrant of the $xy$-plane. Show the unit vectors $\hat{\mathbf{e}}_u$ and $\hat{\mathbf{e}}_v$. The range of $u$ is $0 \le u < \infty$.
The range of $v$ is $0 \le v \le 2\pi$.





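3.10.3 Develop arguments to show that dot and cross products (not involving $\nabla$) in orthogonal
curvilinear coordinates in $\mathbb{R}^3$ proceed, as in Cartesian coordinates, with no involvement
of scale factors.


3.10.4 With $\hat{\mathbf{e}}_1$ a unit vector in the direction of increasing $q_1$, show that

(a) $\displaystyle \nabla \cdot \hat{\mathbf{e}}_1 = \frac{1}{h_1 h_2 h_3}\,\frac{\partial (h_2 h_3)}{\partial q_1}$,

(b) $\displaystyle \nabla \times \hat{\mathbf{e}}_1 = \frac{1}{h_1}\left[\hat{\mathbf{e}}_2\,\frac{1}{h_3}\frac{\partial h_1}{\partial q_3} - \hat{\mathbf{e}}_3\,\frac{1}{h_2}\frac{\partial h_1}{\partial q_2}\right]$.

Note that even though $\hat{\mathbf{e}}_1$ is a unit vector, its divergence and curl do not necessarily
vanish.


3.10.5 Show that a set of orthogonal unit vectors $\hat{\mathbf{e}}_i$ may be defined by

$$\hat{\mathbf{e}}_i = \frac{1}{h_i}\frac{\partial \mathbf{r}}{\partial q_i}.$$

In particular, show that $\hat{\mathbf{e}}_i \cdot \hat{\mathbf{e}}_i = 1$ leads to an expression for $h_i$ in agreement with
Eq. (3.131).
The above equation for $\hat{\mathbf{e}}_i$ may be taken as a starting point for deriving

$$\frac{\partial \hat{\mathbf{e}}_i}{\partial q_j} = \hat{\mathbf{e}}_j\,\frac{1}{h_i}\frac{\partial h_j}{\partial q_i}, \quad i \ne j,$$

and

$$\frac{\partial \hat{\mathbf{e}}_i}{\partial q_i} = -\sum_{j \ne i} \hat{\mathbf{e}}_j\,\frac{1}{h_j}\frac{\partial h_i}{\partial q_j}.$$


3.10.6 Resolve the circular cylindrical unit vectors into their Cartesian components (see
Fig. 3.23).

ANS. $\hat{\mathbf{e}}_\rho = \hat{\mathbf{e}}_x\cos\varphi + \hat{\mathbf{e}}_y\sin\varphi$,
$\hat{\mathbf{e}}_\varphi = -\hat{\mathbf{e}}_x\sin\varphi + \hat{\mathbf{e}}_y\cos\varphi$,
$\hat{\mathbf{e}}_z = \hat{\mathbf{e}}_z$.


3.10.7 Resolve the Cartesian unit vectors into their circular cylindrical components (see
Fig. 3.23).

ANS. $\hat{\mathbf{e}}_x = \hat{\mathbf{e}}_\rho\cos\varphi - \hat{\mathbf{e}}_\varphi\sin\varphi$,
$\hat{\mathbf{e}}_y = \hat{\mathbf{e}}_\rho\sin\varphi + \hat{\mathbf{e}}_\varphi\cos\varphi$,
$\hat{\mathbf{e}}_z = \hat{\mathbf{e}}_z$.


3.10.8 From the results of Exercise 3.10.6, show that

$$\frac{\partial \hat{\mathbf{e}}_\rho}{\partial \varphi} = \hat{\mathbf{e}}_\varphi, \qquad \frac{\partial \hat{\mathbf{e}}_\varphi}{\partial \varphi} = -\hat{\mathbf{e}}_\rho,$$

and that all other first derivatives of the circular cylindrical unit vectors with respect to
the circular cylindrical coordinates vanish.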
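
3.10.9 Compare $\nabla \cdot \mathbf{V}$ as given for cylindrical coordinates in Eq. (3.148) with the result of its
computation by applying to $\mathbf{V}$ the operator

$$\nabla = \hat{\mathbf{e}}_\rho\frac{\partial}{\partial\rho} + \hat{\mathbf{e}}_\varphi\frac{1}{\rho}\frac{\partial}{\partial\varphi} + \hat{\mathbf{e}}_z\frac{\partial}{\partial z}.$$

Note that $\nabla$ acts both on the unit vectors and on the components of $\mathbf{V}$.


3.10.10 (a) Show that $\mathbf{r} = \hat{\mathbf{e}}_\rho\,\rho + \hat{\mathbf{e}}_z\,z$.

(b) Working entirely in circular cylindrical coordinates, show that
$\nabla \cdot \mathbf{r} = 3$ and $\nabla \times \mathbf{r} = 0$.


3.10.11 (a) Show that the parity operation (reflection through the origin) on a point $(\rho, \varphi, z)$
relative to fixed $x$-, $y$-, $z$-axes consists of the transformation

$$\rho \to \rho, \qquad \varphi \to \varphi \pm \pi, \qquad z \to -z.$$

(b) Show that $\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\varphi$ have odd parity (reversal of direction) and that $\hat{\mathbf{e}}_z$ has even
parity.

Note. The Cartesian unit vectors $\hat{\mathbf{e}}_x$, $\hat{\mathbf{e}}_y$, and $\hat{\mathbf{e}}_z$ remain constant.


3.10.12 A rigid body is rotating about a fixed axis with a constant angular velocity $\boldsymbol{\omega}$. Take $\boldsymbol{\omega}$
to lie along the $z$-axis. Express the position vector $\mathbf{r}$ in circular cylindrical coordinates
and using circular cylindrical coordinates,

(a) calculate $\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}$,
(b) calculate $\nabla \times \mathbf{v}$.

ANS. (a) $\mathbf{v} = \hat{\mathbf{e}}_\varphi\,\omega\rho$,
(b) $\nabla \times \mathbf{v} = 2\boldsymbol{\omega}$.


3.10.13 Find the circular cylindrical components of the velocity and acceleration of a moving
particle,

$$v_\rho = \dot\rho, \qquad a_\rho = \ddot\rho - \rho\dot\varphi^2,$$
$$v_\varphi = \rho\dot\varphi, \qquad a_\varphi = \rho\ddot\varphi + 2\dot\rho\dot\varphi,$$
$$v_z = \dot z, \qquad a_z = \ddot z.$$

Hint. $\mathbf{r}(t) = \hat{\mathbf{e}}_\rho(t)\,\rho(t) + \hat{\mathbf{e}}_z\,z(t) = [\hat{\mathbf{e}}_x\cos\varphi(t) + \hat{\mathbf{e}}_y\sin\varphi(t)]\,\rho(t) + \hat{\mathbf{e}}_z\,z(t)$.

Note. $\dot\rho = d\rho/dt$, $\ddot\rho = d^2\rho/dt^2$, and so on.


3.10.14 In right circular cylindrical coordinates, a particular vector function is given by

$$\mathbf{V}(\rho, \varphi) = \hat{\mathbf{e}}_\rho V_\rho(\rho, \varphi) + \hat{\mathbf{e}}_\varphi V_\varphi(\rho, \varphi).$$

Show that $\nabla \times \mathbf{V}$ has only a $z$-component. Note that this result will hold for any vector
confined to a surface $q_3$ = constant as long as the products $h_1 V_1$ and $h_2 V_2$ are each
independent of $q_3$.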





3.10.15 A conducting wire along the $z$-axis carries a current $I$. The resulting magnetic vector
potential is given by

$$\mathbf{A} = \hat{\mathbf{e}}_z\,\frac{\mu I}{2\pi}\ln\left(\frac{1}{\rho}\right).$$

Show that the magnetic induction $\mathbf{B}$ is given by

$$\mathbf{B} = \hat{\mathbf{e}}_\varphi\,\frac{\mu I}{2\pi\rho}.$$


3.10.16 A force is described by

$$\mathbf{F} = -\hat{\mathbf{e}}_x\,\frac{y}{x^2 + y^2} + \hat{\mathbf{e}}_y\,\frac{x}{x^2 + y^2}.$$

(a) Express $\mathbf{F}$ in circular cylindrical coordinates.
Operating entirely in circular cylindrical coordinates for (b) and (c),
(b) Calculate the curl of $\mathbf{F}$ and
(c) Calculate the work done by $\mathbf{F}$ in encircling the unit circle once counter-clockwise.
(d) How do you reconcile the results of (b) and (c)?


3.10.17 A calculation of the magnetohydrodynamic pinch effect involves the evaluation of
$(\mathbf{B} \cdot \nabla)\mathbf{B}$. If the magnetic induction $\mathbf{B}$ is taken to be $\mathbf{B} = \hat{\mathbf{e}}_\varphi B_\varphi(\rho)$, show that

$$(\mathbf{B} \cdot \nabla)\mathbf{B} = -\hat{\mathbf{e}}_\rho\,B_\varphi^2/\rho.$$


3.10.18 Express the spherical polar unit vectors in terms of Cartesian unit vectors.

ANS. $\hat{\mathbf{e}}_r = \hat{\mathbf{e}}_x\sin\theta\cos\varphi + \hat{\mathbf{e}}_y\sin\theta\sin\varphi + \hat{\mathbf{e}}_z\cos\theta$,
$\hat{\mathbf{e}}_\theta = \hat{\mathbf{e}}_x\cos\theta\cos\varphi + \hat{\mathbf{e}}_y\cos\theta\sin\varphi - \hat{\mathbf{e}}_z\sin\theta$,
$\hat{\mathbf{e}}_\varphi = -\hat{\mathbf{e}}_x\sin\varphi + \hat{\mathbf{e}}_y\cos\varphi$.


3.10.19 Resolve the Cartesian unit vectors into their spherical polar components:

$\hat{\mathbf{e}}_x = \hat{\mathbf{e}}_r\sin\theta\cos\varphi + \hat{\mathbf{e}}_\theta\cos\theta\cos\varphi - \hat{\mathbf{e}}_\varphi\sin\varphi$,
$\hat{\mathbf{e}}_y = \hat{\mathbf{e}}_r\sin\theta\sin\varphi + \hat{\mathbf{e}}_\theta\cos\theta\sin\varphi + \hat{\mathbf{e}}_\varphi\cos\varphi$,
$\hat{\mathbf{e}}_z = \hat{\mathbf{e}}_r\cos\theta - \hat{\mathbf{e}}_\theta\sin\theta$.


3.10.20 (a) Explain why it is not possible to relate a column vector $\mathbf{r}$ (with components $x$,
$y$, $z$) to another column vector $\mathbf{r}'$ (with components $r$, $\theta$, $\varphi$), via a matrix equation
of the form $\mathbf{r}' = \mathsf{B}\mathbf{r}$.

(b) One can write a matrix equation relating the Cartesian components of a vector to
its components in spherical polar coordinates. Find the transformation matrix and
determine whether it is orthogonal.


3.10.21 Find the transformation matrix that converts the components of a vector in spherical
polar coordinates into its components in circular cylindrical coordinates. Then find the
matrix of the inverse transformation.


3.10.22 (a) From the results of Exercise 3.10.18, calculate the partial derivatives of $\hat{\mathbf{e}}_r$, $\hat{\mathbf{e}}_\theta$, and
$\hat{\mathbf{e}}_\varphi$ with respect to $r$, $\theta$, and $\varphi$.

(b) With $\nabla$ given by

$$\hat{\mathbf{e}}_r\frac{\partial}{\partial r} + \hat{\mathbf{e}}_\theta\frac{1}{r}\frac{\partial}{\partial\theta} + \hat{\mathbf{e}}_\varphi\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}$$

(greatest space rate of change), use the results of part (a) to calculate $\nabla \cdot \nabla\psi$. This
is an alternate derivation of the Laplacian.

Note. The derivatives of the left-hand $\nabla$ operate on the unit vectors of the right-hand $\nabla$
before the dot product is evaluated.


3.10.23 A rigid body is rotating about a fixed axis with a constant angular velocity $\boldsymbol{\omega}$. Take $\boldsymbol{\omega}$ to
be along the $z$-axis. Using spherical polar coordinates,

(a) calculate $\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}$,
(b) calculate $\nabla \times \mathbf{v}$.

ANS. (a) $\mathbf{v} = \hat{\mathbf{e}}_\varphi\,\omega r\sin\theta$,
(b) $\nabla \times \mathbf{v} = 2\boldsymbol{\omega}$.


3.10.24 A certain vector $\mathbf{V}$ has no radial component. Its curl has no tangential components.
What does this imply about the radial dependence of the tangential components of $\mathbf{V}$?


3.10.25 Modern physics lays great stress on the property of parity (whether a quantity remains
invariant or changes sign under an inversion of the coordinate system). In Cartesian
coordinates this means $x \to -x$, $y \to -y$, and $z \to -z$.

(a) Show that the inversion (reflection through the origin) of a point $(r, \theta, \varphi)$ relative
to fixed $x$-, $y$-, $z$-axes consists of the transformation

$$r \to r, \qquad \theta \to \pi - \theta, \qquad \varphi \to \varphi \pm \pi.$$

(b) Show that $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_\varphi$ have odd parity (reversal of direction) and that $\hat{\mathbf{e}}_\theta$ has even
parity.


3.10.26 With $\mathbf{A}$ any vector,

$$\mathbf{A} \cdot \nabla\mathbf{r} = \mathbf{A}.$$

(a) Verify this result in Cartesian coordinates.
(b) Verify this result using spherical polar coordinates. Equation (3.156) provides $\nabla$.


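3.10.27 Find the spherical coordinate components of the velocity and acceleration of a moving
particle:

$$v_r = \dot r, \qquad a_r = \ddot r - r\dot\theta^2 - r\sin^2\theta\,\dot\varphi^2,$$
$$v_\theta = r\dot\theta, \qquad a_\theta = r\ddot\theta + 2\dot r\dot\theta - r\sin\theta\cos\theta\,\dot\varphi^2,$$
$$v_\varphi = r\sin\theta\,\dot\varphi, \qquad a_\varphi = r\sin\theta\,\ddot\varphi + 2\dot r\sin\theta\,\dot\varphi + 2r\cos\theta\,\dot\theta\dot\varphi.$$

Hint. $\mathbf{r}(t) = \hat{\mathbf{e}}_r(t)\,r(t) = [\hat{\mathbf{e}}_x\sin\theta(t)\cos\varphi(t) + \hat{\mathbf{e}}_y\sin\theta(t)\sin\varphi(t) + \hat{\mathbf{e}}_z\cos\theta(t)]\,r(t)$.

Note. The dot in $\dot r$, $\dot\theta$, $\dot\varphi$ means time derivative: $\dot r = dr/dt$, $\dot\theta = d\theta/dt$,
$\dot\varphi = d\varphi/dt$.


3.10.28 Express $\partial/\partial x$, $\partial/\partial y$, $\partial/\partial z$ in spherical polar coordinates.

ANS.
$$\frac{\partial}{\partial x} = \sin\theta\cos\varphi\,\frac{\partial}{\partial r} + \cos\theta\cos\varphi\,\frac{1}{r}\frac{\partial}{\partial\theta} - \frac{\sin\varphi}{r\sin\theta}\frac{\partial}{\partial\varphi},$$
$$\frac{\partial}{\partial y} = \sin\theta\sin\varphi\,\frac{\partial}{\partial r} + \cos\theta\sin\varphi\,\frac{1}{r}\frac{\partial}{\partial\theta} + \frac{\cos\varphi}{r\sin\theta}\frac{\partial}{\partial\varphi},$$
$$\frac{\partial}{\partial z} = \cos\theta\,\frac{\partial}{\partial r} - \sin\theta\,\frac{1}{r}\frac{\partial}{\partial\theta}.$$

Hint. Equate $\nabla_{xyz}$ and $\nabla_{r\theta\varphi}$.


3.10.29 Using results from Exercise 3.10.28, show that

$$-i\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right) = -i\frac{\partial}{\partial\varphi}.$$

This is the quantum mechanical operator corresponding to the $z$-component of orbital
angular momentum.


3.10.30 With the quantum mechanical orbital angular momentum operator defined as $\mathbf{L} = -i(\mathbf{r} \times \nabla)$, show that

(a) $\displaystyle L_x + iL_y = e^{i\varphi}\left(\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right)$,

(b) $\displaystyle L_x - iL_y = -e^{-i\varphi}\left(\frac{\partial}{\partial\theta} - i\cot\theta\,\frac{\partial}{\partial\varphi}\right)$.


3.10.31 Verify that $\mathbf{L} \times \mathbf{L} = i\mathbf{L}$ in spherical polar coordinates. $\mathbf{L} = -i(\mathbf{r} \times \nabla)$, the quantum
mechanical orbital angular momentum operator.
Written in component form, this relation is

$$L_y L_z - L_z L_y = iL_x, \qquad L_z L_x - L_x L_z = iL_y, \qquad L_x L_y - L_y L_x = iL_z.$$

Using the commutator notation, $[A, B] = AB - BA$, and the definition of the Levi-Civita
symbol $\varepsilon_{ijk}$, the above can also be written

$$[L_i, L_j] = i\,\varepsilon_{ijk} L_k,$$

where $i$, $j$, $k$ are $x$, $y$, $z$ in any order.

Hint. Use spherical polar coordinates for $\mathbf{L}$ but Cartesian components for the cross
product.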
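
3.10.32 (a) Using Eq. (3.156) show that

$$\mathbf{L} = -i(\mathbf{r} \times \nabla) = i\left(\hat{\mathbf{e}}_\theta\,\frac{1}{\sin\theta}\frac{\partial}{\partial\varphi} - \hat{\mathbf{e}}_\varphi\,\frac{\partial}{\partial\theta}\right).$$

(b) Resolving $\hat{\mathbf{e}}_\theta$ and $\hat{\mathbf{e}}_\varphi$ into Cartesian components, determine $L_x$, $L_y$, and $L_z$ in
terms of $\theta$, $\varphi$, and their derivatives.

(c) From $\mathbf{L}^2 = L_x^2 + L_y^2 + L_z^2$ show that

$$\mathbf{L}^2 = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right) - \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2} = -r^2\nabla^2 + \frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right).$$


3.10.33 With $\mathbf{L} = -i\mathbf{r} \times \nabla$, verify the operator identities

(a) $\displaystyle \nabla = \hat{\mathbf{e}}_r\frac{\partial}{\partial r} - i\,\frac{\mathbf{r} \times \mathbf{L}}{r^2}$,

(b) $\displaystyle \mathbf{r}\nabla^2 - \nabla\left(1 + r\frac{\partial}{\partial r}\right) = i\nabla \times \mathbf{L}$.


3.10.34 Show that the following three forms (spherical coordinates) of $\nabla^2\psi(r)$ are equivalent:

$$\text{(a)}\ \frac{1}{r^2}\frac{d}{dr}\left[r^2\frac{d\psi(r)}{dr}\right]; \qquad \text{(b)}\ \frac{1}{r}\frac{d^2}{dr^2}\left[r\psi(r)\right]; \qquad \text{(c)}\ \frac{d^2\psi(r)}{dr^2} + \frac{2}{r}\frac{d\psi(r)}{dr}.$$

The second form is particularly convenient in establishing a correspondence between
spherical polar and Cartesian descriptions of a problem.


3.10.35 A certain force field is given in spherical polar coordinates by

$$\mathbf{F} = \hat{\mathbf{e}}_r\,\frac{2P\cos\theta}{r^3} + \hat{\mathbf{e}}_\theta\,\frac{P}{r^3}\sin\theta, \qquad r \ge P/2.$$

(a) Examine $\nabla \times \mathbf{F}$ to see if a potential exists.

(b) Calculate $\oint \mathbf{F} \cdot d\mathbf{r}$ for a unit circle in the plane $\theta = \pi/2$. What does this indicate
about the force being conservative or nonconservative?

(c) If you believe that $\mathbf{F}$ may be described by $\mathbf{F} = -\nabla\psi$, find $\psi$. Otherwise simply
state that no acceptable potential exists.


3.10.36 (a) Show that $\mathbf{A} = -\hat{\mathbf{e}}_\varphi\cot\theta/r$ is a solution of $\nabla \times \mathbf{A} = \hat{\mathbf{e}}_r/r^2$.

(b) Show that this spherical polar coordinate solution agrees with the solution given
for Exercise 3.9.5:

$$\mathbf{A} = \hat{\mathbf{e}}_x\,\frac{yz}{r(x^2 + y^2)} - \hat{\mathbf{e}}_y\,\frac{xz}{r(x^2 + y^2)}.$$

Note that the solution diverges for $\theta = 0, \pi$ corresponding to $x, y = 0$.

(c) Finally, show that $\mathbf{A} = -\hat{\mathbf{e}}_\theta\,\varphi\sin\theta/r$ is a solution. Note that although this solution
does not diverge ($r \ne 0$), it is no longer single-valued for all possible azimuth
angles.
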
3.10.37 An electric dipole of moment $\mathbf{p}$ is located at the origin. The dipole creates an electric
potential at $\mathbf{r}$ given by

$$\psi(\mathbf{r}) = \frac{\mathbf{p} \cdot \mathbf{r}}{4\pi\varepsilon_0 r^3}.$$

Find the electric field, $\mathbf{E} = -\nabla\psi$ at $\mathbf{r}$.


Additional Readings 


Borisenko, A. I., and I. E. Tarapov, Vector and Tensor Analysis with Applications. Englewood Cliffs, NJ: Prentice-
Hall (1968), reprinting, Dover (1980).


Davis, H. F., and A. D. Snider, Introduction to Vector Analysis, 7th ed. Boston: Allyn & Bacon (1995). 

Kellogg, O. D., Foundations of Potential Theory. Berlin: Springer (1929), reprinted, Dover (1953). The classic 
text on potential theory. 

Lewis, P. E., and J. P. Ward, Vector Analysis for Engineers and Scientists. Reading, MA: Addison-Wesley (1989). 


Margenau, H., and G. M. Murphy, The Mathematics of Physics and Chemistry, 2nd ed. Princeton NJ: Van Nos- 
trand (1956). Chapter 5 covers curvilinear coordinates and 13 specific coordinate systems. 


Marion, J. B., Principles of Vector Analysis. New York: Academic Press (1965). A moderately advanced presen- 
tation of vector analysis oriented toward tensor analysis. Rotations and other transformations are described 
with the appropriate matrices. 

Morse, P. M., and H. Feshbach, Methods of Theoretical Physics. New York: McGraw-Hill (1953). Chapter 5 
includes a description of several different coordinate systems. Note that Morse and Feshbach are not above 
using left-handed coordinate systems even for Cartesian coordinates. Elsewhere in this excellent (and diffi- 
cult) book there are many examples of the use of the various coordinate systems in solving physical problems. 
Eleven additional fascinating but seldom-encountered orthogonal coordinate systems are discussed in the sec- 
ond (1970) edition of Mathematical Methods for Physicists. 

Spiegel, M. R., Vector Analysis. New York: McGraw-Hill (1989). 

Tai, C.-T., Generalized Vector and Dyadic Analysis. Oxford: Oxford University Press (1966). 


Wrede, R. C., Introduction to Vector and Tensor Analysis. New York: Wiley (1963), reprinting, Dover (1972). 
Fine historical introduction. Excellent discussion of differentiation of vectors and applications to mechanics. 


CHAPTER 4 


TENSORS AND 
DIFFERENTIAL FORMS 


4.1 TENSOR ANALYSIS 


Introduction, Properties 


Tensors are important in many areas of physics, ranging from topics such as general relativ- 
ity and electrodynamics to descriptions of the properties of bulk matter such as stress (the 
pattern of force applied to a sample) and strain (its response to the force), or the moment 
of inertia (the relation between a torsional force applied to an object and its resultant angu- 
lar acceleration). Tensors constitute a generalization of quantities previously introduced: 
scalars and vectors. We identified a scalar as a quantity that remained invariant under
rotations of the coordinate system and which could be specified by the value of a sin- 
gle real number. Vectors were identified as quantities that had a number of real compo- 
nents equal to the dimension of the coordinate system, with the components transforming 
like the coordinates of a fixed point when a coordinate system is rotated. Calling scalars 
tensors of rank 0 and vectors tensors of rank 1, we identify a tensor of rank n in a 
d-dimensional space as an object with the following properties: 


• It has components labeled by $n$ indices, with each index assigned values from 1 through
$d$, and therefore having a total of $d^n$ components;


• The components transform in a specified manner under coordinate transformations.


The behavior under coordinate transformation is of central importance for tensor anal- 
ysis and conforms both with the way in which mathematicians define linear spaces and 
with the physicist’s notion that physical observables must not depend on the choice of
coordinate frames. 


Covariant and Contravariant Tensors 


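In Chapter 3, we considered the rotational transformation of a vector $\mathbf{A} = A_1\hat{\mathbf{e}}_1 + A_2\hat{\mathbf{e}}_2 + A_3\hat{\mathbf{e}}_3$
from the Cartesian system defined by $\hat{\mathbf{e}}_i$ ($i = 1, 2, 3$) into a rotated coordinate system
defined by $\hat{\mathbf{e}}'_i$, with the same vector $\mathbf{A}$ then represented as $\mathbf{A}' = A'_1\hat{\mathbf{e}}'_1 + A'_2\hat{\mathbf{e}}'_2 + A'_3\hat{\mathbf{e}}'_3$. The
components of $\mathbf{A}$ and $\mathbf{A}'$ are related by

$$A'_i = \sum_j (\hat{\mathbf{e}}'_i \cdot \hat{\mathbf{e}}_j)\,A_j, \tag{4.1}$$

where the coefficients $(\hat{\mathbf{e}}'_i \cdot \hat{\mathbf{e}}_j)$ are the projections of $\hat{\mathbf{e}}'_i$ in the $\hat{\mathbf{e}}_j$ directions. Because the $\hat{\mathbf{e}}_i$
and the $\hat{\mathbf{e}}'_j$ are linearly related, we can also write

$$A'_i = \sum_j \frac{\partial x'_i}{\partial x_j}\,A_j. \tag{4.2}$$

The formula of Eq. (4.2) corresponds to the application of the chain rule to convert the set
$A_j$ into the set $A'_i$, and is valid for $A_j$ and $A'_i$ of arbitrary magnitude because both vectors
depend linearly on their components.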

We have also previously noted that the gradient of a scalar $\varphi$ has in the unrotated Cartesian
coordinates the components $(\nabla\varphi)_j = (\partial\varphi/\partial x_j)\hat{\mathbf{e}}_j$, meaning that in a rotated system
we would have

$$(\nabla\varphi)'_i \equiv \frac{\partial\varphi}{\partial x'_i} = \sum_j \frac{\partial x_j}{\partial x'_i}\frac{\partial\varphi}{\partial x_j}, \tag{4.3}$$

showing that the gradient has a transformation law that differs from that of Eq. (4.2) in
that $\partial x'_i/\partial x_j$ has been replaced by $\partial x_j/\partial x'_i$. Remembering that these two expressions, if
written in detail, correspond, respectively, to $(\partial x'_i/\partial x_j)_{x_k}$ and $(\partial x_j/\partial x'_i)_{x'_k}$, where $k$ runs
over the index values other than that already in the denominator, and also noting that (in
Cartesian coordinates) they are two different ways of computing the same quantity (the
magnitude and sign of the projection of one of these unit vectors upon the other), we see
that it was legitimate to identify both $\mathbf{A}$ and $\nabla\varphi$ as vectors, as we did in Chapter 3.
However, as the alert reader may note from the repeated insertion of the word 
“Cartesian,” the partial derivatives in Eqs. (4.2) and (4.3) are only guaranteed to be equal 
in Cartesian coordinate systems, and since there is sometimes a need to use non-Cartesian 
systems it becomes necessary to distinguish these two different transformation rules. Quan- 
tities transforming according to Eq. (4.2) are called contravariant vectors, while those 
transforming according to Eq. (4.3) are termed covariant. When non-Cartesian systems 
may be in play, it is therefore customary to distinguish these transformation properties by 
writing the index of a contravariant vector as a superscript and that of a covariant vector as 
a subscript. This means, among other things, that the components of the position vector r, 
which is contravariant, must now be written $(x^1, x^2, x^3)$. Thus, summarizing,

$$(A')^i = \sum_j \frac{\partial (x')^i}{\partial x^j}\,A^j, \qquad \mathbf{A} \text{ a contravariant vector}, \tag{4.4}$$

$$A'_i = \sum_j \frac{\partial x^j}{\partial (x')^i}\,A_j, \qquad \mathbf{A} \text{ a covariant vector}. \tag{4.5}$$


It is useful to note that the occurrence of subscripts and superscripts is systematic; the 
free (i.e., unsummed) index i occurs as a superscript on both sides of Eq. (4.4), while it 
appears as a subscript on both sides of Eq. (4.5), if we interpret an upper index in the 
denominator as equivalent to a lower index. The summed index occurs once as upper 
and once as lower (again treating an upper index in the denominator as a lower index). 
A frequently used shorthand (the Einstein convention) is to omit the summation sign in 
formulas like Eqs. (4.4) and (4.5) and to understand that when the same symbol occurs 
both as an upper and a lower index in the same expression, it is to be summed. We will 
gradually back into the use of the Einstein convention, giving the reader warnings as we 
start to do so. 
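
To make the distinction between Eqs. (4.4) and (4.5) concrete, here is a minimal sketch that applies both rules for a change from Cartesian $(x, y)$ to plane polar $(r, \varphi)$ coordinates. The choice of coordinate systems, the sample numbers, and all function names are assumptions made for illustration only. The two sets of components transform differently, yet the sum $A^i B_i$ comes out the same in both systems.

```python
import math

def jacobian_polar(x, y):
    """J[i][j] = d(x')^i / dx^j with (x')^1 = r, (x')^2 = phi."""
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / r**2, x / r**2]]

def inverse_jacobian_polar(x, y):
    """Jinv[j][i] = dx^j / d(x')^i, from x = r cos(phi), y = r sin(phi)."""
    r, phi = math.hypot(x, y), math.atan2(y, x)
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi), r * math.cos(phi)]]

def transform_contravariant(J, A):
    # Eq. (4.4): (A')^i = sum_j d(x')^i/dx^j A^j
    return [sum(J[i][j] * A[j] for j in range(2)) for i in range(2)]

def transform_covariant(Jinv, B):
    # Eq. (4.5): (B')_i = sum_j dx^j/d(x')^i B_j
    return [sum(Jinv[j][i] * B[j] for j in range(2)) for i in range(2)]

x, y = 1.2, 0.5        # point at which the components are taken
A = [0.3, -0.7]        # contravariant components (e.g., a small displacement)
B = [2.0, 1.5]         # covariant components (e.g., a gradient)

J, Jinv = jacobian_polar(x, y), inverse_jacobian_polar(x, y)
A_new = transform_contravariant(J, A)
B_new = transform_covariant(Jinv, B)

scalar_old = sum(a * b for a, b in zip(A, B))
scalar_new = sum(a * b for a, b in zip(A_new, B_new))
print(A_new, B_new)
print(abs(scalar_old - scalar_new) < 1e-12)
```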


Tensors of Rank 2 


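Now we proceed to define contravariant, mixed, and covariant tensors of rank 2 by the
following equations for their components under coordinate transformations:

$$(A')^{ij} = \sum_{kl} \frac{\partial (x')^i}{\partial x^k}\frac{\partial (x')^j}{\partial x^l}\,A^{kl},$$

$$(B')^i_j = \sum_{kl} \frac{\partial (x')^i}{\partial x^k}\frac{\partial x^l}{\partial (x')^j}\,B^k_l, \tag{4.6}$$

$$(C')_{ij} = \sum_{kl} \frac{\partial x^k}{\partial (x')^i}\frac{\partial x^l}{\partial (x')^j}\,C_{kl}.$$
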


Clearly, the rank goes as the number of partial derivatives (or direction cosines) in the 
definition: 0 for a scalar, 1 for a vector, 2 for a second-rank tensor, and so on. Each index 
(subscript or superscript) ranges over the number of dimensions of the space. The number 
of indices (equal to the rank of tensor) is not limited by the dimensionality of the space. We 
see that $A^{kl}$ is contravariant with respect to both indices, $C_{kl}$ is covariant with respect to
both indices, and $B^k_l$ transforms contravariantly with respect to the index $k$ but covariantly
with respect to the index $l$. Once again, if we are using Cartesian coordinates, all three
forms of the tensors of second rank, contravariant, mixed, and covariant are the same. 

As with the components of a vector, the transformation laws for the components of a 
tensor, Eq. (4.6), cause its physically relevant properties to be independent of the choice 
of reference frame. This is what makes tensor analysis important in physics. The inde- 
pendence relative to reference frame (invariance) is ideal for expressing and investigating 
universal physical laws. 
The second-rank tensor A (with components $A^{kl}$) may be conveniently represented by
writing out its components in a square array ($3 \times 3$ if we are in three-dimensional (3-D)
space):

$$\mathsf{A} = \begin{pmatrix} A^{11} & A^{12} & A^{13} \\ A^{21} & A^{22} & A^{23} \\ A^{31} & A^{32} & A^{33} \end{pmatrix}. \tag{4.7}$$

This does not mean that any square array of numbers or functions forms a tensor. The
essential condition is that the components transform according to Eq. (4.6).

We can view each of Eq. (4.6) as a matrix equation. For A, it takes the form

$$(A')^{ij} = \sum_{kl} S_{ik}\,A^{kl}\,(S^T)_{lj}, \quad \text{or} \quad \mathsf{A}' = \mathsf{S}\mathsf{A}\mathsf{S}^T, \tag{4.8}$$

a construction that is known as a similarity transformation and is discussed in
Section 5.6.
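
The equivalence of the index form and the matrix form in Eq. (4.8) can be checked with a short sketch. The rotation used for S and the entries of A are arbitrary test values, and the helper names are invented for this illustration.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def transform_rank2(S, A):
    """Index form: (A')^{ij} = sum_{kl} S_ik S_jl A^{kl}."""
    n = len(A)
    return [[sum(S[i][k] * S[j][l] * A[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

t = 0.4  # rotation angle about the z-axis (arbitrary)
S = [[math.cos(t), math.sin(t), 0.0],
     [-math.sin(t), math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]
A = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0],
     [4.0, 0.0, 2.5]]

A_index = transform_rank2(S, A)                    # double sum over k and l
A_matrix = matmul(matmul(S, A), transpose(S))      # S A S^T
max_diff = max(abs(A_index[i][j] - A_matrix[i][j])
               for i in range(3) for j in range(3))
print(max_diff < 1e-12)
```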


In summary, tensors are systems of components organized by one or more indices that 
transform according to specific rules under a set of transformations. The number of 
indices is called the rank of the tensor. 


Addition and Subtraction of Tensors


The addition and subtraction of tensors is defined in terms of the individual elements, just
as for vectors. If

$$\mathsf{A} + \mathsf{B} = \mathsf{C}, \tag{4.9}$$

then, taking as an example A, B, and C to be contravariant tensors of rank 2,

$$A^{ij} + B^{ij} = C^{ij}. \tag{4.10}$$

In general, of course, A and B must be tensors of the same rank (of both contra- and
co-variance) and in the same space.


Symmetry


The order in which the indices appear in our description of a tensor is important. In general,
$A^{mn}$ is independent of $A^{nm}$, but there are some cases of special interest. If, for all $m$ and $n$,

$$A^{mn} = A^{nm}, \quad \mathsf{A} \text{ is symmetric}. \tag{4.11}$$

If, on the other hand,

$$A^{mn} = -A^{nm}, \quad \mathsf{A} \text{ is antisymmetric}. \tag{4.12}$$

Clearly, every (second-rank) tensor can be resolved into symmetric and antisymmetric
parts by the identity

$$A^{mn} = \tfrac{1}{2}\left(A^{mn} + A^{nm}\right) + \tfrac{1}{2}\left(A^{mn} - A^{nm}\right), \tag{4.13}$$

the first term on the right being a symmetric tensor, the second, an antisymmetric tensor.
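
The decomposition of Eq. (4.13) is easy to carry out on a component array. The following sketch (function name and sample entries are illustrative assumptions) splits an array into its symmetric and antisymmetric parts and confirms that they add back to the original.

```python
def split_symmetric(A):
    """Return the symmetric and antisymmetric parts of a square array, per Eq. (4.13)."""
    n = len(A)
    sym = [[0.5 * (A[m][k] + A[k][m]) for k in range(n)] for m in range(n)]
    anti = [[0.5 * (A[m][k] - A[k][m]) for k in range(n)] for m in range(n)]
    return sym, anti

A = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0],
     [4.0, 0.0, 2.5]]
sym, anti = split_symmetric(A)

# Recombine the two parts; this should reproduce A.
total = [[sym[m][k] + anti[m][k] for k in range(3)] for m in range(3)]
print(sym)
print(anti)
print(total == A)
```

Note that the antisymmetric part necessarily has zeros on its diagonal, so in three dimensions it carries only three independent components.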
Isotropic Tensors


To illustrate some of the techniques of tensor analysis, let us show that the now-familiar
Kronecker delta, $\delta_{kl}$, is really a mixed tensor of rank 2, $\delta^k_l$.¹ The question is: Does $\delta^k_l$ transform
according to Eq. (4.6)? This is our criterion for calling it a tensor. If $\delta^k_l$ is the mixed
tensor corresponding to this notation, it must satisfy (using the summation convention,
meaning that the indices $k$ and $l$ are to be summed)

$$(\delta')^i_j = \frac{\partial (x')^i}{\partial x^k}\frac{\partial x^l}{\partial (x')^j}\,\delta^k_l = \frac{\partial (x')^i}{\partial x^k}\frac{\partial x^k}{\partial (x')^j},$$

where we have performed the $l$ sum and used the definition of the Kronecker delta. Next,

$$\frac{\partial (x')^i}{\partial x^k}\frac{\partial x^k}{\partial (x')^j} = \frac{\partial (x')^i}{\partial (x')^j},$$

where we have identified the $k$ summation on the left-hand side as an instance of the
chain rule for differentiation. However, $(x')^i$ and $(x')^j$ are independent coordinates, and
therefore the variation of one with respect to the other must be zero if they are different,
unity if they coincide; that is,

$$\frac{\partial (x')^i}{\partial (x')^j} = (\delta')^i_j. \tag{4.14}$$

Hence

$$(\delta')^i_j = \frac{\partial (x')^i}{\partial x^k}\frac{\partial x^l}{\partial (x')^j}\,\delta^k_l, \tag{4.15}$$

showing that the $\delta^k_l$ are indeed the components of a mixed second-rank tensor. Note that
this result is independent of the number of dimensions of our space.

The Kronecker delta has one further interesting property. It has the same components in 
all of our rotated coordinate systems and is therefore called isotropic. In Section 4.2 and 
Exercise 4.2.4 we shall meet a third-rank isotropic tensor and three fourth-rank isotropic 
tensors. No isotropic first-rank tensor (vector) exists. 
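
The isotropy of the Kronecker delta, and the absence of an isotropic vector, can be illustrated numerically. In this sketch the rotation angle and the sample vector are arbitrary, the function name is invented here, and only a single rotation about the $z$-axis is tried, so it is a spot check rather than a proof.

```python
import math

def rotate_rank2(S, T):
    """T'_ij = S_ik S_jl T_kl (Cartesian rotation, so index position is immaterial)."""
    n = len(T)
    return [[sum(S[i][k] * S[j][l] * T[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

t = 0.9  # arbitrary rotation angle about the z-axis
S = [[math.cos(t), math.sin(t), 0.0],
     [-math.sin(t), math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]

delta = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
delta_rot = rotate_rank2(S, delta)   # components are unchanged by the rotation

v = [1.0, 0.0, 0.0]
v_rot = [sum(S[i][j] * v[j] for j in range(3)) for i in range(3)]  # components do change

delta_unchanged = all(abs(delta_rot[i][j] - delta[i][j]) < 1e-12
                      for i in range(3) for j in range(3))
print(delta_unchanged)
print(v_rot)
```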


¹It is common practice to refer to a tensor A by specifying a typical component, such as $A_{ij}$, thereby also conveying information
as to its covariant vs. contravariant nature. As long as you refrain from writing nonsense such as $\mathsf{A} = A_{ij}$, no harm is done.


Contraction


When dealing with vectors, we formed a scalar product by summing products of corresponding
components:

$$\mathbf{A} \cdot \mathbf{B} = \sum_i A_i B_i.$$

The generalization of this expression in tensor analysis is a process known as contraction.
Two indices, one covariant and the other contravariant, are set equal to each other, and then
(as implied by the summation convention) we sum over this repeated index. For example,
let us contract the second-rank mixed tensor $B^i_j$ by setting $j$ to $i$, then summing over $i$. To
see what happens, let’s look at the transformation formula that converts B into B′. Using
the summation convention,

$$(B')^i_i = \frac{\partial (x')^i}{\partial x^k}\frac{\partial x^l}{\partial (x')^i}\,B^k_l = \frac{\partial x^l}{\partial x^k}\,B^k_l,$$

where we recognized the $i$ summation as an instance of the chain rule for differentiation.
Then, because the $x^i$ are independent, we may use Eq. (4.14) to reach

$$(B')^i_i = \delta^l_k\,B^k_l = B^k_k. \tag{4.16}$$

Remembering that the repeated index ($i$ or $k$) is summed, we see that the contracted B
is invariant under transformation and is therefore a scalar.² In general, the operation of
contraction reduces the rank of a tensor by 2.
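
The invariance expressed by Eq. (4.16) does not depend on the transformation being a rotation. The sketch below uses a hypothetical linear change of coordinates (the matrices J and J_inv and the entries of B are made-up integers chosen so the arithmetic is exact) and confirms that the contracted quantity keeps its value.

```python
# J[i][k] stands for d(x')^i/dx^k and J_inv[l][j] for dx^l/d(x')^j.
# These are illustrative values for a linear, non-orthogonal coordinate change.
J = [[2, 1],
     [1, 1]]
J_inv = [[1, -1],
         [-1, 2]]          # matrix inverse of J
B = [[1, 2],
     [3, 4]]               # B[k][l] holds the mixed component B^k_l

# Mixed-tensor rule of Eq. (4.6): (B')^i_j = J[i][k] * J_inv[l][j] * B[k][l]
B_new = [[sum(J[i][k] * J_inv[l][j] * B[k][l]
              for k in range(2) for l in range(2))
          for j in range(2)] for i in range(2)]

trace_old = sum(B[k][k] for k in range(2))
trace_new = sum(B_new[i][i] for i in range(2))
print(B_new)                 # [[-3, 11], [-2, 8]]
print(trace_old, trace_new)  # 5 5
```

The individual components change completely, but their contraction does not, which is the content of the statement that the contracted B is a scalar.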


Direct Product 


The components of two tensors (of any ranks and covariant/contravariant characters) can 
be multiplied, component by component, to make an object with all the indices of both 
factors. The new quantity, termed the direct product of the two tensors, can be shown to be 
a tensor whose rank is the sum of the ranks of the factors, and with covariant/contravariant 
character that is the sum of those of the factors. We illustrate: 


$$C^{ij}{}_{klm} = A^i{}_k\,B^j{}_{lm}, \qquad F^{ij}{}_{kl} = A^j\,B^i{}_{lk}.$$


Note that the index order in the direct product can be defined as desired, but the covari- 
ance/contravariance of the factors must be maintained in the direct product. 


Example 4.1.1 DIRECT PRODUCT OF TWO VECTORS


Let’s form the direct product of a covariant vector $a_i$ (rank-1 tensor) and a contravariant
vector $b^j$ (also a rank-1 tensor) to form a mixed tensor of rank 2, with components $C^j_i = a_i b^j$.
To verify that $C^j_i$ is a tensor, we consider what happens to it under transformation:

$$(C')^j_i = (a')_i (b')^j = \frac{\partial x^k}{\partial (x')^i}\,a_k\,\frac{\partial (x')^j}{\partial x^l}\,b^l = \frac{\partial x^k}{\partial (x')^i}\frac{\partial (x')^j}{\partial x^l}\,C^l_k, \tag{4.17}$$

confirming that $C^j_i$ is the mixed tensor indicated by its notation.

If we now form the contraction $C^i_i$ (remember that $i$ is summed), we obtain the scalar
product $a_i b^i$. From Eq. (4.17) it is easy to see that $a_i b^i = (a')_i (b')^i$, indicating the invariance
required of a scalar product.
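
Example 4.1.1 can be followed step by step in a small sketch. As before, the transformation matrices and the vector components are hypothetical integers picked for exact arithmetic; they do not come from the text.

```python
# J[j][l] stands for d(x')^j/dx^l and J_inv[k][i] for dx^k/d(x')^i (illustrative values).
J = [[2, 1],
     [1, 1]]
J_inv = [[1, -1],
         [-1, 2]]

a = [1, 2]    # covariant components a_i
b = [3, -1]   # contravariant components b^j

# Direct product: C[i][j] holds C^j_i = a_i b^j.
C = [[a[i] * b[j] for j in range(2)] for i in range(2)]

# Transform the two vectors by their own rules ...
a_new = [sum(J_inv[k][i] * a[k] for k in range(2)) for i in range(2)]
b_new = [sum(J[j][l] * b[l] for l in range(2)) for j in range(2)]

# ... and transform C directly with the mixed-tensor rule of Eq. (4.17).
C_new = [[sum(J_inv[k][i] * J[j][l] * C[k][l]
              for k in range(2) for l in range(2))
          for j in range(2)] for i in range(2)]

product_of_new = [[a_new[i] * b_new[j] for j in range(2)] for i in range(2)]
print(C_new == product_of_new)           # True
print(sum(C[i][i] for i in range(2)),
      sum(C_new[i][i] for i in range(2)))  # 1 1
```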


Note that the direct product concept gives a meaning to quantities such as $\nabla\mathbf{E}$, which
was not defined within the framework of vector analysis. However, this and other tensor-like
quantities involving differential operators must be used with caution, because their
transformation rules are simple only in Cartesian coordinate systems. In non-Cartesian
systems, operators $\partial/\partial x^i$ act also on the partial derivatives in the transformation expressions
and alter the tensor transformation rules.

²In matrix analysis this scalar is the trace of the matrix whose elements are the $B^i_j$.

We summarize the key idea of this subsection: 


The direct product is a technique for creating new, higher-rank tensors. 


Inverse Transformation 


If we have a contravariant vector $A^i$, which must have the transformation rule (using summation
convention)

$$(A')^j = \frac{\partial (x')^j}{\partial x^i}\,A^i,$$

the inverse transformation (which can be obtained simply by interchanging the roles of the
primed and unprimed quantities) is

$$A^i = \frac{\partial x^i}{\partial (x')^j}\,(A')^j, \tag{4.18}$$

as may also be verified by applying $\partial (x')^k/\partial x^i$ (and summing $i$) to $A^i$ as given by
Eq. (4.18):

$$\frac{\partial (x')^k}{\partial x^i}\,A^i = \frac{\partial (x')^k}{\partial x^i}\frac{\partial x^i}{\partial (x')^j}\,(A')^j = \delta^k_j\,(A')^j = (A')^k. \tag{4.19}$$

We see that $(A')^k$ is recovered. Incidentally, note that

$$\frac{\partial x^i}{\partial (x')^j} \ne \left[\frac{\partial (x')^j}{\partial x^i}\right]^{-1};$$

as we have previously pointed out, these derivatives have different other variables held
fixed. The cancellation in Eq. (4.19) only occurs because the product of derivatives is
summed. In Cartesian systems, we do have

$$\frac{\partial x^i}{\partial (x')^j} = \frac{\partial (x')^j}{\partial x^i},$$

both equal to the direction cosine connecting the $x^i$ and $(x')^j$ axes, but this equality does
not extend to non-Cartesian systems.
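
Plane polar coordinates give a simple illustration of these remarks. In the sketch below (sample point and variable names are arbitrary choices for illustration), $x = r\cos\varphi$ and $y = r\sin\varphi$ play the role of the unprimed coordinates and $(r, \varphi)$ the primed ones. Individual partial derivatives are not reciprocals of one another, yet the summed products of Eq. (4.19) reproduce the Kronecker delta.

```python
import math

r, phi = 2.0, 0.6
x, y = r * math.cos(phi), r * math.sin(phi)

# Derivatives of the Cartesian coordinates with (r, phi) as independent variables.
dx_dr, dx_dphi = math.cos(phi), -r * math.sin(phi)
dy_dr, dy_dphi = math.sin(phi), r * math.cos(phi)

# Derivatives of the polar coordinates with (x, y) as independent variables.
dr_dx, dr_dy = x / r, y / r
dphi_dx, dphi_dy = -y / r**2, x / r**2

# A single pair is not reciprocal: this product is cos^2(phi), not 1.
print(dx_dr * dr_dx)

# The summed products of Eq. (4.19) give the Kronecker delta.
delta_rr = dr_dx * dx_dr + dr_dy * dy_dr
delta_rphi = dr_dx * dx_dphi + dr_dy * dy_dphi
delta_phiphi = dphi_dx * dx_dphi + dphi_dy * dy_dphi
print(delta_rr, delta_rphi, delta_phiphi)

# Unlike the Cartesian case, dx/dphi and dphi/dx differ (here by a factor r^2).
print(dx_dphi, dphi_dx)
```

In this particular system $\partial x/\partial r$ and $\partial r/\partial x$ happen to coincide (both equal $\cos\varphi$), but $\partial x/\partial\varphi$ and $\partial\varphi/\partial x$ do not, so the Cartesian equality fails in general.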


Quotient Rule 


If, for example, $A_{ij}$ and $B_{kl}$ are tensors, we have already observed that their direct product,
$A_{ij}B_{kl}$, is also a tensor. Here we are concerned with the inverse problem, illustrated by
equations such as

$$\begin{aligned} K_i A^i &= B, \\ K^j_i A_j &= B_i, \\ K^j_i A_{jk} &= B_{ik}, \\ K_{ijkl} A^{ij} &= B_{kl}, \\ K^{ij} A^k &= B^{ijk}. \end{aligned} \tag{4.20}$$

In each of these expressions A and B are known tensors of ranks indicated by the number
of indices, A is arbitrary, and the summation convention is in use. In each case K is an
unknown quantity. We wish to establish the transformation properties of K. The quotient
rule asserts:


If the equation of interest holds in all transformed coordinate systems, then K is a tensor 
of the indicated rank and covariant/contravariant character. 


Part of the importance of this rule in physical theory is that it can establish the tensor 
nature of quantities. For example, the equation giving the dipole moment m induced in an 
anisotropic medium by an electric field E is 


$$m_i = P_{ij} E^j.$$


Since presumably we know that m and E are vectors, the general validity of this equation 
tells us that the polarization matrix P is a tensor of rank 2. 

Let’s prove the quotient rule for a typical case, which we choose to be the second of
Eqs. (4.20). If we apply a transformation to that equation, we have

$$K^j_i A_j = B_i \quad \longrightarrow \quad (K')^j_i A'_j = B'_i. \tag{4.21}$$

We now evaluate $B'_i$, reaching the last member of the equation below by using Eq. (4.18)
to convert $A_j$ into components of A′ (note that this is the inverse of the transformation to
the primed quantities):

$$B'_i = \frac{\partial x^m}{\partial (x')^i}\,B_m = \frac{\partial x^m}{\partial (x')^i}\,K^j_m A_j = \frac{\partial x^m}{\partial (x')^i}\,K^j_m\,\frac{\partial (x')^n}{\partial x^j}\,A'_n. \tag{4.22}$$

It may lessen possible confusion if we rename the dummy indices in Eq. (4.22), so we
interchange $n$ and $j$, causing that equation to then read

$$B'_i = \frac{\partial x^m}{\partial (x')^i}\frac{\partial (x')^j}{\partial x^n}\,K^n_m A'_j. \tag{4.23}$$

It has now become clear that if we subtract the expression for $B'_i$ in Eq. (4.23) from that in
Eq. (4.21) we will get

$$\left[(K')^j_i - \frac{\partial x^m}{\partial (x')^i}\frac{\partial (x')^j}{\partial x^n}\,K^n_m\right] A'_j = 0. \tag{4.24}$$

Since A′ is arbitrary, the coefficient of $A'_j$ in Eq. (4.24) must vanish, showing that K has
the transformation properties of the tensor corresponding to its index configuration.
Other cases may be treated similarly. One minor pitfall should be noted: The quotient
rule does not necessarily apply if B is zero. The transformation properties of zero are 
indeterminate. 


Example 4.1.2 EQUATIONS OF MOTION AND FIELD EQUATIONS 


In classical mechanics, Newton’s equations of motion $m\dot{\mathbf{v}} = \mathbf{F}$ tell us on the basis of the
quotient rule that, if the mass is a scalar and the force a vector, then the acceleration $\mathbf{a} = \dot{\mathbf{v}}$
is a vector. In other words, the vector character of the force as the driving term imposes its
vector character on the acceleration, provided the scale factor $m$ is scalar.

The wave equation of electrodynamics can be written in relativistic four-vector form as

$$\left[\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2\right] A^\mu = J^\mu,$$

where $J^\mu$ is the external charge/current density (a four-vector) and $A^\mu$ is the four-component
vector potential. The second-derivative expression in square brackets can be
shown to be a scalar. From the quotient rule, we may then infer that $A^\mu$ must be a tensor
of rank 1, i.e., also a four-vector.


The quotient rule is a substitute for the illegal division of tensors. 


Spinors 


It was once thought that the system of scalars, vectors, tensors (second-rank), and so on 
formed a complete mathematical system, one that is adequate for describing a physics 
independent of the choice of reference frame. But the universe and mathematical physics 
are not that simple. In the realm of elementary particles, for example, spin-zero particles³
($\pi$ mesons, $\alpha$ particles) may be described with scalars, spin 1 particles (deuterons) by
vectors, and spin 2 particles (gravitons) by tensors. This listing omits the most common
particles: electrons, protons, and neutrons, all with spin $\tfrac{1}{2}$. These particles are properly
described by spinors. A spinor does not have the properties under rotation consistent with 
being a scalar, vector, or tensor of any rank. A brief introduction to spinors in the context 
of group theory appears in Chapter 17. 


Exercises 


4.1.1 


Show that if all the components of any tensor of any rank vanish in one particular 
coordinate system, they vanish in all coordinate systems. 

Note. This point takes on special importance in the four-dimensional (4-D) curved space 
of general relativity. If a quantity, expressed as a tensor, exists in one coordinate sys- 
tem, it exists in all coordinate systems and is not just a consequence of a choice of a 
coordinate system (as are centrifugal and Coriolis forces in Newtonian mechanics). 


³The particle spin is intrinsic angular momentum (in units of $\hbar$). It is distinct from classical (often called orbital) angular
momentum that arises from the motion of the particle.


4.1.2

The components of tensor A are equal to the corresponding components of tensor B in
one particular coordinate system, denoted by the superscript 0; that is,

$$A^0_{ij} = B^0_{ij}.$$

Show that tensor A is equal to tensor B, $A_{ij} = B_{ij}$, in all coordinate systems.


4.1.3

The last three components of a 4-D vector vanish in each of two reference frames. If the
second reference frame is not merely a rotation of the first about the $x_0$ axis, meaning
that at least one of the coefficients $\partial (x')^i / \partial x^0$ ($i = 1, 2, 3$) is nonzero, show that the
zeroth component vanishes in all reference frames. Translated into relativistic mechanics,
this means that if momentum is conserved in two Lorentz frames, then energy is
conserved in all Lorentz frames.


4.1.4

From an analysis of the behavior of a general second-rank tensor under 90° and 180°
rotations about the coordinate axes, show that an isotropic second-rank tensor in 3-D
space must be a multiple of $\delta^i_j$.


4.1.5

The 4-D fourth-rank Riemann-Christoffel curvature tensor of general relativity, $R_{iklm}$,
satisfies the symmetry relations

$$R_{iklm} = -R_{ikml} = -R_{kilm}.$$

With the indices running from 0 to 3, show that the number of independent components
is reduced from 256 to 36 and that the condition

$$R_{iklm} = R_{lmik}$$

further reduces the number of independent components to 21. Finally, if the components
satisfy an identity $R_{iklm} + R_{ilmk} + R_{imkl} = 0$, show that the number of independent
components is reduced to 20.

Note. The final three-term identity furnishes new information only if all four indices are
different.


4.1.6

$T_{iklm}$ is antisymmetric with respect to all pairs of indices. How many independent
components has it (in 3-D space)?


4.1.7

If $T_{\ldots i}$ is a tensor of rank n, show that $\partial T_{\ldots i} / \partial x^j$ is a tensor of rank n + 1 (Cartesian
coordinates).

Note. In non-Cartesian coordinate systems the coefficients $a_{ij}$ are, in general, functions
of the coordinates, and the derivatives of the components of a tensor of rank n do not
form a tensor except in the special case n = 0. In this case the derivative does yield a
covariant vector (tensor of rank 1).


4.1.8

If $T_{ijk\ldots}$ is a tensor of rank n, show that $\sum_j \partial T_{ijk\ldots} / \partial x^j$ is a tensor of rank n − 1
(Cartesian coordinates).


4.1.9

The operator

$$\nabla^2 - \frac{1}{c^2} \frac{\partial^2}{\partial t^2}$$

may be written as

$$\sum_{i=1}^{4} \frac{\partial^2}{\partial x_i^2},$$

using $x_4 = ict$. This is the 4-D Laplacian, sometimes called the d’Alembertian and
denoted by $\Box^2$. Show that it is a scalar operator, that is, invariant under Lorentz
transformations, i.e., under rotations in the space of vectors $(x_1, x_2, x_3, x_4)$.


4.1.10

The double summation $K_{ij} A^i B^j$ is invariant for any two vectors $A^i$ and $B^j$. Prove that
$K_{ij}$ is a second-rank tensor.

Note. In the form $ds^2$ (invariant) $= g_{ij}\, dx^i\, dx^j$, this result shows that the matrix $g_{ij}$ is
a tensor.


4.1.11

The equation $K_{ij} A^{jk} = B_i^{\,k}$ holds for all orientations of the coordinate system. If A and
B are arbitrary second-rank tensors, show that K is a second-rank tensor also.

4.2 PSEUDOTENSORS, DUAL TENSORS 


The topics of this section will be treated for tensors restricted for practical reasons to Carte- 
sian coordinate systems. This restriction is not conceptually necessary but simplifies the 
discussion and makes the essential points easy to identify. 


Pseudotensors 


So far the coordinate transformations in this chapter have been restricted to passive rota- 
tions, by which we mean rotation of the coordinate system, keeping vectors and tensors at 
fixed orientations. We now consider the effect of reflections or inversions of the coordinate 
system (sometimes also called improper rotations). 

In Section 3.3, where attention was restricted to orthogonal systems of Cartesian coor- 
dinates, we saw that the effect of a coordinate rotation on a fixed vector could be described 
by a transformation of its components according to the formula 


$$\mathbf{A}' = \mathsf{S}\mathbf{A}, \tag{4.25}$$


where S was an orthogonal matrix with determinant +1. If the coordinate transformation 
included a reflection (or inversion), the transformation matrix was still orthogonal, but 
had determinant —1. While the transformation rule of Eq. (4.25) was obeyed by vectors 
describing quantities such as position in space or velocity, it produced the wrong sign 
when vectors describing angular velocity, torque, and angular momentum were subject to 
improper rotations. These quantities, called axial vectors, or nowadays pseudovectors, 
obeyed the transformation rule 


$$\mathbf{A}' = \det(\mathsf{S})\,\mathsf{S}\mathbf{A} \quad \text{(pseudovector)}. \tag{4.26}$$
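The sign difference between Eqs. (4.25) and (4.26) can be checked numerically. The sketch below is illustrative only: the sample vectors `r` and `p` and the helper names are assumptions made here, and angular momentum is taken as the cross product of position and momentum. It applies a full inversion (determinant −1) and compares the angular momentum rebuilt from the transformed vectors with the predictions of the two rules.

```python
def matvec(S, v):
    """Apply a 3x3 matrix (nested lists) to a 3-component vector."""
    return [sum(S[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    """Cross product from the usual component formula."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def det3(S):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (S[0][0] * (S[1][1] * S[2][2] - S[1][2] * S[2][1])
            - S[0][1] * (S[1][0] * S[2][2] - S[1][2] * S[2][0])
            + S[0][2] * (S[1][0] * S[2][1] - S[1][1] * S[2][0]))

# Inversion of all three axes: orthogonal, with determinant -1.
S = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]

r = [1.0, 2.0, 3.0]    # position, an ordinary vector (illustrative values)
p = [0.5, -1.0, 2.0]   # momentum, an ordinary vector (illustrative values)
L = cross(r, p)        # angular momentum in the original frame

# Angular momentum rebuilt from the transformed position and momentum.
L_direct = cross(matvec(S, r), matvec(S, p))
# Prediction of the ordinary vector rule, Eq. (4.25).
L_vector_rule = matvec(S, L)
# Prediction of the pseudovector rule, Eq. (4.26).
L_pseudo_rule = [det3(S) * c for c in matvec(S, L)]

print(L_direct, L_vector_rule, L_pseudo_rule)
```

The first and third lists agree, while the second has the opposite sign, which is the behavior the text attributes to axial vectors.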


The extension of this concept to tensors is straightforward. We insist that the designation 
tensor refer to objects that transform as in Eq. (4.6) and its generalization to arbitrary
rank, but we also accommodate the possibility of having, at arbitrary rank, objects whose
transformation requires an additional sign factor to adjust for the effect associated with 
improper rotations. These objects are called pseudotensors, and constitute a generalization 
of the objects already identified as pseudoscalars and pseudovectors. 

If we form a tensor or pseudotensor as a direct product or identify one via the quotient 
rule, we can determine its pseudo status by what amounts to a sign rule. Letting T be a 
tensor and P a pseudotensor, then, symbolically, 


$$T \otimes T = P \otimes P = T, \qquad T \otimes P = P \otimes T = P. \tag{4.27}$$


Example 4.2.1 LEVI-CIVITA SYMBOL


The three-index version of the Levi-Civita symbol, introduced in Eq. (2.8), has the values

$$\begin{aligned}
\varepsilon_{123} &= \varepsilon_{231} = \varepsilon_{312} = +1, \\
\varepsilon_{132} &= \varepsilon_{213} = \varepsilon_{321} = -1, \\
\text{all other } \varepsilon_{ijk} &= 0.
\end{aligned} \tag{4.28}$$

Suppose now that we have a rank-3 pseudotensor $\eta_{ijk}$, which in one particular Cartesian
coordinate system is equal to $\varepsilon_{ijk}$. Then, letting A stand for the matrix of coefficients in an
orthogonal transformation of $\mathbb{R}^3$, we have in the transformed coordinate system

$$\eta'_{ijk} = \det(A) \sum_{pqr} a_{ip}\, a_{jq}\, a_{kr}\, \varepsilon_{pqr}, \tag{4.29}$$

by definition of pseudotensor. All terms of the pqr sum will vanish except those where
pqr is a permutation of 123, and when pqr is such a permutation the sum will correspond
to the determinant of A except that its rows will have been permuted from 123 to ijk. This
means that the pqr sum will have the value $\varepsilon_{ijk} \det(A)$, and

$$\eta'_{ijk} = \varepsilon_{ijk} [\det(A)]^2 = \varepsilon_{ijk}, \tag{4.30}$$

where the final result depends on the fact that $|\det(A)| = 1$. If the reader is uncomfortable
with the above analysis, the result can be checked by enumeration of the contributions of
the six permutations that correspond to nonzero values of $\eta'_{ijk}$.

Equation (4.30) not only shows that $\varepsilon$ is a rank-3 pseudotensor, but that it is also
isotropic. In other words, it has the same components in all Cartesian coordinate systems,
whether they are reached by proper or by improper rotations, since $[\det(A)]^2 = 1$ in both
cases. A true rank-3 tensor with these components in one system would instead acquire a
factor −1 in every system reached by an improper rotation.
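The enumeration suggested in this example is easy to carry out numerically. The following sketch (the function names, the rotation angle, and the particular reflection are all illustrative assumptions) evaluates the right-hand side of Eq. (4.29) for a proper rotation and for an improper one, and reports the largest deviation from the Levi-Civita symbol.

```python
import math
from itertools import product

def levi_civita(i, j, k):
    """Three-index Levi-Civita symbol for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def transformed_eta(a):
    """Right-hand side of Eq. (4.29): det(a) * a_ip a_jq a_kr eps_pqr."""
    d = det3(a)
    eta = {}
    for i, j, k in product(range(3), repeat=3):
        eta[i, j, k] = d * sum(
            a[i][p] * a[j][q] * a[k][r] * levi_civita(p, q, r)
            for p, q, r in product(range(3), repeat=3))
    return eta

def max_deviation(a):
    """Largest |eta'_ijk - eps_ijk| over all 27 index combinations."""
    eta = transformed_eta(a)
    return max(abs(eta[idx] - levi_civita(*idx)) for idx in eta)

t = 0.7  # arbitrary rotation angle in radians
rotation = [[math.cos(t), math.sin(t), 0.0],
            [-math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]
# Same rotation followed by a reflection of the third axis (determinant -1).
improper = [[math.cos(t), math.sin(t), 0.0],
            [-math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, -1.0]]

print(max_deviation(rotation), max_deviation(improper))
```

Both deviations are at the level of floating-point rounding, consistent with Eq. (4.30): the explicit factor det(A) compensates the sign change of the triple sum under the improper transformation.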


Dual Tensors 


With any antisymmetric second-rank tensor C (in 3-D space) we may associate a pseudovector
$\mathbf{C}$ with components defined by

$$C_i = \tfrac{1}{2} \varepsilon_{ijk} C^{jk}. \tag{4.31}$$

In matrix form the antisymmetric C may be written

$$C = \begin{pmatrix} 0 & C^{12} & -C^{31} \\ -C^{12} & 0 & C^{23} \\ C^{31} & -C^{23} & 0 \end{pmatrix}. \tag{4.32}$$

We know that $C_i$ must transform as a vector under rotations because it was obtained from
the double contraction of $\varepsilon_{ijk} C^{jk}$ but that it is really a pseudovector because of the pseudo
nature of $\varepsilon_{ijk}$. Specifically, the components of $\mathbf{C}$ are given by

$$(C_1, C_2, C_3) = (C^{23}, C^{31}, C^{12}). \tag{4.33}$$

Note the cyclic order of the indices that comes from the cyclic order of the components
of $\varepsilon_{ijk}$.

We identify the pseudovector of Eq. (4.33) and the antisymmetric tensor of Eq. (4.32) 
as dual tensors; they are simply different representations of the same information. Which 
of the dual pair we choose to use is a matter of convenience. 
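A familiar instance of this duality is the cross product. In the sketch below (sample vectors and helper names are assumptions chosen for illustration) the antisymmetric tensor is built as $C^{jk} = A^j B^k - A^k B^j$; its dual, computed from Eq. (4.31), reproduces the components of $\mathbf{A} \times \mathbf{B}$, and the inverse map $C^{jk} = \varepsilon_{ijk} C_i$ recovers the tensor.

```python
def levi_civita(i, j, k):
    """Three-index Levi-Civita symbol for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def dual_vector(C):
    """C_i = (1/2) eps_ijk C^jk, Eq. (4.31)."""
    return [0.5 * sum(levi_civita(i, j, k) * C[j][k]
                      for j in range(3) for k in range(3))
            for i in range(3)]

def dual_tensor(c):
    """Inverse map C^jk = eps_ijk C_i, valid for an antisymmetric C."""
    return [[sum(levi_civita(i, j, k) * c[i] for i in range(3))
             for k in range(3)]
            for j in range(3)]

A = [1.0, 2.0, 3.0]    # illustrative vector
B = [-2.0, 0.5, 4.0]   # illustrative vector

# Antisymmetric second-rank tensor built from A and B.
C = [[A[j] * B[k] - A[k] * B[j] for k in range(3)] for j in range(3)]

c = dual_vector(C)
cross = [A[1] * B[2] - A[2] * B[1],
         A[2] * B[0] - A[0] * B[2],
         A[0] * B[1] - A[1] * B[0]]

print(c, cross)          # the two lists coincide
print(dual_tensor(c))    # reproduces C
```

The three independent entries of C and the three components of the dual vector carry the same information, in the cyclic order shown in Eq. (4.33).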

Here is another example of duality. If we take three vectors A, B, and C, we may define
the direct product

$$V^{ijk} = A^i B^j C^k. \tag{4.34}$$

$V^{ijk}$ is evidently a rank-3 tensor. The dual quantity

$$V = \varepsilon_{ijk} V^{ijk} \tag{4.35}$$

is clearly a pseudoscalar. By expansion it is seen that

$$V = \begin{vmatrix} A^1 & B^1 & C^1 \\ A^2 & B^2 & C^2 \\ A^3 & B^3 & C^3 \end{vmatrix} \tag{4.36}$$

is our familiar scalar triple product.


Exercises 


4.2.1

An antisymmetric square array is given by

$$\begin{pmatrix} 0 & C_3 & -C_2 \\ -C_3 & 0 & C_1 \\ C_2 & -C_1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & C^{12} & C^{13} \\ -C^{12} & 0 & C^{23} \\ -C^{13} & -C^{23} & 0 \end{pmatrix},$$

where $(C_1, C_2, C_3)$ form a pseudovector. Assuming that the relation

$$C_i = \frac{1}{2!} \varepsilon_{ijk} C^{jk}$$

holds in all coordinate systems, prove that $C^{jk}$ is a tensor. (This is another form of the
quotient theorem.)


4.2.2

Show that the vector product is unique to 3-D space, that is, only in three dimensions can
we establish a one-to-one correspondence between the components of an antisymmetric
tensor (second-rank) and the components of a vector.


4.2.3

Write $\nabla \cdot \nabla \times \mathbf{A}$ and $\nabla \times \nabla \varphi$ in tensor (index) notation in $\mathbb{R}^3$ so that it becomes obvious
that each expression vanishes.

ANS. $\nabla \cdot \nabla \times \mathbf{A} = \varepsilon_{ijk} \dfrac{\partial}{\partial x^i} \dfrac{\partial}{\partial x^j} A^k$, $\qquad (\nabla \times \nabla \varphi)_i = \varepsilon_{ijk} \dfrac{\partial}{\partial x^j} \dfrac{\partial}{\partial x^k} \varphi$.


4.2.4

Verify that each of the following fourth-rank tensors is isotropic, that is, that it has the
same form independent of any rotation of the coordinate systems.

(a) $A^{ik}_{jl} = \delta^i_j \delta^k_l$,

(b) $B^{ij}_{kl} = \delta^i_k \delta^j_l + \delta^i_l \delta^j_k$,

(c) $C^{ij}_{kl} = \delta^i_k \delta^j_l - \delta^i_l \delta^j_k$.


4.2.5

Show that the two-index Levi-Civita symbol $\varepsilon_{ij}$ is a second-rank pseudotensor (in
two-dimensional [2-D] space). Does this contradict the uniqueness of $\delta^i_j$ (Exercise 4.1.4)?


4.2.6

Represent $\varepsilon_{ij}$ by a 2 × 2 matrix, and using the 2 × 2 rotation matrix of Eq. (3.23), show
that $\varepsilon_{ij}$ is invariant under orthogonal similarity transformations.


4.2.7

Given $A_k = \tfrac{1}{2} \varepsilon_{ijk} B^{ij}$ with $B^{ij} = -B^{ji}$, antisymmetric, show that

$$B^{mn} = \varepsilon^{mnk} A_k.$$


4.3. TENSORS IN GENERAL COORDINATES 


Metric Tensor 


The distinction between contravariant and covariant transformations was established in 
Section 4.1, where we also observed that it only became meaningful when working with 
coordinate systems that are not Cartesian. We now want to examine relationships that can 
systematize the use of more general metric spaces (also called Riemannian spaces). Our 
initial illustrations will be for spaces with three dimensions. 

Letting $q^i$ denote coordinates in a general coordinate system, writing the index as a
superscript to reflect the fact that coordinates transform contravariantly, we define covariant
basis vectors $\boldsymbol{\varepsilon}_i$ that describe the displacement (in Euclidean space) per unit change
in $q^i$, keeping the other $q^j$ constant. For the situations of interest here, both the direction
and magnitude of $\boldsymbol{\varepsilon}_i$ may be functions of position, so it is defined as the derivative

$$\boldsymbol{\varepsilon}_i = \frac{\partial x}{\partial q^i}\, \hat{\mathbf{e}}_x + \frac{\partial y}{\partial q^i}\, \hat{\mathbf{e}}_y + \frac{\partial z}{\partial q^i}\, \hat{\mathbf{e}}_z. \tag{4.37}$$

An arbitrary vector A can now be formed as a linear combination of the basis vectors,
multiplied by coefficients:

$$\mathbf{A} = A^1 \boldsymbol{\varepsilon}_1 + A^2 \boldsymbol{\varepsilon}_2 + A^3 \boldsymbol{\varepsilon}_3. \tag{4.38}$$

At this point we have a linguistic ambiguity: A is a fixed object (usually called a vector)
that may be described in various coordinate systems. But it is also customary to call the
collection of coefficients $A^i$ a vector (more specifically, a contravariant vector), while
we have already called $\boldsymbol{\varepsilon}_i$ a covariant basis vector. The important thing to observe here is
that A is a fixed object that is not changed by our transformations, while its representation
(the $A^i$) and the basis used for the representation (the $\boldsymbol{\varepsilon}_i$) change in mutually inverse ways
(as the coordinate system is changed) so as to keep A fixed.

Given our basis vectors, we can compute the displacement (change in position) associated
with changes in the $q^i$. Because the basis vectors depend on position, our computation
needs to be for small (infinitesimal) displacements ds. We have

$$(ds)^2 = \sum_{ij} (\boldsymbol{\varepsilon}_i\, dq^i) \cdot (\boldsymbol{\varepsilon}_j\, dq^j),$$

which, using the summation convention, can be written

$$(ds)^2 = g_{ij}\, dq^i\, dq^j, \tag{4.39}$$

with

$$g_{ij} = \boldsymbol{\varepsilon}_i \cdot \boldsymbol{\varepsilon}_j. \tag{4.40}$$

Since $(ds)^2$ is an invariant under rotational (and reflection) transformations, it is a scalar,
and the quotient rule permits us to identify $g_{ij}$ as a covariant tensor. Because of its role in
defining displacement, $g_{ij}$ is called the covariant metric tensor.

Note that the basis vectors can be defined by their Cartesian components, but they are, in
general, neither unit vectors nor mutually orthogonal. Because they are often not unit vectors
we have identified them by the symbol $\boldsymbol{\varepsilon}$, not $\hat{\mathbf{e}}$. The lack of both a normalization and
an orthogonality requirement means that $g_{ij}$, though manifestly symmetric, is not required
to be diagonal, and its elements (including those on the diagonal) may be of either sign.

It is convenient to define a contravariant metric tensor that satisfies

$$g^{ik} g_{kj} = g_{jk} g^{ki} = \delta^i_j, \tag{4.41}$$

and is therefore the inverse of the covariant metric tensor. We will use $g_{ij}$ and $g^{ij}$ to make
conversions between contravariant and covariant vectors that we then regard as related.
Thus, we write

$$g_{ij} F^j = F_i \quad \text{and} \quad g^{ij} F_j = F^i. \tag{4.42}$$

Returning now to Eq. (4.38), we can manipulate it as follows:

$$\mathbf{A} = A^i \boldsymbol{\varepsilon}_i = A^i \delta_i^k \boldsymbol{\varepsilon}_k = (A^i g_{ij})(g^{jk} \boldsymbol{\varepsilon}_k) = A_j \boldsymbol{\varepsilon}^j, \tag{4.43}$$

showing that the same vector can be represented either by contravariant or covariant
components, with the two sets of components related by the transformation in Eq. (4.42).


Covariant and Contravariant Bases


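We now define the contravariant basis vectors

$$\boldsymbol{\varepsilon}^i = \frac{\partial q^i}{\partial x}\, \hat{\mathbf{e}}_x + \frac{\partial q^i}{\partial y}\, \hat{\mathbf{e}}_y + \frac{\partial q^i}{\partial z}\, \hat{\mathbf{e}}_z, \tag{4.44}$$

giving them this name in anticipation of the fact that we can prove them to be the
contravariant versions of the $\boldsymbol{\varepsilon}_i$. Our first step in this direction is to verify that

$$\boldsymbol{\varepsilon}^i \cdot \boldsymbol{\varepsilon}_j = \frac{\partial q^i}{\partial x} \frac{\partial x}{\partial q^j} + \frac{\partial q^i}{\partial y} \frac{\partial y}{\partial q^j} + \frac{\partial q^i}{\partial z} \frac{\partial z}{\partial q^j} = \delta^i_j, \tag{4.45}$$

a consequence of the chain rule and the fact that $q^i$ and $q^j$ are independent variables.
We next note that

$$(\boldsymbol{\varepsilon}^i \cdot \boldsymbol{\varepsilon}^j)(\boldsymbol{\varepsilon}_j \cdot \boldsymbol{\varepsilon}_k) = \delta^i_k, \tag{4.46}$$

also proved using the chain rule; the terms can be collected so that groups of them
correspond to the identities in Eq. (4.45). Equation (4.46) shows that

$$g^{ij} = \boldsymbol{\varepsilon}^i \cdot \boldsymbol{\varepsilon}^j. \tag{4.47}$$

Multiplying both sides of Eq. (4.47) on the right by $\boldsymbol{\varepsilon}_j$ and performing the implied
summation, the left-hand side of that equation, $g^{ij} \boldsymbol{\varepsilon}_j$, becomes the formula for $\boldsymbol{\varepsilon}^i$, while the
right-hand side simplifies to the expression in Eq. (4.44), thereby proving that the
contravariant vector in that equation was appropriately named.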

We illustrate now some typical metric tensors and basis vectors in both covariant and 
contravariant form. 


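Example 4.3.1 SOME METRIC TENSORS


In spherical polar coordinates, $(q^1, q^2, q^3) = (r, \theta, \varphi)$, and $x = r \sin\theta \cos\varphi$, $y = r \sin\theta \sin\varphi$,
$z = r \cos\theta$. The covariant basis vectors are

$$\begin{aligned}
\boldsymbol{\varepsilon}_r &= \sin\theta \cos\varphi\, \hat{\mathbf{e}}_x + \sin\theta \sin\varphi\, \hat{\mathbf{e}}_y + \cos\theta\, \hat{\mathbf{e}}_z, \\
\boldsymbol{\varepsilon}_\theta &= r \cos\theta \cos\varphi\, \hat{\mathbf{e}}_x + r \cos\theta \sin\varphi\, \hat{\mathbf{e}}_y - r \sin\theta\, \hat{\mathbf{e}}_z, \\
\boldsymbol{\varepsilon}_\varphi &= -r \sin\theta \sin\varphi\, \hat{\mathbf{e}}_x + r \sin\theta \cos\varphi\, \hat{\mathbf{e}}_y,
\end{aligned}$$

and the contravariant basis vectors, which can be obtained in many ways, one of which is
to start from $r^2 = x^2 + y^2 + z^2$, $\cos\theta = z/r$, $\tan\varphi = y/x$, are

$$\begin{aligned}
\boldsymbol{\varepsilon}^r &= \sin\theta \cos\varphi\, \hat{\mathbf{e}}_x + \sin\theta \sin\varphi\, \hat{\mathbf{e}}_y + \cos\theta\, \hat{\mathbf{e}}_z, \\
\boldsymbol{\varepsilon}^\theta &= r^{-1} \cos\theta \cos\varphi\, \hat{\mathbf{e}}_x + r^{-1} \cos\theta \sin\varphi\, \hat{\mathbf{e}}_y - r^{-1} \sin\theta\, \hat{\mathbf{e}}_z, \\
\boldsymbol{\varepsilon}^\varphi &= -\frac{\sin\varphi}{r \sin\theta}\, \hat{\mathbf{e}}_x + \frac{\cos\varphi}{r \sin\theta}\, \hat{\mathbf{e}}_y,
\end{aligned}$$

leading to

$$g_{11} = \boldsymbol{\varepsilon}_r \cdot \boldsymbol{\varepsilon}_r = 1, \qquad g_{22} = \boldsymbol{\varepsilon}_\theta \cdot \boldsymbol{\varepsilon}_\theta = r^2, \qquad g_{33} = \boldsymbol{\varepsilon}_\varphi \cdot \boldsymbol{\varepsilon}_\varphi = r^2 \sin^2\theta;$$

all other $g_{ij}$ vanish. Combining these to make $g_{ij}$ and taking the inverse (to make $g^{ij}$), we
have

$$(g_{ij}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix}, \qquad (g^{ij}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^{-2} & 0 \\ 0 & 0 & (r \sin\theta)^{-2} \end{pmatrix}.$$

We can check that we have inverted $g_{ij}$ correctly by comparing the expression given for
$g^{ij}$ with that built directly from $\boldsymbol{\varepsilon}^i \cdot \boldsymbol{\varepsilon}^j$. This check is left for the reader.

The Minkowski metric of special relativity has the form

$$(g_{ij}) = (g^{ij}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.$$

The motivation for including it in this example is to emphasize that for some metrics
important in physics, distances $ds^2$ need not be positive (meaning that ds can be
imaginary).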
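The spherical-polar results of this example can be spot-checked numerically. The sketch below is an illustration under stated assumptions: the sample point, the finite-difference step, and all function names are choices made here, not part of the text. It builds the covariant basis vectors by differentiating the coordinate map, forms the metric from their dot products, and tests the duality relation of Eq. (4.45) against the closed-form contravariant vectors listed above.

```python
import math

def cartesian(q):
    """Map spherical polar (r, theta, phi) to Cartesian (x, y, z)."""
    r, theta, phi = q
    return [r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta)]

def covariant_basis(q, h=1e-6):
    """Approximate eps_i = d(x, y, z)/dq^i, Eq. (4.37), by central differences."""
    basis = []
    for i in range(3):
        q_plus, q_minus = list(q), list(q)
        q_plus[i] += h
        q_minus[i] -= h
        x_plus, x_minus = cartesian(q_plus), cartesian(q_minus)
        basis.append([(a - b) / (2 * h) for a, b in zip(x_plus, x_minus)])
    return basis

def contravariant_basis(q):
    """Closed-form eps^i for spherical polar coordinates, as listed in the example."""
    r, theta, phi = q
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, st * sp, ct],
            [ct * cp / r, ct * sp / r, -st / r],
            [-sp / (r * st), cp / (r * st), 0.0]]

def dot(a, b):
    """Ordinary Cartesian dot product."""
    return sum(x * y for x, y in zip(a, b))

q = (2.0, 0.8, 0.3)  # arbitrary sample point (r, theta, phi)
lower = covariant_basis(q)
upper = contravariant_basis(q)

# Covariant metric, Eq. (4.40), and the duality products of Eq. (4.45).
g = [[dot(lower[i], lower[j]) for j in range(3)] for i in range(3)]
duality = [[dot(upper[i], lower[j]) for j in range(3)] for i in range(3)]
g_expected = [1.0, q[0] ** 2, (q[0] * math.sin(q[1])) ** 2]

for row in g:
    print([round(x, 6) for x in row])
```

Up to finite-difference error, `g` is diagonal with entries 1, $r^2$, and $r^2 \sin^2\theta$, and `duality` is the unit matrix.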


The relation between the covariant and contravariant basis vectors is useful for writing
relationships between vectors. Let A and B be vectors with contravariant representations
$(A^i)$ and $(B^i)$. We may convert the representation of B to $B_i = g_{ij} B^j$, after which the
scalar product $\mathbf{A} \cdot \mathbf{B}$ takes the form

$$\mathbf{A} \cdot \mathbf{B} = (A^i \boldsymbol{\varepsilon}_i) \cdot (B_j \boldsymbol{\varepsilon}^j) = A^i B_j (\boldsymbol{\varepsilon}_i \cdot \boldsymbol{\varepsilon}^j) = A^i B_i. \tag{4.48}$$

Another application is in writing the gradient in general coordinates. If a function $\psi$ is
given in a general coordinate system $(q^i)$, its gradient $\nabla\psi$ is a vector with Cartesian
components

$$(\nabla\psi)_j = \frac{\partial \psi}{\partial q^i} \frac{\partial q^i}{\partial x^j}. \tag{4.49}$$

In vector notation, Eq. (4.49) becomes

$$\nabla\psi = \frac{\partial \psi}{\partial q^i}\, \boldsymbol{\varepsilon}^i, \tag{4.50}$$

showing that the covariant representation of $\nabla\psi$ is the set of derivatives $\partial\psi / \partial q^i$. If we
have reason to use a contravariant representation of the gradient, we can convert its
components using Eq. (4.42).


Covariant Derivatives


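Moving on to the derivatives of a vector, we find that the situation is much more complicated
because the basis vectors $\boldsymbol{\varepsilon}_i$ are in general not constant, and the derivative will not
be a tensor whose components are the derivatives of the vector components.

Starting from the transformation rule for a contravariant vector,

$$(V')^i = \frac{\partial x^i}{\partial q^k} V^k,$$

and differentiating with respect to $q^j$, we get (for each i)

$$\frac{\partial (V')^i}{\partial q^j} = \frac{\partial x^i}{\partial q^k} \frac{\partial V^k}{\partial q^j} + \frac{\partial^2 x^i}{\partial q^j \partial q^k} V^k, \tag{4.51}$$

which appears to differ from the transformation law for a second-rank tensor because it
contains a second derivative.

To see what to do next, let’s write Eq. (4.51) as a single vector equation in the $x_i$
coordinates, which we take to be Cartesian. The result is

$$\frac{\partial \mathbf{V}'}{\partial q^j} = \frac{\partial V^k}{\partial q^j}\, \boldsymbol{\varepsilon}_k + V^k \frac{\partial \boldsymbol{\varepsilon}_k}{\partial q^j}. \tag{4.52}$$

We now recognize that $\partial \boldsymbol{\varepsilon}_k / \partial q^j$ must be some vector in the space spanned by the set of
all $\boldsymbol{\varepsilon}_i$ and we therefore write

$$\frac{\partial \boldsymbol{\varepsilon}_k}{\partial q^j} = \Gamma^\mu_{jk}\, \boldsymbol{\varepsilon}_\mu. \tag{4.53}$$

The quantities $\Gamma^\mu_{jk}$ are known as Christoffel symbols of the second kind (those of the first
kind will be encountered shortly). Using the orthogonality property of the $\boldsymbol{\varepsilon}$, Eq. (4.45),
we can solve Eq. (4.53) by taking its dot product with any $\boldsymbol{\varepsilon}^m$, reaching

$$\Gamma^m_{jk} = \boldsymbol{\varepsilon}^m \cdot \frac{\partial \boldsymbol{\varepsilon}_k}{\partial q^j}. \tag{4.54}$$

Moreover, we note that $\Gamma^m_{kj} = \Gamma^m_{jk}$, which can be demonstrated by writing out the
components of $\partial \boldsymbol{\varepsilon}_k / \partial q^j$.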
Returning now to Eq. (4.52) and inserting Eq. (4.53), we initially get

$$\frac{\partial \mathbf{V}'}{\partial q^j} = \frac{\partial V^k}{\partial q^j}\, \boldsymbol{\varepsilon}_k + V^k \Gamma^\mu_{jk}\, \boldsymbol{\varepsilon}_\mu. \tag{4.55}$$

Interchanging the dummy indices k and $\mu$ in the last term of Eq. (4.55), we get the final
result

$$\frac{\partial \mathbf{V}'}{\partial q^j} = \left( \frac{\partial V^k}{\partial q^j} + V^\mu \Gamma^k_{j\mu} \right) \boldsymbol{\varepsilon}_k. \tag{4.56}$$

The parenthesized quantity in Eq. (4.56) is known as the covariant derivative of V, and
it has (unfortunately) become standard to identify it by the awkward notation

$$V^k_{;j} = \frac{\partial V^k}{\partial q^j} + V^\mu \Gamma^k_{j\mu}, \quad \text{so} \quad \frac{\partial \mathbf{V}'}{\partial q^j} = V^k_{;j}\, \boldsymbol{\varepsilon}_k. \tag{4.57}$$

If we rewrite Eq. (4.56) in the form

$$d\mathbf{V}' = \left[ V^k_{;j}\, dq^j \right] \boldsymbol{\varepsilon}_k,$$

and take note that $dq^j$ is a contravariant vector, while $\boldsymbol{\varepsilon}_k$ is covariant, we see that the
covariant derivative, $V^k_{;j}$, is a mixed second-rank tensor.⁴ However, it is important to realize
that although they bristle with indices, neither $\partial V^k / \partial q^j$ nor $\Gamma^k_{j\mu}$ have individually the
correct transformation properties to be tensors. It is only the combination in Eq. (4.57) that
has the requisite transformational attributes.
It can be shown (see Exercise 4.3.6) that the covariant derivative of a covariant vector
$V_i$ is given by

$$V_{i;j} = \frac{\partial V_i}{\partial q^j} - V_k \Gamma^k_{ij}. \tag{4.58}$$

Like $V^i_{;j}$, $V_{i;j}$ is a second-rank tensor.

The physical importance of the covariant derivative is that it includes the changes in the
basis vectors pursuant to a general $dq^i$, and is therefore more appropriate for describing
physical phenomena than a formulation that considers only the changes in the coefficients
multiplying the basis vectors.
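A short worked illustration may make this point concrete. It is not taken from the text: plane polar coordinates $(q^1, q^2) = (r, \varphi)$ with $x = r\cos\varphi$, $y = r\sin\varphi$ and the constant test field $\mathbf{V} = \hat{\mathbf{e}}_x$ are assumptions chosen here for simplicity. The Christoffel symbols are read off from the derivatives of the basis vectors, as in Eq. (4.54), and every covariant derivative of the constant field vanishes even though its components depend on position.

```latex
\begin{align*}
\boldsymbol{\varepsilon}_r &= \cos\varphi\,\hat{\mathbf e}_x + \sin\varphi\,\hat{\mathbf e}_y,
\qquad
\boldsymbol{\varepsilon}_\varphi = -r\sin\varphi\,\hat{\mathbf e}_x + r\cos\varphi\,\hat{\mathbf e}_y, \\
\frac{\partial\boldsymbol{\varepsilon}_r}{\partial r} &= 0,
\qquad
\frac{\partial\boldsymbol{\varepsilon}_r}{\partial\varphi}
  = \frac{\partial\boldsymbol{\varepsilon}_\varphi}{\partial r}
  = \frac{1}{r}\,\boldsymbol{\varepsilon}_\varphi,
\qquad
\frac{\partial\boldsymbol{\varepsilon}_\varphi}{\partial\varphi} = -r\,\boldsymbol{\varepsilon}_r, \\
\Gamma^{\varphi}_{r\varphi} &= \Gamma^{\varphi}_{\varphi r} = \frac{1}{r},
\qquad
\Gamma^{r}_{\varphi\varphi} = -r,
\qquad \text{all other } \Gamma \text{ vanish}, \\
\hat{\mathbf e}_x &= \cos\varphi\,\boldsymbol{\varepsilon}_r - \frac{\sin\varphi}{r}\,\boldsymbol{\varepsilon}_\varphi
\quad\Longrightarrow\quad
V^r = \cos\varphi,\quad V^\varphi = -\frac{\sin\varphi}{r}, \\
V^{r}_{;r} &= \frac{\partial V^r}{\partial r} = 0, \\
V^{r}_{;\varphi} &= \frac{\partial V^r}{\partial\varphi} + V^\varphi\,\Gamma^{r}_{\varphi\varphi}
  = -\sin\varphi + \left(-\frac{\sin\varphi}{r}\right)(-r) = 0, \\
V^{\varphi}_{;r} &= \frac{\partial V^\varphi}{\partial r} + V^\varphi\,\Gamma^{\varphi}_{r\varphi}
  = \frac{\sin\varphi}{r^2} - \frac{\sin\varphi}{r^2} = 0, \\
V^{\varphi}_{;\varphi} &= \frac{\partial V^\varphi}{\partial\varphi} + V^r\,\Gamma^{\varphi}_{\varphi r}
  = -\frac{\cos\varphi}{r} + \frac{\cos\varphi}{r} = 0.
\end{align*}
```

The ordinary partial derivatives of $V^r$ and $V^\varphi$ are nonzero, but they describe only the changing basis; the Christoffel terms remove exactly that contribution.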


Evaluating Christoffel Symbols 


It may be more convenient to evaluate the Christoffel symbols by relating them to the
metric tensor than simply to use Eq. (4.54). As an initial step in this direction, we define
the Christoffel symbol of the first kind [ij, k] by

$$[ij, k] \equiv g_{mk} \Gamma^m_{ij}, \tag{4.59}$$

from which the symmetry [ij, k] = [ji, k] follows. Again, this [ij, k] is not a third-rank
tensor. Inserting Eq. (4.54) and applying the index-lowering transformation, Eq. (4.42),
we have

$$[ij, k] = g_{mk}\, \boldsymbol{\varepsilon}^m \cdot \frac{\partial \boldsymbol{\varepsilon}_i}{\partial q^j} = \boldsymbol{\varepsilon}_k \cdot \frac{\partial \boldsymbol{\varepsilon}_i}{\partial q^j}. \tag{4.60}$$


Next, we write $g_{ij} = \boldsymbol{\varepsilon}_i \cdot \boldsymbol{\varepsilon}_j$ as in Eq. (4.40) and differentiate it, identifying the result with
the aid of Eq. (4.60):

$$\frac{\partial g_{ij}}{\partial q^k} = \frac{\partial \boldsymbol{\varepsilon}_i}{\partial q^k} \cdot \boldsymbol{\varepsilon}_j + \boldsymbol{\varepsilon}_i \cdot \frac{\partial \boldsymbol{\varepsilon}_j}{\partial q^k} = [ik, j] + [jk, i].$$


⁴$\mathbf{V}'$ does not contribute to the covariant/contravariant character of the equation as its implicit index labels the Cartesian
coordinates, as is also the case for $\boldsymbol{\varepsilon}_k$.


We then note that we can combine three of these derivatives with different index sets, with
a result that simplifies to give

$$\frac{1}{2} \left[ \frac{\partial g_{ik}}{\partial q^j} + \frac{\partial g_{jk}}{\partial q^i} - \frac{\partial g_{ij}}{\partial q^k} \right] = [ij, k]. \tag{4.61}$$

We now return to Eq. (4.59), which we solve for $\Gamma^n_{ij}$ by multiplying both sides by
$g^{nk}$, summing over k, and using the fact that $(g_{\mu\nu})$ and $(g^{\mu\nu})$ are mutually inverse, see
Eq. (4.41):

$$\Gamma^n_{ij} = \sum_k g^{nk} [ij, k]. \tag{4.62}$$

Finally, substituting for [ij, k] from Eq. (4.61), and once again using the summation
convention, we get:

$$\Gamma^n_{ij} = g^{nk} [ij, k] = \frac{1}{2} g^{nk} \left[ \frac{\partial g_{ik}}{\partial q^j} + \frac{\partial g_{jk}}{\partial q^i} - \frac{\partial g_{ij}}{\partial q^k} \right]. \tag{4.63}$$

The apparatus of this subsection becomes unnecessary in Cartesian coordinates, because
the basis vectors have vanishing derivatives, and the covariant and ordinary partial
derivatives then coincide.
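Equation (4.63) lends itself to a direct numerical evaluation. The sketch below is a minimal illustration under stated assumptions: it uses the spherical polar metric of Example 4.3.1, estimates the metric derivatives by central differences, and inverts the metric with a helper that is valid only for diagonal metrics. The function names, step size, and sample point are choices made here.

```python
import math

def metric(q):
    """Covariant metric of spherical polar coordinates (r, theta, phi)."""
    r, theta, _ = q
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def inverse_diagonal(g):
    """Inverse of a metric assumed to be diagonal (true for this example only)."""
    return [[1.0 / g[i][i] if i == j else 0.0 for j in range(3)] for i in range(3)]

def metric_derivative(q, k, h=1e-5):
    """Central-difference estimate of d g_ij / d q^k, returned as a 3x3 list."""
    q_plus, q_minus = list(q), list(q)
    q_plus[k] += h
    q_minus[k] -= h
    g_plus, g_minus = metric(q_plus), metric(q_minus)
    return [[(g_plus[i][j] - g_minus[i][j]) / (2 * h) for j in range(3)]
            for i in range(3)]

def christoffel(q):
    """gamma[n][i][j] = (1/2) g^{nk} (d_j g_ik + d_i g_jk - d_k g_ij), Eq. (4.63)."""
    g_inv = inverse_diagonal(metric(q))
    dg = [metric_derivative(q, k) for k in range(3)]  # dg[k][i][j] = d g_ij / d q^k
    gamma = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for n in range(3):
        for i in range(3):
            for j in range(3):
                gamma[n][i][j] = 0.5 * sum(
                    g_inv[n][k] * (dg[j][i][k] + dg[i][j][k] - dg[k][i][j])
                    for k in range(3))
    return gamma

q = (2.0, 0.8, 0.3)  # arbitrary sample point (r, theta, phi)
gamma = christoffel(q)
print(gamma[0][1][1], gamma[1][0][1], gamma[2][1][2])
```

With indices 0, 1, 2 standing for $r$, $\theta$, $\varphi$, the three printed values approximate $\Gamma^r_{\theta\theta} = -r$, $\Gamma^\theta_{r\theta} = 1/r$, and $\Gamma^\varphi_{\theta\varphi} = \cot\theta$, the standard results for spherical polar coordinates.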


Tensor Derivative Operators 


With covariant differentiation now available, we are ready to derive the vector differential 
operators in general tensor form. 


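Gradient: We have already discussed it, with the result from Eq. (4.50):

$$\nabla\psi = \frac{\partial \psi}{\partial q^i}\, \boldsymbol{\varepsilon}^i. \tag{4.64}$$

Divergence: A vector V whose contravariant representation is $V^i \boldsymbol{\varepsilon}_i$ has divergence

$$\nabla \cdot \mathbf{V} = \boldsymbol{\varepsilon}^j \cdot \frac{\partial (V^i \boldsymbol{\varepsilon}_i)}{\partial q^j} = \boldsymbol{\varepsilon}^j \cdot \left( \frac{\partial V^i}{\partial q^j} + V^k \Gamma^i_{jk} \right) \boldsymbol{\varepsilon}_i = \frac{\partial V^i}{\partial q^i} + V^k \Gamma^i_{ik}. \tag{4.65}$$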


Note that the covariant derivative has appeared here. Expressing $\Gamma^i_{ik}$ by Eq. (4.63), we
have

$$\Gamma^i_{ik} = \frac{1}{2} g^{im} \left[ \frac{\partial g_{im}}{\partial q^k} + \frac{\partial g_{km}}{\partial q^i} - \frac{\partial g_{ik}}{\partial q^m} \right] = \frac{1}{2} g^{im} \frac{\partial g_{im}}{\partial q^k}, \tag{4.66}$$

where we have recognized that the last two terms in the bracket will cancel because by
changing the names of their dummy indices they can be identified as identical except in
sign.

Because $(g^{im})$ is the matrix inverse to $(g_{im})$, we note that the combination of matrix
elements on the right-hand side of Eq. (4.66) is similar to those in the formula for the
derivative of a determinant, Eq. (2.35); remember that g is symmetric: $g^{im} = g^{mi}$. In the
present notation, the relevant formula is

$$\frac{d \det(g)}{dq^k} = \det(g)\, g^{im} \frac{\partial g_{im}}{\partial q^k}, \tag{4.67}$$

where det(g) is the determinant of the covariant metric tensor $(g_{\mu\nu})$. Using Eq. (4.67),
Eq. (4.66) becomes

$$\Gamma^i_{ik} = \frac{1}{2 \det(g)} \frac{d \det(g)}{dq^k} = \frac{1}{[\det(g)]^{1/2}} \frac{\partial [\det(g)]^{1/2}}{\partial q^k}. \tag{4.68}$$

Combining the result in Eq. (4.68) with Eq. (4.65), we obtain a maximally compact formula
for the divergence of a contravariant vector V:

$$\nabla \cdot \mathbf{V} = V^i_{;i} = \frac{1}{[\det(g)]^{1/2}} \frac{\partial}{\partial q^k} \left( [\det(g)]^{1/2}\, V^k \right). \tag{4.69}$$

To compare this result with that for an orthogonal coordinate system, Eq. (3.141), note that
$\det(g) = (h_1 h_2 h_3)^2$ and that the k component of the vector represented by V in Eq. (3.141)
is, in the present notation, equal to $V^k |\boldsymbol{\varepsilon}_k| = h_k V^k$ (no summation).
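As a quick consistency check of Eq. (4.69), consider the position vector in spherical polar coordinates; the choice of this test field is an assumption made here for illustration. With the metric of Example 4.3.1, $[\det(g)]^{1/2} = r^2 \sin\theta$, and the field $\mathbf{V} = \mathbf{r}$ has contravariant components $V^r = r$, $V^\theta = V^\varphi = 0$, so that

```latex
\begin{align*}
\nabla\cdot\mathbf{V}
  &= \frac{1}{r^2\sin\theta}\,
     \frac{\partial}{\partial r}\!\left( r^2\sin\theta \cdot r \right)
   = \frac{3r^2\sin\theta}{r^2\sin\theta}
   = 3 ,
\end{align*}
```

in agreement with the Cartesian evaluation $\partial x/\partial x + \partial y/\partial y + \partial z/\partial z = 3$.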


Laplacian: We can form the Laplacian $\nabla^2 \psi$ by inserting an expression for the gradient
$\nabla\psi$ into the formula for the divergence, Eq. (4.69). However, that equation uses the
contravariant coefficients $V^k$, so we must describe the gradient in its contravariant
representation. Since Eq. (4.64) shows that the covariant coefficients of the gradient are the
derivatives $\partial\psi / \partial q^i$, its contravariant coefficients have to be

$$g^{ki} \frac{\partial \psi}{\partial q^i}.$$

Insertion into Eq. (4.69) then yields

$$\nabla^2 \psi = \frac{1}{[\det(g)]^{1/2}} \frac{\partial}{\partial q^k} \left( [\det(g)]^{1/2}\, g^{ki} \frac{\partial \psi}{\partial q^i} \right). \tag{4.70}$$

For orthogonal systems the metric tensor is diagonal and the contravariant $g^{ii} = (h_i)^{-2}$
(no summation). Equation (4.70) then reduces to

$$\nabla \cdot \nabla \psi = \frac{1}{h_1 h_2 h_3} \frac{\partial}{\partial q^i} \left( \frac{h_1 h_2 h_3}{h_i^2} \frac{\partial \psi}{\partial q^i} \right),$$

in agreement with Eq. (3.142).
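A one-line check of Eq. (4.70) can be made with the test function $\psi = r^2 = x^2 + y^2 + z^2$ in spherical polar coordinates (the test function is an assumption chosen here for illustration). Only the $r$ derivative contributes, $g^{rr} = 1$, and $[\det(g)]^{1/2} = r^2 \sin\theta$, giving

```latex
\begin{align*}
\nabla^2\psi
  &= \frac{1}{r^2\sin\theta}\,
     \frac{\partial}{\partial r}\!\left( r^2\sin\theta \; g^{rr}\,
     \frac{\partial\psi}{\partial r} \right)
   = \frac{1}{r^2\sin\theta}\,
     \frac{\partial}{\partial r}\!\left( 2r^3\sin\theta \right)
   = 6 ,
\end{align*}
```

which matches the Cartesian result $2 + 2 + 2 = 6$.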





Curl: The difference of derivatives that appears in the curl has components that can be
written

$$\frac{\partial V_i}{\partial q^j} - \frac{\partial V_j}{\partial q^i} = \frac{\partial V_i}{\partial q^j} - V_k \Gamma^k_{ij} - \frac{\partial V_j}{\partial q^i} + V_k \Gamma^k_{ji} = V_{i;j} - V_{j;i}, \tag{4.71}$$

where we used the symmetry of the Christoffel symbols to obtain a cancellation. The reason
for the manipulation in Eq. (4.71) is to bring all the terms on its right-hand side to
tensor form. In using Eq. (4.71), it is necessary to remember that the quantities $V_i$ are
coefficients of the possibly nonunit $\boldsymbol{\varepsilon}^i$ and are therefore not components of V in the
orthonormal basis $\hat{\mathbf{e}}_i$.


Exercises


4.3.1

For the special case of 3-D space ($\boldsymbol{\varepsilon}_1, \boldsymbol{\varepsilon}_2, \boldsymbol{\varepsilon}_3$ defining a right-handed coordinate system,
not necessarily orthogonal), show that

$$\boldsymbol{\varepsilon}^i = \frac{\boldsymbol{\varepsilon}_j \times \boldsymbol{\varepsilon}_k}{\boldsymbol{\varepsilon}_j \times \boldsymbol{\varepsilon}_k \cdot \boldsymbol{\varepsilon}_i}, \qquad i, j, k = 1, 2, 3 \text{ and cyclic permutations.}$$

Note. These contravariant basis vectors $\boldsymbol{\varepsilon}^i$ define the reciprocal lattice space of
Example 3.2.1.


4.3.2

If the covariant vectors $\boldsymbol{\varepsilon}_i$ are orthogonal, show that

(a) $g_{ij}$ is diagonal,

(b) $g^{ii} = 1/g_{ii}$ (no summation),

(c) $|\boldsymbol{\varepsilon}^i| = 1/|\boldsymbol{\varepsilon}_i|$.


4.3.3

Prove that $(\boldsymbol{\varepsilon}^i \cdot \boldsymbol{\varepsilon}^j)(\boldsymbol{\varepsilon}_j \cdot \boldsymbol{\varepsilon}_k) = \delta^i_k$.


4.3.4

Show that $\Gamma^m_{jk} = \Gamma^m_{kj}$.


4.3.5

Derive the covariant and contravariant metric tensors for circular cylindrical coordinates.


4.3.6

Show that the covariant derivative of a covariant vector is given by

$$V_{i;j} \equiv \frac{\partial V_i}{\partial q^j} - V_k \Gamma^k_{ij}.$$

Hint. Differentiate

$$\boldsymbol{\varepsilon}^i \cdot \boldsymbol{\varepsilon}_j = \delta^i_j.$$


4.3.7

Verify that $V_{i;j} = g_{ik} V^k_{;j}$ by showing that

$$\frac{\partial V_i}{\partial q^j} - V_k \Gamma^k_{ij} = g_{ik} \left[ \frac{\partial V^k}{\partial q^j} + V^m \Gamma^k_{mj} \right].$$


4.3.8

From the circular cylindrical metric tensor $g_{ij}$, calculate the $\Gamma^k_{ij}$ for circular cylindrical
coordinates.

Note. There are only three nonvanishing $\Gamma$.


4.3.9

Using the $\Gamma^k_{ij}$ from Exercise 4.3.8, write out the covariant derivatives $V^i_{;j}$ of a vector V
in circular cylindrical coordinates.


4.3.10

Show that for the metric tensor $g_{ij;k} = g^{ij}_{\;\;;k} = 0$.


4.3.11

Starting with the divergence in tensor notation, Eq. (4.69), develop the divergence of a
vector in spherical polar coordinates, Eq. (3.157).


4.3.12

The covariant vector $A_i$ is the gradient of a scalar. Show that the difference of covariant
derivatives $A_{i;j} - A_{j;i}$ vanishes.


4.4 JACOBIANS


In the preceding chapters we have considered the use of curvilinear coordinates, but have 
not placed much focus on transformations between coordinate systems, and in particular 
on the way in which multidimensional integrals must transform when the coordinate sys- 
tem is changed. To provide formulas that will be useful in spaces with arbitrary numbers 
of dimensions, and with transformations involving coordinate systems that are not orthog- 
onal, we now return to the notion of the Jacobian, introduced but not fully developed in 
Chapter 1. 

As already mentioned in Chapter 1, changes of variables in multiple integrations, say
from variables $x_1, x_2, \ldots$ to $u_1, u_2, \ldots$, require that we replace the differential $dx_1\, dx_2 \ldots$
with $J\, du_1\, du_2 \ldots$, where J, called the Jacobian, is the quantity (usually dependent on
the variables) needed to make these expressions mutually consistent. More specifically,
we identify $d\tau = J\, du_1\, du_2 \ldots$ as the “volume” of a region of width $du_1$ in $u_1$, $du_2$ in
$u_2$, ..., where the “volume” is to be computed in the $x_1, x_2, \ldots$ space, treated as Cartesian
coordinates.

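To obtain a formula for J we start by identifying the displacement (in the Cartesian
system defined by the $x_i$) that corresponds to a change in each variable $u_i$. Letting $d\mathbf{s}(u_i)$
be that displacement (which is a vector), we can decompose it into Cartesian components
as follows:

$$\begin{aligned}
d\mathbf{s}(u_1) &= \left[ \left( \frac{\partial x_1}{\partial u_1} \right) \hat{\mathbf{e}}_1 + \left( \frac{\partial x_2}{\partial u_1} \right) \hat{\mathbf{e}}_2 + \cdots \right] du_1, \\
d\mathbf{s}(u_2) &= \left[ \left( \frac{\partial x_1}{\partial u_2} \right) \hat{\mathbf{e}}_1 + \left( \frac{\partial x_2}{\partial u_2} \right) \hat{\mathbf{e}}_2 + \cdots \right] du_2, \\
d\mathbf{s}(u_3) &= \left[ \left( \frac{\partial x_1}{\partial u_3} \right) \hat{\mathbf{e}}_1 + \left( \frac{\partial x_2}{\partial u_3} \right) \hat{\mathbf{e}}_2 + \cdots \right] du_3, \\
&\;\;\vdots
\end{aligned} \tag{4.72}$$

The partial derivatives $(\partial x_i / \partial u_j)$ in Eq. (4.72) must be understood to be evaluated with
the other $u_k$ held constant. It would clutter the formula an unreasonable amount to indicate
this explicitly.

If we had only two variables, $u_1$ and $u_2$, the differential area would simply be $|d\mathbf{s}(u_1)|$
times the component of $d\mathbf{s}(u_2)$ that is perpendicular to $d\mathbf{s}(u_1)$. If there were a third variable,
$u_3$, we would further multiply by the component of $d\mathbf{s}(u_3)$ that was perpendicular to
both $d\mathbf{s}(u_1)$ and $d\mathbf{s}(u_2)$. Extension to arbitrary numbers of dimensions is obvious.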

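What is less obvious is an explicit formula for the “volume” for an arbitrary number of
dimensions. Let’s start by writing Eq. (4.72) in matrix form:

$$\begin{pmatrix} \dfrac{d\mathbf{s}(u_1)}{du_1} \\[2ex] \dfrac{d\mathbf{s}(u_2)}{du_2} \\[2ex] \dfrac{d\mathbf{s}(u_3)}{du_3} \\[1ex] \vdots \end{pmatrix} = \begin{pmatrix} \dfrac{\partial x_1}{\partial u_1} & \dfrac{\partial x_2}{\partial u_1} & \dfrac{\partial x_3}{\partial u_1} & \cdots \\[2ex] \dfrac{\partial x_1}{\partial u_2} & \dfrac{\partial x_2}{\partial u_2} & \dfrac{\partial x_3}{\partial u_2} & \cdots \\[2ex] \dfrac{\partial x_1}{\partial u_3} & \dfrac{\partial x_2}{\partial u_3} & \dfrac{\partial x_3}{\partial u_3} & \cdots \\[1ex] \vdots & \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} \hat{\mathbf{e}}_1 \\[1ex] \hat{\mathbf{e}}_2 \\[1ex] \hat{\mathbf{e}}_3 \\[1ex] \vdots \end{pmatrix}. \tag{4.73}$$

We now proceed to make changes to the second and succeeding rows of the square matrix
in Eq. (4.73) that may destroy the relation to the $d\mathbf{s}(u_i)/du_i$, but which will leave the
“volume” unchanged. In particular, we subtract from the second row of the derivative
matrix that multiple of the first row which will cause the first element of the modified
second row to vanish. This will not change the “volume” because it modifies $d\mathbf{s}(u_2)/du_2$
by adding or subtracting a vector in the $d\mathbf{s}(u_1)/du_1$ direction, and therefore does not affect
the component of $d\mathbf{s}(u_2)/du_2$ perpendicular to $d\mathbf{s}(u_1)/du_1$. See Fig. 4.1.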

The alert reader will recall that this modification of the second row of our matrix is an
operation that was used when evaluating determinants, and was there justified because it
did not change the value of the determinant. We have a similar situation here; the operation
will not change the value of the differential “volume” because we are changing only the
component of $d\mathbf{s}(u_2)/du_2$ that is in the $d\mathbf{s}(u_1)/du_1$ direction. In a similar fashion, we
can carry out further operations of the same kind that will lead to a matrix in which all
the elements below the principal diagonal have been reduced to zero. The situation at this
point is indicated schematically for a 4-D space as the transition from the first to the
second matrix in Fig. 4.2. These modified $d\mathbf{s}(u_i)/du_i$ will lead to the same differential
volume as the original $d\mathbf{s}(u_i)/du_i$. This modified matrix will no longer provide a faithful
representation of the differential region in the $u_i$ space, but that is irrelevant since our only
objective is to evaluate the differential “volume.”

We next take the final ($n$th) row of our modified matrix, which will be entirely zero
except for its last element, and subtract a suitable multiple of it from all the other rows
to introduce zeros in the last element of every row above the principal diagonal. These
operations correspond to changes in which we modify only the components of the other
$d\mathbf{s}(u_i)/du_i$ that are in the direction of $d\mathbf{s}(u_n)$, and therefore will not change the differential
“volume.” Then, using the next-to-last row (which now has only a diagonal element), we
can in a similar fashion introduce zeros in the next-to-last column of all the preceding
rows. Continuing this process, we will ultimately have a set of modified $d\mathbf{s}(u_i)/du_i$ that
will have the structure shown as the last matrix in Fig. 4.2. Because our modified matrix is
diagonal, with each nonzero element associated with a single different $\hat{\mathbf{e}}_i$, the “volume” is
then easily computed as the product of the diagonal elements. This product of the diagonal
elements of a diagonal matrix is an evaluation of its determinant.

FIGURE 4.1 Area remains unchanged when vector proportional to $\mathbf{u}_1$ is added to $\mathbf{u}_2$.

$$
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix}
\rightarrow
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
0 & b_{22} & b_{23} & b_{24} \\
0 & 0 & b_{33} & b_{34} \\
0 & 0 & 0 & b_{44}
\end{pmatrix}
\rightarrow
\begin{pmatrix}
a_{11} & 0 & 0 & 0 \\
0 & b_{22} & 0 & 0 \\
0 & 0 & b_{33} & 0 \\
0 & 0 & 0 & b_{44}
\end{pmatrix}
$$

FIGURE 4.2 Manipulation of Jacobian matrix. Here $a_{ij} = (\partial x_j/\partial u_i)$, and $b_{ij}$ are formed
by combining rows (see text).
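The reduction to diagonal form can be tried out on a small numerical case. The following Python sketch is not part of the original text; the function names (`diagonalize`, `diagonal_product`, `cofactor_det`) and the sample matrix are illustrative assumptions. It uses only the operation "subtract a multiple of one row from another," first below and then above the principal diagonal, and compares the product of the resulting diagonal elements with a determinant computed independently by cofactor expansion. For simplicity it assumes that no zero pivot is encountered.

```python
from fractions import Fraction


def diagonalize(rows):
    """Reduce a square matrix to diagonal form using only the operation
    row_i -> row_i - c * row_k, which leaves the determinant unchanged."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    # Forward pass: clear the elements below the principal diagonal.
    for k in range(n):
        if m[k][k] == 0:
            raise ValueError("zero pivot: this sketch does not interchange rows")
        for i in range(k + 1, n):
            c = m[i][k] / m[k][k]
            m[i] = [a - c * b for a, b in zip(m[i], m[k])]
    # Backward pass: starting from the last row, clear the elements above the diagonal.
    for k in range(n - 1, -1, -1):
        for i in range(k):
            c = m[i][k] / m[k][k]
            m[i] = [a - c * b for a, b in zip(m[i], m[k])]
    return m


def diagonal_product(m):
    """Product of the diagonal elements of a square matrix."""
    p = Fraction(1)
    for i in range(len(m)):
        p *= m[i][i]
    return p


def cofactor_det(m):
    """Reference determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * cofactor_det(minor)
    return total


# Sample 4 x 4 "derivative matrix" (arbitrary illustrative numbers).
a = [[2, 1, 0, 3],
     [4, 3, 1, 1],
     [0, 2, 5, 2],
     [1, 0, 1, 4]]
d = diagonalize(a)
print(diagonal_product(d), cofactor_det(a))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two numbers can be compared for equality rather than to within rounding error.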

Reviewing what we have done, we see that we have identified the differential “volume” 
as a quantity which is equal to the determinant of the original derivative set. This must be 
so, because we obtained our final result by carrying out operations each of which leaves a 
determinant unchanged. The final result can be expressed as the well-known formula for 
the Jacobian: 


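$$
d\tau = J\,du_1\,du_2 \ldots, \qquad
J =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial u_1} & \dfrac{\partial x_2}{\partial u_1} & \dfrac{\partial x_3}{\partial u_1} & \cdots \\[8pt]
\dfrac{\partial x_1}{\partial u_2} & \dfrac{\partial x_2}{\partial u_2} & \dfrac{\partial x_3}{\partial u_2} & \cdots \\[8pt]
\dfrac{\partial x_1}{\partial u_3} & \dfrac{\partial x_2}{\partial u_3} & \dfrac{\partial x_3}{\partial u_3} & \cdots \\[4pt]
\vdots & \vdots & \vdots &
\end{vmatrix}
\equiv \frac{\partial(x_1, x_2, \ldots)}{\partial(u_1, u_2, \ldots)}.
\tag{4.74}
$$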








The standard notation for the Jacobian, shown as the last member of Eq. (4.74), is a convenient
reminder of the way in which the partial derivatives appear in it. Note also that when
the standard notation for $J$ is inserted in the expression for $d\tau$, the overall expression has
$du_1\,du_2 \ldots$ in the numerator, while $\partial(u_1, u_2, \ldots)$ appears in the denominator. This feature
can help the user to make a proper identification of the Jacobian.

A few words about nomenclature: The matrix in Eq. (4.73) is sometimes called the 
Jacobian matrix, with the determinant in Eq. (4.74) then distinguished by calling it 
the Jacobian determinant. Unless within a discussion in which both these quantities 
appear and need to be separately identified, most authors simply call J, the determinant in 
Eq. (4.74), the Jacobian. That is the usage we follow in this book. 

We close with one final observation. Since $J$ is a determinant, it will have a sign that
depends on the order in which the $x_i$ and $u_i$ are specified. This ambiguity corresponds
to our freedom to choose either right- or left-handed coordinates. In typical applications
involving a Jacobian, it is usual to take its absolute value and to choose the ranges of the
individual $u_i$ integrals in a way that gives the correct sign for the overall integral.


Example 4.4.1 2-D AND 3-D JACOBIANS


In two dimensions, with Cartesian coordinates $x$, $y$ and transformed coordinates $u$, $v$, the
element of area $dA$ has, following Eq. (4.74), the form

$$
dA = du\,dv \left[ \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right)
- \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) \right].
$$

This is the expected result, as the quantity in square brackets is the formula for the $z$
component of the cross product of the two vectors

$$
\left(\frac{\partial x}{\partial u}\right)\hat{\mathbf{e}}_x + \left(\frac{\partial y}{\partial u}\right)\hat{\mathbf{e}}_y
\quad\text{and}\quad
\left(\frac{\partial x}{\partial v}\right)\hat{\mathbf{e}}_x + \left(\frac{\partial y}{\partial v}\right)\hat{\mathbf{e}}_y,
$$

and it is well known that the magnitude of the cross product of two vectors is a measure of
the area of the parallelogram with sides formed by the vectors.

In three dimensions, the determinant in the Jacobian corresponds exactly with the formula
for the scalar triple product, Eq. (3.12). Letting $A_x$, $A_y$, $A_z$ in that formula refer to the
derivatives $(\partial x/\partial u)$, $(\partial y/\partial u)$, $(\partial z/\partial u)$, with the components of $\mathbf{B}$ and $\mathbf{C}$ similarly related
to derivatives with respect to $v$ and $w$, we recover the formula for the volume within the
parallelepiped defined by three vectors.
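As a concrete instance of the 2-D formula, consider plane polar coordinates. This short worked case is an added illustration, not part of the original example; the choice $u = \rho$, $v = \varphi$ with $x = \rho\cos\varphi$, $y = \rho\sin\varphi$ is an assumption made here for definiteness.

```latex
\begin{aligned}
\frac{\partial x}{\partial \rho} &= \cos\varphi, &
\frac{\partial y}{\partial \rho} &= \sin\varphi, &
\frac{\partial x}{\partial \varphi} &= -\rho\sin\varphi, &
\frac{\partial y}{\partial \varphi} &= \rho\cos\varphi, \\
dA &= d\rho\,d\varphi\,\bigl[(\cos\varphi)(\rho\cos\varphi) - (-\rho\sin\varphi)(\sin\varphi)\bigr]
    = \rho\,d\rho\,d\varphi .
\end{aligned}
```

The result is the familiar polar area element, so the bracketed quantity (the Jacobian) is simply $\rho$.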


Inverse of Jacobian 


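Since the $x_i$ and the $u_i$ are arbitrary sets of coordinates, we could have carried out the
entire analysis of the preceding subsection regarding the $u_i$ as the fundamental coordinate
system, with the $x_i$ as coordinates reached by a change of variables. In that case, our
Jacobian (which we choose to label $J^{-1}$), would be

$$
J^{-1} = \frac{\partial(u_1, u_2, \ldots)}{\partial(x_1, x_2, \ldots)}.
\tag{4.75}
$$

It is clear that if $dx_1\,dx_2 \ldots = J\,du_1\,du_2 \ldots$, then it must also be true that
$du_1\,du_2 \ldots = (1/J)\,dx_1\,dx_2 \ldots$. Let’s verify that the quantity we have called $J^{-1}$ is in fact $1/J$.
Let’s represent the two Jacobian matrices involved here as

$$
\mathsf{A} =
\begin{pmatrix}
\dfrac{\partial x_1}{\partial u_1} & \dfrac{\partial x_2}{\partial u_1} & \dfrac{\partial x_3}{\partial u_1} \\[8pt]
\dfrac{\partial x_1}{\partial u_2} & \dfrac{\partial x_2}{\partial u_2} & \dfrac{\partial x_3}{\partial u_2} \\[8pt]
\dfrac{\partial x_1}{\partial u_3} & \dfrac{\partial x_2}{\partial u_3} & \dfrac{\partial x_3}{\partial u_3}
\end{pmatrix},
\qquad
\mathsf{B} =
\begin{pmatrix}
\dfrac{\partial u_1}{\partial x_1} & \dfrac{\partial u_2}{\partial x_1} & \dfrac{\partial u_3}{\partial x_1} \\[8pt]
\dfrac{\partial u_1}{\partial x_2} & \dfrac{\partial u_2}{\partial x_2} & \dfrac{\partial u_3}{\partial x_2} \\[8pt]
\dfrac{\partial u_1}{\partial x_3} & \dfrac{\partial u_2}{\partial x_3} & \dfrac{\partial u_3}{\partial x_3}
\end{pmatrix}.
$$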

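We then have $J = \det(\mathsf{A})$ and $J^{-1} = \det(\mathsf{B})$. We would like to show that
$J J^{-1} = \det(\mathsf{A})\det(\mathsf{B}) = 1$. The proof is fairly simple if we use the determinant product theorem.
Thus, we write

$$
\det(\mathsf{A})\det(\mathsf{B}) = \det(\mathsf{A}\mathsf{B}),
$$

and now all we need show is that the matrix product $\mathsf{A}\mathsf{B}$ is a unit matrix. Carrying out the
matrix multiplication, we find, as a result of the chain rule,

$$
(\mathsf{A}\mathsf{B})_{ij} = \sum_k \left(\frac{\partial x_k}{\partial u_i}\right)\left(\frac{\partial u_j}{\partial x_k}\right)
= \left(\frac{\partial u_j}{\partial u_i}\right) = \delta_{ij},
\tag{4.76}
$$

verifying that $\mathsf{A}\mathsf{B}$ is indeed a unit matrix.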

The relation between the Jacobian and its inverse is of practical interest. It may turn out
that the derivatives $\partial u_i/\partial x_j$ are easier to compute than $\partial x_i/\partial u_j$, making it convenient to
obtain $J$ by first constructing and evaluating the determinant for $J^{-1}$.


Example 4.4.2 DIRECT AND INVERSE APPROACHES TO JACOBIAN


Suppose we need the Jacobian $\dfrac{\partial(r, \theta, \varphi)}{\partial(x, y, z)}$, where $x$, $y$, and $z$ are Cartesian coordinates and
$r$, $\theta$, $\varphi$ are spherical polar coordinates. Using Eq. (4.74) and the relations

$$
r = \sqrt{x^2 + y^2 + z^2}, \qquad
\theta = \cos^{-1}\!\left(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\right), \qquad
\varphi = \tan^{-1}\!\left(\frac{y}{x}\right),
$$

we find after significant effort (letting $\rho^2 = x^2 + y^2$),





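$$
J = \frac{\partial(r, \theta, \varphi)}{\partial(x, y, z)} =
\begin{vmatrix}
\dfrac{x}{r} & \dfrac{y}{r} & \dfrac{z}{r} \\[8pt]
\dfrac{xz}{r^2\rho} & \dfrac{yz}{r^2\rho} & -\dfrac{\rho}{r^2} \\[8pt]
-\dfrac{y}{\rho^2} & \dfrac{x}{\rho^2} & 0
\end{vmatrix}
= \frac{1}{r\rho} = \frac{1}{r^2 \sin\theta}.
$$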





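It is much less effort to use the relations

$$
x = r\sin\theta\cos\varphi, \qquad y = r\sin\theta\sin\varphi, \qquad z = r\cos\theta,
$$

and then to evaluate (easily),

$$
J^{-1} = \frac{\partial(x, y, z)}{\partial(r, \theta, \varphi)} =
\begin{vmatrix}
\sin\theta\cos\varphi & \sin\theta\sin\varphi & \cos\theta \\
r\cos\theta\cos\varphi & r\cos\theta\sin\varphi & -r\sin\theta \\
-r\sin\theta\sin\varphi & r\sin\theta\cos\varphi & 0
\end{vmatrix}
= r^2 \sin\theta.
$$

We finish by writing $J = 1/J^{-1} = 1/(r^2 \sin\theta)$.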
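The two routes to the spherical-polar Jacobian can also be compared numerically. The Python sketch below is an added illustration under stated assumptions: the derivatives are approximated by central differences with a step `h` chosen here, the helper names (`det3`, `to_cartesian`, `to_spherical`, `jacobian_det`) are hypothetical, and `math.atan2` stands in for $\tan^{-1}(y/x)$, with which it agrees when $x > 0$.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def to_cartesian(r, theta, phi):
    """(r, theta, phi) -> (x, y, z)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def to_spherical(x, y, z):
    """(x, y, z) -> (r, theta, phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    return (r, math.acos(z / r), math.atan2(y, x))


def jacobian_det(f, point, h=1e-6):
    """Central-difference estimate of the Jacobian determinant of f at point.
    Row i holds the derivatives with respect to input i, as in Eq. (4.74)."""
    rows = []
    for i in range(3):
        plus = list(point)
        minus = list(point)
        plus[i] += h
        minus[i] -= h
        f_plus = f(*plus)
        f_minus = f(*minus)
        rows.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    return det3(rows)


r, theta, phi = 2.0, 0.7, 1.1                      # arbitrary sample point
j_inverse = jacobian_det(to_cartesian, (r, theta, phi))
j_direct = jacobian_det(to_spherical, to_cartesian(r, theta, phi))
print(j_inverse, r ** 2 * math.sin(theta))         # both approximate r^2 sin(theta)
print(j_direct * j_inverse)                        # approximates 1
```

Because finite differences are used, agreement is expected only to within the truncation and rounding error of the difference quotients, not exactly.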





Exercises 
4.4.1 Assuming the functions $u$ and $v$ to be differentiable,
(a) Show that a necessary and sufficient condition that $u(x, y, z)$ and $v(x, y, z)$ are
related by some function $f(u, v) = 0$ is that $(\nabla u) \times (\nabla v) = 0$;
(b) If $u = u(x, y)$ and $v = v(x, y)$, show that the condition $(\nabla u) \times (\nabla v) = 0$ leads to
the 2-D Jacobian

$$
J = \frac{\partial(u, v)}{\partial(x, y)} =
\begin{vmatrix}
\dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[8pt]
\dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y}
\end{vmatrix}
= 0.
$$

4.4.2 A 2-D orthogonal system is described by the coordinates $q_1$ and $q_2$. Show that the
Jacobian $J$ satisfies the equation

$$
J \equiv \frac{\partial(x, y)}{\partial(q_1, q_2)} \equiv
\frac{\partial x}{\partial q_1}\frac{\partial y}{\partial q_2} - \frac{\partial x}{\partial q_2}\frac{\partial y}{\partial q_1} = h_1 h_2.
$$

Hint. It’s easier to work with the square of each side of this equation.

4.4.3 For the transformation $u = x + y$, $v = x/y$, with $x > 0$ and $y > 0$, find the Jacobian
$\dfrac{\partial(x, y)}{\partial(u, v)}$

(a) By direct computation,

(b) By first computing $J^{-1}$.


4.5 DIFFERENTIAL FORMS 


Our study of tensors has indicated that significant complications arise when we leave Carte- 
sian coordinate systems, even in traditional contexts such as the introduction of spherical 
or cylindrical coordinates. Much of the difficulty arises from the fact that the metric (as 
expressed in a coordinate system) becomes position-dependent, and that the lines or sur- 
faces of constant coordinate values become curved. Many of the most vexing problems can 
be avoided if we work in a geometry that deals with infinitesimal displacements, because 
the situations of most importance in physics then become locally similar to the simpler and 
more familiar conditions based on Cartesian coordinates. 

The calculus of differential forms, of which the leading developer was Élie Cartan, has
become recognized as a natural and very powerful tool for the treatment of curved coordi- 
nates, both in classical settings and in contemporary studies of curved space-time. Cartan’s 
calculus leads to a remarkable unification of concepts and theorems of vector analysis that 
is worth pursuing, with the result that in differential geometry and in theoretical physics 
the use of differential forms is now widespread. 

Differential forms provide an important entry to the role of geometry in physics, and the 
connectivity of the spaces under discussion (technically, referred to as their topology) has 
physical implications. Illustrations are provided already by situations as simple as the fact 
that a coordinate defined on a circle cannot be single-valued and continuous at all angles. 
More sophisticated consequences of topology in physics, largely beyond the scope of the 
present text, include gauge transformations, flux quantization, the Aharonov-Bohm effect,
emerging theories of elementary particles, and phenomena of general relativity. 


Introduction 


For simplicity we begin our discussion of differential forms in a notation appropriate for 
ordinary 3-D space, though the real power of the methods under study is that they are 
not limited either by the dimensionality of the space or by its metric properties (and are 
therefore also relevant to the curved space-time of general relativity). The basic quantities 
under consideration are the differentials dx, dy, dz (identified with linearly independent 
directions in the space), linear combinations thereof, and more complicated quantities built 
from these by combination rules we will shortly discuss in detail. Taking for example dx, 
it is essential to understand that in our current context it is not just an infinitesimal number 
describing a change in the x coordinate, but is to be viewed as a mathematical object with 
certain operational properties (which, admittedly, may include its eventual use in contexts 
such as the evaluation of line, surface, or volume integrals). The rules by which $dx$ and
related quantities can be manipulated have been designed to permit expressions such as

$$
\omega = A(x, y, z)\,dx + B(x, y, z)\,dy + C(x, y, z)\,dz,
\tag{4.77}
$$

which are called 1-forms, to be related to quantities that occur as the integrands of line
integrals, to permit expressions of the type

$$
\omega = F(x, y, z)\,dx \wedge dy + G(x, y, z)\,dx \wedge dz + H(x, y, z)\,dy \wedge dz,
\tag{4.78}
$$

which are called 2-forms, to be related to the integrands of surface integrals, and to permit
expressions like

$$
\omega = K(x, y, z)\,dx \wedge dy \wedge dz,
\tag{4.79}
$$

known as 3-forms, to be related to the integrands of volume integrals.

The $\wedge$ symbol (called “wedge”) indicates that the individual differentials are to be combined
to form more complicated objects using the rules of exterior algebra (sometimes
called Grassmann algebra), so more is being implied by Eqs. (4.77) to (4.79) than the 
somewhat similar formulas that might appear in the conventional notation for various kinds 
of integrals. To maintain contact with other presentations on differential forms, we note 
that some authors omit the wedge symbol, thereby assuming that the reader knows that 
the differentials are to be combined according to the rules of exterior algebra. In order 
to minimize potential confusion, we will continue to write the wedge symbol for these 
combinations of differentials (which are called exterior, or wedge products). 

To write differential forms in ways that do not presuppose the dimension of the underlying
space, we sometimes write the differentials as $dx_i$, designating a form as a $p$-form
if it contains $p$ factors $dx_i$. Ordinary functions (containing no $dx_i$) can be identified as
0-forms.

The mathematics of differential forms was developed with the aim of systematizing the
application of calculus to differentiable manifolds, loosely defined as sets of points that
can be identified by coordinates that locally vary “smoothly” (meaning that they are
differentiable to whatever degree is needed for analysis).$^5$ We are presently focusing attention
on the differentials that appear in the forms; one could also consider the behavior of the
coefficients. For example, when we write the 1-form

$$
\omega = A_x\,dx + A_y\,dy + A_z\,dz,
$$

$A_x$, $A_y$, $A_z$ will behave under a coordinate transformation like the components of a vector,
and in the older differential-forms literature the differentials and the coefficients were
referred to as contravariant and covariant vector components, since these two sets of quantities
must transform in mutually inverse ways under rotations of the coordinate system.
What is relevant for us at this point is that relationships we develop for differential forms
can be translated into related relationships for their vector coefficients, yielding not only
various well-known formulas of vector analysis but also showing how they can be generalized
to spaces of higher dimension.





$^5$A manifold defined on a circle or sphere must have a coordinate that cannot be globally smooth (in the usual coordinate systems
it will jump somewhere by $2\pi$). This and related issues connect topology and physics, and are for the most part outside the scope
of this text.

The central idea in exterior algebra is that the operations are designed to create permutational
antisymmetry. Assuming the basis 1-forms are $dx_i$, that $\omega_i$ are arbitrary $p$-forms
(of respective orders $p_i$), and that $a$ and $b$ are ordinary numbers or functions, the wedge
product is defined to have the properties

$$
\begin{aligned}
&(a\omega_1 + b\omega_2) \wedge \omega_3 = a\,\omega_1 \wedge \omega_3 + b\,\omega_2 \wedge \omega_3 \quad (p_1 = p_2), \\
&(\omega_1 \wedge \omega_2) \wedge \omega_3 = \omega_1 \wedge (\omega_2 \wedge \omega_3), \qquad
a(\omega_1 \wedge \omega_2) = (a\omega_1) \wedge \omega_2, \\
&dx_i \wedge dx_j = -dx_j \wedge dx_i.
\end{aligned}
\tag{4.80}
$$


We thus have the usual associative and distributive laws, and each term of an arbitrary
differential form can be reduced to a coefficient multiplying a $dx_i$ or a wedge product of
the generic form

$$
dx_i \wedge dx_j \wedge \cdots \wedge dx_p.
$$

Moreover, the properties in Eq. (4.80) permit all the coefficient functions to be collected at
the beginning of a form. For example,

$$
a\,dx_1 \wedge b\,dx_2 = -a\,(b\,dx_2 \wedge dx_1) = -ab\,(dx_2 \wedge dx_1) = ab\,(dx_1 \wedge dx_2).
$$

We therefore generally do not need parentheses to indicate the order in which products are
to be carried out.

We can use the last of Eqs. (4.80) to bring the index set into any desired order. If any two
of the $dx_i$ are the same, the expression will vanish because $dx_i \wedge dx_i = -dx_i \wedge dx_i = 0$;
otherwise, the ordered-index form will have a sign determined by the parity of the index
permutation needed to obtain the ordering. It is not a coincidence that this is the sign rule
for the terms of a determinant; compare Eq. (2.10). Letting $\varepsilon_P$ stand for the Levi-Civita
symbol for the permutation to ascending index order, an arbitrary wedge product of $dx_i$
can, for example, be brought to the form

$$
\varepsilon_P\, dx_{h_1} \wedge dx_{h_2} \wedge \cdots \wedge dx_{h_p}, \qquad 1 \le h_1 < h_2 < \cdots < h_p.
$$


If any of the $dx_i$ in a differential form is linearly dependent on the others, then its
expansion into linearly independent terms will produce a duplicated $dx_j$ and cause the
form to vanish. Since the number of linearly independent $dx_j$ cannot be larger than the
dimension of the underlying space, we see that in a space of dimension $d$ we only need to
consider $p$-forms with $p \le d$. Thus, in 3-D space, only up through 3-forms are relevant;
for Minkowski space $(ct, x, y, z)$, we will also have 4-forms.


Example 4.5.1 SIMPLIFYING DIFFERENTIAL FORMS


Consider the wedge product

$$
\begin{aligned}
\omega &= (3\,dx + 4\,dy - dz) \wedge (dx - dy + 2\,dz) \\
&= 3\,dx \wedge dx - 3\,dx \wedge dy + 6\,dx \wedge dz
 + 4\,dy \wedge dx - 4\,dy \wedge dy + 8\,dy \wedge dz \\
&\quad - dz \wedge dx + dz \wedge dy - 2\,dz \wedge dz.
\end{aligned}
$$

The terms with duplicate differentials, e.g., $dx \wedge dx$, vanish, and products that differ only
in the order of the 1-forms can be combined, changing the sign of the product when we
interchange its factors. We get

$$
\omega = -7\,dx \wedge dy + 7\,dx \wedge dz + 7\,dy \wedge dz = 7\,(dy \wedge dz - dz \wedge dx - dx \wedge dy).
$$

We will shortly see that in three dimensions there are some advantages to bringing the
1-forms into cyclic order (rather than ascending or descending order) in the wedge products,
and we did so in the final simplification of $\omega$.


The antisymmetry built into the exterior algebra has an important purpose: It causes 
p-forms to depend on the differentials in ways appropriate (in three dimensions) for 
the description of elements of length, area, and volume, in part because the fact that
$dx_i \wedge dx_i = 0$ prevents the appearance of duplicated differentials. In particular, 1-forms
can be associated with elements of length, 2-forms with area, and 3-forms with volume. 
This feature carries forward to spaces of arbitrary dimensionality, thereby resolving poten- 
tially difficult questions that would otherwise have to be handled on a case-by-case basis. 
In fact, one of the virtues of the differential-forms approach is that there now exists a con- 
siderable body of general mathematical results that is pretty much completely absent from 
tensor analysis. For example, we will shortly find that the rules for differentiation in the 
exterior algebra cause the derivative of a p-form to be a (p + 1)-form, thereby avoiding 
a pitfall that arises in tensor calculus: When the transformation coefficients are position- 
dependent, simply differentiating the coefficients representing a tensor of rank p does not 
yield another tensor. As we have seen, this dilemma is resolved in tensor analysis by intro- 
ducing the notion of covariant derivative. Another consequence of the antisymmetry is 
that lengths, areas, volumes, and (at higher dimensionality) hypervolumes are oriented 
(meaning that they have signs that depend on the way the p-forms defining them are writ- 
ten), and the orientation must be taken into account when making computations based on 
differential forms. 


Complementary Differential Forms 


Associated with each differential form is a complementary (or dual) form that contains the
differentials not included in the original form. Thus, if our underlying space has dimension
$d$, the form dual to a $p$-form will be a $(d - p)$-form. In three dimensions, the complement
to a 1-form will be a 2-form (and vice versa), while the complement to a 3-form will be
a 0-form (a scalar). It is useful to work with these complementary forms, and this is done
by introducing an operator known as the Hodge operator; it is usually designated notationally
as an asterisk (preceding the quantity to which it is applied, not as a superscript),
and is therefore also referred to either as the Hodge star operator or simply as the star
operator. Formally, its definition requires the introduction of a metric and the selection
of an orientation (chosen by specifying the standard order of the differentials comprising
the 1-form basis), and if the 1-form basis is not orthogonal there result complications we
shall not discuss. For orthogonal bases, the dual forms depend on the index positions of
the factors and on the metric tensor.$^6$





$^6$In the current discussion, restricted to Euclidean and Minkowski metrics, the metric tensor is diagonal, with diagonal elements
$\pm 1$, and the relevant quantities are the signs of the diagonal elements.

To find $*\omega$, where $\omega$ is a $p$-form, we start by writing the wedge product $\omega'$ of all members
of the 1-form basis not represented in $\omega$, with the sign corresponding to the permutation
that is needed to bring the index set

(indices of $\omega$) followed by (indices of $\omega'$)

to standard order. Then $*\omega$ consists of $\omega'$ (with the sign we just found), but also multiplied
by $(-1)^\mu$, where $\mu$ is the number of differentials in $\omega$ (the form being starred) whose
metric-tensor diagonal element is $-1$. For $\mathbb{R}^3$, ordinary 3-D space, the metric tensor is a unit matrix, so this final
multiplication can be omitted, but it becomes relevant for our other case of current interest,
the Minkowski metric.

For Euclidean 3-D space, we have 


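$$
\begin{aligned}
&*1 = dx_1 \wedge dx_2 \wedge dx_3, \\
&*dx_1 = dx_2 \wedge dx_3, \qquad *dx_2 = dx_3 \wedge dx_1, \qquad *dx_3 = dx_1 \wedge dx_2, \\
&*(dx_1 \wedge dx_2) = dx_3, \qquad *(dx_3 \wedge dx_1) = dx_2, \qquad *(dx_2 \wedge dx_3) = dx_1, \\
&*(dx_1 \wedge dx_2 \wedge dx_3) = 1.
\end{aligned}
\tag{4.81}
$$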


Cases not shown above are linearly dependent on those that were shown and can be 
obtained by permuting the differentials in the above formulas and taking the resulting sign 
changes into account. 

At this point, two observations are in order. First, note that by writing the indices 1, 2, 3
in cyclic order, we have caused all the starred quantities to have positive signs. This
choice makes the symmetry more evident. Second, it can be seen that all the formulas
in Eq. (4.81) are consistent with $*(*\omega) = \omega$. However, this is not universally true; compare
with the formulas for Minkowski space, which are in the example we next consider.
See also Exercise 4.5.1.


Example 4.5.2 HODGE OPERATOR IN MINKOWSKI SPACE 


Taking the oriented 1-form basis $(dt, dx_1, dx_2, dx_3)$, and the metric tensor

$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix},
$$

let’s determine the effect of the Hodge operator on the various possible differential forms.
Consider initially $*1$, for which the complementary form contains $dt \wedge dx_1 \wedge dx_2 \wedge dx_3$.
Since we took these differentials in the basis order, they are assigned a plus sign. Since
$\omega = 1$ contains no differentials, its number $\mu$ of negative metric-tensor diagonal elements is
zero, so $(-1)^\mu = (-1)^0 = 1$ and there is no sign change arising from the metric. Therefore,

$$
*1 = dt \wedge dx_1 \wedge dx_2 \wedge dx_3.
$$

Next, take $*(dt \wedge dx_1 \wedge dx_2 \wedge dx_3)$. The complementary form is just unity, with no
sign change due to the index ordering, as the differentials are already in standard order.
However, this time we have three entries in the quantity being starred with negative
metric-tensor diagonal elements; this generates $(-1)^3 = -1$, so

$$
*(dt \wedge dx_1 \wedge dx_2 \wedge dx_3) = -1.
$$

Moving next to $*dx_1$, the complementary form is $dt \wedge dx_2 \wedge dx_3$, and the index ordering
(based on $dx_1, dt, dx_2, dx_3$) requires one pair interchange to reach the standard order
(thereby yielding a minus sign). But the quantity being starred contains one differential
that generates a minus sign, namely $dx_1$, so

$$
*dx_1 = dt \wedge dx_2 \wedge dx_3.
$$

Looking explicitly at one more case, consider $*(dt \wedge dx_1)$, for which the complementary
form is $dx_2 \wedge dx_3$. This time the indices are in standard order, but the $dx_1$ being starred
generates a minus sign, so

$$
*(dt \wedge dx_1) = -dx_2 \wedge dx_3.
$$

Development of the remaining possibilities is left to Exercise 4.5.1; the results are summarized
below, where $i, j, k$ denotes any cyclic permutation of 1, 2, 3.

$$
\begin{aligned}
&*1 = dt \wedge dx_1 \wedge dx_2 \wedge dx_3, \\
&*dx_i = dt \wedge dx_j \wedge dx_k, \qquad *dt = dx_1 \wedge dx_2 \wedge dx_3, \\
&*(dx_j \wedge dx_k) = dt \wedge dx_i, \qquad *(dt \wedge dx_i) = -dx_j \wedge dx_k, \\
&*(dx_1 \wedge dx_2 \wedge dx_3) = dt, \qquad *(dt \wedge dx_i \wedge dx_j) = dx_k, \\
&*(dt \wedge dx_1 \wedge dx_2 \wedge dx_3) = -1.
\end{aligned}
\tag{4.82}
$$

Note that all the starred forms in Eq. (4.82) with an even number of differentials have
the property that $*(*\omega) = -\omega$, confirming our earlier statement that complementing twice
does not always restore the original form with its original sign.
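The sign rule for the star operator lends itself to a short program. The Python sketch below is an added illustration under assumptions made here: basis differentials are labeled by integer indices (0 for $dt$, then 1, 2, 3), the metric is passed as a tuple of diagonal signs, and the function names `permutation_sign`, `hodge_star`, and `star_twice_sign` are hypothetical. It applies the rule to basis monomials only (permutation sign of the combined index list, times one factor $-1$ for each negative-metric differential in the form being starred) and checks the behavior of $*(*\omega)$.

```python
from itertools import combinations


def permutation_sign(seq):
    """Sign of the permutation that sorts seq into ascending order (by counting inversions)."""
    inversions = sum(1 for i in range(len(seq))
                     for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1


def hodge_star(indices, metric):
    """Star of the basis monomial with the given indices.
    Returns (sign, complementary indices in ascending order)."""
    comp = tuple(i for i in range(len(metric)) if i not in indices)
    # Sign needed to bring (indices of omega) followed by (indices of omega') to standard order.
    sign = permutation_sign(tuple(indices) + comp)
    # One factor -1 for each differential in the starred form with metric element -1.
    for i in indices:
        sign *= metric[i]
    return sign, comp


def star_twice_sign(indices, metric):
    """Overall sign s in *(*omega) = s * omega for a basis monomial omega."""
    s1, comp = hodge_star(indices, metric)
    s2, back = hodge_star(comp, metric)
    assert back == tuple(sorted(indices))
    return s1 * s2


MINKOWSKI = (1, -1, -1, -1)     # basis order (dt, dx1, dx2, dx3) -> indices 0, 1, 2, 3
EUCLIDEAN3 = (1, 1, 1)          # basis order (dx1, dx2, dx3) -> indices 0, 1, 2

print(hodge_star((1,), MINKOWSKI))      # *dx1: sign and indices of dt ^ dx2 ^ dx3
print(hodge_star((0, 1), MINKOWSKI))    # *(dt ^ dx1): sign and indices of dx2 ^ dx3
for p in range(5):
    print(p, {star_twice_sign(c, MINKOWSKI) for c in combinations(range(4), p)})
```

The results are reported in ascending index order, so a cyclic-order entry of Eq. (4.82) such as $*dx_2 = dt \wedge dx_3 \wedge dx_1$ appears here as a minus sign with indices (0, 1, 3).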


We now consider some examples illustrating the utility of the star operator. 


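Example 4.5.3 MISCELLANEOUS DIFFERENTIAL FORMS


In the Euclidean space $\mathbb{R}^3$, consider the wedge product $A \wedge B$ of the two 1-forms
$A = A_x\,dx + A_y\,dy + A_z\,dz$ and $B = B_x\,dx + B_y\,dy + B_z\,dz$. Simplifying using the rules for
exterior products,

$$
A \wedge B = (A_y B_z - A_z B_y)\,dy \wedge dz + (A_z B_x - A_x B_z)\,dz \wedge dx + (A_x B_y - A_y B_x)\,dx \wedge dy.
$$

If we now apply the star operator and use the formulas in Eq. (4.81) we get

$$
*(A \wedge B) = (A_y B_z - A_z B_y)\,dx + (A_z B_x - A_x B_z)\,dy + (A_x B_y - A_y B_x)\,dz,
$$

showing that in $\mathbb{R}^3$, $*(A \wedge B)$ forms an expression that is analogous to the cross product
$\mathbf{A} \times \mathbf{B}$ of vectors $A_x \hat{\mathbf{e}}_x + A_y \hat{\mathbf{e}}_y + A_z \hat{\mathbf{e}}_z$ and $B_x \hat{\mathbf{e}}_x + B_y \hat{\mathbf{e}}_y + B_z \hat{\mathbf{e}}_z$. In fact, we can write

$$
*(A \wedge B) = (\mathbf{A} \times \mathbf{B})_x\,dx + (\mathbf{A} \times \mathbf{B})_y\,dy + (\mathbf{A} \times \mathbf{B})_z\,dz.
\tag{4.83}
$$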





Note that the sign of $*(A \wedge B)$ is determined by our implicit choice that the standard
ordering of the basis differentials is $(dx, dy, dz)$.

Next, consider the exterior product $A \wedge B \wedge C$, where $C$ is a 1-form with coefficients
$C_x$, $C_y$, $C_z$. Applying the evaluation rules, we find that every surviving term in the product
is proportional to $dx \wedge dy \wedge dz$, and we obtain

$$
\begin{aligned}
A \wedge B \wedge C = (&A_x B_y C_z - A_x B_z C_y - A_y B_x C_z \\
&+ A_y B_z C_x + A_z B_x C_y - A_z B_y C_x)\,dx \wedge dy \wedge dz,
\end{aligned}
$$

which we recognize can be written in the form

$$
A \wedge B \wedge C =
\begin{vmatrix}
A_x & A_y & A_z \\
B_x & B_y & B_z \\
C_x & C_y & C_z
\end{vmatrix}
\,dx \wedge dy \wedge dz.
$$

Applying now the star operator, we reach

$$
*(A \wedge B \wedge C) =
\begin{vmatrix}
A_x & A_y & A_z \\
B_x & B_y & B_z \\
C_x & C_y & C_z
\end{vmatrix}
= \mathbf{A} \cdot (\mathbf{B} \times \mathbf{C}).
\tag{4.84}
$$

Not only were the results in Eqs. (4.83) and (4.84) easily obtained, they also generalize
nicely to spaces of arbitrary dimension and metric, while the traditional vector notation,
which uses the cross product, is applicable only to $\mathbb{R}^3$.
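The statement that the wedge product of $n$ 1-forms in $n$ dimensions yields a determinant can be illustrated directly. The Python sketch below is an added example with hypothetical names (`parity`, `top_form_coefficient`, `triple_product`) and arbitrary sample coefficients. It sums over all ways of picking one distinct differential from each factor, attaching the sign of the permutation needed to restore ascending order, and compares the 3-D result with $\mathbf{A} \cdot (\mathbf{B} \times \mathbf{C})$.

```python
from itertools import permutations
from math import prod


def parity(perm):
    """Sign of a permutation of 0..n-1, obtained by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def top_form_coefficient(one_forms):
    """Coefficient of dx_1 ^ ... ^ dx_n in the wedge product of n 1-forms,
    each given as a list of its n coefficients."""
    n = len(one_forms)
    # Factor number `row` contributes its coefficient of dx_{p[row]}; repeated
    # differentials give zero, so only permutations p survive.
    return sum(parity(p) * prod(one_forms[row][p[row]] for row in range(n))
               for p in permutations(range(n)))


def triple_product(a, b, c):
    """A . (B x C) written out in components."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return sum(x * y for x, y in zip(a, cross))


A = [1, 2, 3]      # arbitrary sample coefficients (A_x, A_y, A_z)
B = [0, -1, 4]
C = [2, 5, 1]
print(top_form_coefficient([A, B, C]), triple_product(A, B, C))
```

Unlike `triple_product`, the function `top_form_coefficient` is not tied to three dimensions; passing four 1-forms with four coefficients each gives the corresponding 4 x 4 determinant.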

Exercises 
4.5.1 Using the rules for the application of the Hodge star operator, verify the results given in
Eq. (4.82) for its application to all linearly independent differential forms in Minkowski
space.

4.5.2 If the force field is constant and moving a particle from the origin to $(3, 0, 0)$ requires $a$
units of work, from $(-1, -1, 0)$ to $(-1, 1, 0)$ takes $b$ units of work, and from $(0, 0, 4)$
to $(0, 0, 5)$ $c$ units of work, find the 1-form of the work.


4.6 DIFFERENTIATING FORMS 


Exterior Derivatives 


Having introduced differential forms and their exterior algebra, we next develop their properties
under differentiation. To accomplish this, we define the exterior derivative, which
we consider to be an operator identified by the traditional symbol $d$. We have, in fact,
already introduced that operator when we wrote $dx_i$, stating at the time that we intended
to interpret $dx_i$ as a mathematical object with specified properties and not just as a small
change in $x_i$. We are now refining that statement to interpret $dx_i$ as the result of applying
the operator $d$ to the quantity $x_i$. We complete our definition of the operator $d$ by requiring
it to have the following properties, where $\omega$ is a $p$-form, $\omega'$ is a $p'$-form, and $f$ is an
ordinary function (a 0-form):

$$
\begin{aligned}
&d(\omega + \omega') = d\omega + d\omega' \quad (p = p'), \\
&d(f\omega) = (df) \wedge \omega + f\,d\omega, \\
&d(\omega \wedge \omega') = d\omega \wedge \omega' + (-1)^p\, \omega \wedge d\omega', \\
&d(d\omega) = 0, \\
&df = \sum_j \frac{\partial f}{\partial x_j}\,dx_j,
\end{aligned}
\tag{4.85}
$$

where the sum over $j$ spans the underlying space. The formula for the derivative of the
wedge product is sometimes called by mathematicians an antiderivation, referring to the
fact that when applied to the right-hand factor an antisymmetry-motivated minus sign
appears.


Example 4.6.1 EXTERIOR DERIVATIVE 


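Equations (4.85) are axioms, so they are not subject to proof, though they are required
to be consistent. It is of interest to verify that the sign for the derivative of the second
term in a wedge product is needed. Taking $\omega$ and $\omega'$ to be monomials (with coefficients $A$
and $B$, respectively), we first bring their
coefficients to the left and then apply the differentiation operator (which, irrespective of
the choice of sign, gives zero when applied to any of the differentials). Thus,

$$
\begin{aligned}
d(\omega \wedge \omega') &= d(AB) \wedge \bigl[dx_1 \wedge \cdots \wedge dx_p\bigr] \wedge \bigl[dx_1 \wedge \cdots \wedge dx_{p'}\bigr] \\
&= \sum_\mu \left[ \frac{\partial A}{\partial x_\mu} B + A \frac{\partial B}{\partial x_\mu} \right]
dx_\mu \wedge \bigl[dx_1 \wedge \cdots \wedge dx_p\bigr] \wedge \bigl[dx_1 \wedge \cdots \wedge dx_{p'}\bigr].
\end{aligned}
$$

On expanding the sum, the first term is clearly $d\omega \wedge \omega'$; to make the second term look like
$\omega \wedge d\omega'$, it is necessary to permute $dx_\mu$ through the $p$ differentials in $\omega$, yielding the sign
factor $(-1)^p$. Extension to general polynomial forms is trivial.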

One might also ask whether the fourth of the above axioms, $d(d\omega) = 0$, sometimes
referred to as Poincaré’s lemma, is necessary or consistent with the others. First, it provides
new information, as otherwise we have no way of reducing $d(dx_i)$. Next, to see why
the axiom set is consistent, we illustrate by examining (in $\mathbb{R}^2$)

$$
df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy,
$$

from which we form

$$
\begin{aligned}
d(df) = {}& \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) dx \wedge dx
+ \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) dy \wedge dx \\
&+ \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) dx \wedge dy
+ \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) dy \wedge dy = 0.
\end{aligned}
$$

We obtain the zero result because of the antisymmetry of the wedge product and because
the mixed second derivatives are equal. We see that the central reason for the validity of
Poincaré’s lemma is that the mixed derivatives of a sufficiently differentiable function are
invariant with respect to the order in which the differentiations are carried out.


To catalog the possibilities for the action of the $d$ operator in ordinary 3-D space, we
first note that the derivative of an ordinary function (a 0-form) is

$$
df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz.
\tag{4.86}
$$





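We next differentiate the 1-form $\omega = A_x\,dx + A_y\,dy + A_z\,dz$. After simplification,

$$
d\omega = \left[\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}\right] dy \wedge dz
+ \left[\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right] dz \wedge dx
+ \left[\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right] dx \wedge dy.
$$

We recognize this as

$$
d(A_x\,dx + A_y\,dy + A_z\,dz) =
(\nabla \times \mathbf{A})_x\,dy \wedge dz + (\nabla \times \mathbf{A})_y\,dz \wedge dx + (\nabla \times \mathbf{A})_z\,dx \wedge dy,
\tag{4.87}
$$

which is equivalent to

$$
*d(A_x\,dx + A_y\,dy + A_z\,dz) = (\nabla \times \mathbf{A})_x\,dx + (\nabla \times \mathbf{A})_y\,dy + (\nabla \times \mathbf{A})_z\,dz.
\tag{4.88}
$$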


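Finally we differentiate the 2-form $B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy$, obtaining
the 3-form

$$
d(B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy) =
\left[\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z}\right] dx \wedge dy \wedge dz,
$$

equivalent to

$$
d(B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy) = (\nabla \cdot \mathbf{B})\,dx \wedge dy \wedge dz
\tag{4.89}
$$

and

$$
*d(B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy) = \nabla \cdot \mathbf{B}.
\tag{4.90}
$$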


We see that application of the $d$ operator directly generates all the differential operators of
traditional vector analysis.

If now we return to Eq. (4.87) and take the 1-form on its left-hand side to be $df$, so that
$\mathbf{A} = \nabla f$, we have, inserting Eq. (4.86),

$$
d(df) = \bigl(\nabla \times (\nabla f)\bigr)_x\,dy \wedge dz + \bigl(\nabla \times (\nabla f)\bigr)_y\,dz \wedge dx
+ \bigl(\nabla \times (\nabla f)\bigr)_z\,dx \wedge dy = 0.
\tag{4.91}
$$

We have invoked Poincaré’s lemma to set this expression to zero. The result is equivalent
to the well-known identity $\nabla \times (\nabla f) = 0$.

Another identity is obtained if we start from Eq. (4.89) and take the 2-form on its left-hand
side to be $d(A_x\,dx + A_y\,dy + A_z\,dz)$. Then, with the aid of Eq. (4.88), we have

$$
d\bigl(d(A_x\,dx + A_y\,dy + A_z\,dz)\bigr) = \nabla \cdot (\nabla \times \mathbf{A})\,dx \wedge dy \wedge dz = 0,
\tag{4.92}
$$

where once again the zero result follows from Poincaré’s lemma and we have established
the well-known formula $\nabla \cdot (\nabla \times \mathbf{A}) = 0$. Part of the importance of the derivation of these
formulas using differential-forms methods is that these are merely the first members of
hierarchies of identities that can be derived for spaces with higher numbers of dimensions
and with different metric properties.
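The identities $d(df) = 0$ and $d(d\omega) = 0$ can be confirmed by exact computation on polynomial examples. The Python sketch below is an added illustration under assumptions made here: coefficient functions are polynomials stored as dictionaries from exponent tuples to integer coefficients, forms are dictionaries from ascending index tuples (0 for $dx$, 1 for $dy$, 2 for $dz$) to such polynomials, and the names `poly_diff`, `poly_add`, and `d` are hypothetical. The derivative of each term is $\sum_\mu (\partial A/\partial x_\mu)\,dx_\mu \wedge (\ldots)$, with a sign from moving $dx_\mu$ into ascending position.

```python
def poly_diff(poly, var):
    """Partial derivative of {exponent tuple: coefficient} with respect to variable number var."""
    out = {}
    for exps, c in poly.items():
        if exps[var] > 0:
            new = list(exps)
            new[var] -= 1
            key = tuple(new)
            out[key] = out.get(key, 0) + c * exps[var]
    return out


def poly_add(p, q, sign=1):
    """Return p + sign * q, dropping terms whose coefficient cancels to zero."""
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) + sign * c
    return {k: c for k, c in out.items() if c != 0}


def d(form, nvars=3):
    """Exterior derivative of a form stored as {ascending index tuple: polynomial}."""
    out = {}
    for idx, poly in form.items():
        for mu in range(nvars):
            if mu in idx:
                continue                     # dx_mu ^ dx_mu = 0
            dp = poly_diff(poly, mu)
            if not dp:
                continue
            # Moving dx_mu into ascending position costs one sign per smaller index passed.
            sign = (-1) ** sum(1 for i in idx if i < mu)
            key = tuple(sorted(idx + (mu,)))
            out[key] = poly_add(out.get(key, {}), dp, sign)
    return {k: p for k, p in out.items() if p}


# 0-form f = x^2 y + y z^3 (exponent tuples are ordered as (x, y, z)).
f = {(): {(2, 1, 0): 1, (0, 1, 3): 1}}
# 1-form with A = (y z^2, x^3, x y z), an arbitrary sample field.
omega = {(0,): {(0, 1, 2): 1}, (1,): {(3, 0, 0): 1}, (2,): {(1, 1, 1): 1}}

print(d(f))            # the 1-form df
print(d(d(f)))         # empty: curl grad f = 0
print(d(omega))        # 2-form whose coefficients are the curl components
print(d(d(omega)))     # empty: div curl A = 0
```

An empty dictionary represents the zero form. Because `d` takes the number of variables as a parameter, the same check can be repeated in four dimensions by using exponent tuples of length four and `nvars=4`.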


Example 4.6.2 MAXWELL’S EQUATIONS


Maxwell’s equations of electromagnetic theory can be written in an extremely compact and
elegant way using differential forms notation. In that notation, the independent elements of
the electromagnetic field tensor can be written as the coefficients of a 2-form in Minkowski
space with oriented basis $(dt, dx, dy, dz)$:

$$
\begin{aligned}
F = {}& -E_x\,dt \wedge dx - E_y\,dt \wedge dy - E_z\,dt \wedge dz \\
&+ B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy.
\end{aligned}
\tag{4.93}
$$

Here $\mathbf{E}$ and $\mathbf{B}$ are respectively the electric field and the magnetic induction. The sources of
the field, namely the charge density $\rho$ and the components of the current density $\mathbf{J}$, become
the coefficients of the 3-form

$$
J = \rho\,dx \wedge dy \wedge dz - J_x\,dt \wedge dy \wedge dz - J_y\,dt \wedge dz \wedge dx - J_z\,dt \wedge dx \wedge dy.
\tag{4.94}
$$

For simplicity we work in units with the permittivity, magnetic permeability, and velocity
of light all set to unity ($\varepsilon = \mu = c = 1$). Note that it is natural that the charge and current
densities occur in a 3-form; although they have together the number of components needed
to constitute a four-vector, they are of dimension inverse volume. Note also that some of
the signs in the formulas of this example depend on the details of the metric, and are chosen
to be correct for the Minkowski metric as given in Example 4.5.2. This Minkowski metric
has signature (1,3), meaning that it has one positive and three negative diagonal elements.
Some workers define the Minkowski metric to have signature (3,1), reversing all its signs.
Either choice will give correct results to problems of physics if used consistently; trouble
only arises if material from inconsistent sources is combined.

The two homogeneous Maxwell equations are obtained from the simple formula 
dF = 0. This equation is not a mathematical requirement on F; it is a statement of the 
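physical properties of electric and magnetic fields. To relate our new formula to the more
usual vector equations, we simply apply the d operator to F:

$$\begin{aligned}
dF = {} & -\left(\frac{\partial E_x}{\partial y}\,dy + \frac{\partial E_x}{\partial z}\,dz\right)\wedge dt\wedge dx
 - \left(\frac{\partial E_y}{\partial x}\,dx + \frac{\partial E_y}{\partial z}\,dz\right)\wedge dt\wedge dy \\
 & - \left(\frac{\partial E_z}{\partial x}\,dx + \frac{\partial E_z}{\partial y}\,dy\right)\wedge dt\wedge dz
 + \left(\frac{\partial B_x}{\partial t}\,dt + \frac{\partial B_x}{\partial x}\,dx\right)\wedge dy\wedge dz \\
 & + \left(\frac{\partial B_y}{\partial t}\,dt + \frac{\partial B_y}{\partial y}\,dy\right)\wedge dz\wedge dx
 + \left(\frac{\partial B_z}{\partial t}\,dt + \frac{\partial B_z}{\partial z}\,dz\right)\wedge dx\wedge dy = 0.
\end{aligned} \tag{4.95}$$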

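Equation (4.95) is easily simplified to

$$\begin{aligned}
dF = {} & \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t}\right) dt\wedge dy\wedge dz
 + \left(\frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x} + \frac{\partial B_y}{\partial t}\right) dt\wedge dz\wedge dx \\
 & + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t}\right) dt\wedge dx\wedge dy
 + \left(\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z}\right) dx\wedge dy\wedge dz = 0.
\end{aligned} \tag{4.96}$$

Since the coefficient of each 3-form monomial must individually vanish, we obtain from
Eq. (4.96) the vector equations

$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \quad\text{and}\quad \nabla\cdot\mathbf{B} = 0.$$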


We now go on to obtain the two inhomogeneous Maxwell equations from the almost
equally simple formula $d(*F) = J$. To verify this, we first form $*F$, evaluating the starred
quantities using the formulas in Eqs. (4.82):

$$*F = E_x\, dy\wedge dz + E_y\, dz\wedge dx + E_z\, dx\wedge dy + B_x\, dt\wedge dx + B_y\, dt\wedge dy + B_z\, dt\wedge dz.$$


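We now apply the d operator, reaching after steps similar to those taken while obtaining
Eq. (4.96):

$$\begin{aligned}
d(*F) = {} & \nabla\cdot\mathbf{E}\; dx\wedge dy\wedge dz
 + \left[\frac{\partial E_x}{\partial t} - (\nabla\times\mathbf{B})_x\right] dt\wedge dy\wedge dz \\
 & + \left[\frac{\partial E_y}{\partial t} - (\nabla\times\mathbf{B})_y\right] dt\wedge dz\wedge dx
 + \left[\frac{\partial E_z}{\partial t} - (\nabla\times\mathbf{B})_z\right] dt\wedge dx\wedge dy.
\end{aligned} \tag{4.97}$$

Setting $d(*F)$ from Eq. (4.97) equal to J as given in Eq. (4.94), we obtain the remaining
Maxwell equations

$$\nabla\cdot\mathbf{E} = \rho \quad\text{and}\quad \nabla\times\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = \mathbf{J}.$$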
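We close this example by applying the d operator to J. The result must vanish because
$dJ = d(d(*F))$. We get, starting from Eq. (4.94),

$$dJ = \left[\frac{\partial\rho}{\partial t} + \frac{\partial J_x}{\partial x} + \frac{\partial J_y}{\partial y} + \frac{\partial J_z}{\partial z}\right] dt\wedge dx\wedge dy\wedge dz = 0,$$

showing that

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J} = 0. \tag{4.98}$$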


Summarizing, the differential-forms approach has reduced Maxwell's equations to the
two simple formulas

$$dF = 0 \quad\text{and}\quad d(*F) = J, \tag{4.99}$$

and we have also shown that J must satisfy an equation of continuity.
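As a plausibility check on the vector equations obtained from $dF = 0$ and $d(*F) = J$, one can test a simple source-free field numerically. The sketch below is an illustration under stated assumptions: the plane wave $E_y = B_z = \cos(x - t)$ (in the units $\varepsilon = \mu = c = 1$ used above, with $\rho = 0$ and $\mathbf{J} = 0$), the sample point, and the step size are all choices made here, not taken from the text.

```python
import math

H = 1e-4  # finite-difference step (illustrative choice)
T, X, Y, Z = 0, 1, 2, 3  # index layout of a space-time point p = (t, x, y, z)

def partial(func, axis, p, h=H):
    """Central-difference partial derivative of func along one axis at p."""
    up, dn = list(p), list(p)
    up[axis] += h
    dn[axis] -= h
    return (func(up) - func(dn)) / (2 * h)

def wave(p):
    return math.cos(p[X] - p[T])

def zero(p):
    return 0.0

E = [zero, wave, zero]   # assumed plane wave: E along y
B = [zero, zero, wave]   # assumed plane wave: B along z

def curl(F, p):
    return [
        partial(F[2], Y, p) - partial(F[1], Z, p),
        partial(F[0], Z, p) - partial(F[2], X, p),
        partial(F[1], X, p) - partial(F[0], Y, p),
    ]

def div(F, p):
    return partial(F[0], X, p) + partial(F[1], Y, p) + partial(F[2], Z, p)

def d_dt(F, p):
    return [partial(F[i], T, p) for i in range(3)]

p = [0.4, 1.3, -0.2, 0.9]
faraday = [c + b for c, b in zip(curl(E, p), d_dt(B, p))]  # curl E + dB/dt
ampere = [c - e for c, e in zip(curl(B, p), d_dt(E, p))]   # curl B - dE/dt (= J = 0 here)
gauss_E = div(E, p)                                        # div E (= rho = 0 here)
gauss_B = div(B, p)                                        # div B
print(faraday, ampere, gauss_E, gauss_B)
```

All four residuals are zero to within the finite-difference error, as expected for a vacuum solution.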
Exercises 
4.6.1 Given the two 1-forms $\omega_1 = x\,dy + y\,dx$ and $\omega_2 = x\,dy - y\,dx$, calculate
(a) $d\omega_1$,
(b) $d\omega_2$.
(c) For each of your answers to (a) or (b) that is nonzero, apply the operator d a second
time and verify that $d(d\omega_i) = 0$.
4.6.2 Apply the operator d twice to $\omega_3 = xy\,dz + xz\,dy - yz\,dx$. Verify that the second
application of d yields a zero result.
4.6.3 For $\omega_2$ and $\omega_3$ the 1-forms with these names in Exercises 4.6.1 and 4.6.2, evaluate

$$d(\omega_2\wedge\omega_3):$$


(a) By forming the exterior product and then differentiating, and 


(b) Using the formula for differentiating a product of two forms. 


Verify that both approaches give the same result. 


4.7 INTEGRATING FORMS 


It is natural to define the integrals of differential forms in a way that preserves our usual 
notions of integration. The integrals with which we are concerned are over regions of the 
manifolds on which our differential forms are defined; this fact and the antisymmetry of 
the wedge product need to be taken into account in developing definitions and properties 
of integrals. For convenience, we illustrate in two or three dimensions; the notions extend 
to spaces of arbitrary dimensionality. 

Consider first the integral of a 1-form $\omega$ in 2-D space, integrated over a curve C from a
start-point P to an endpoint Q:

$$\int_C \omega = \int_C \bigl[A_x\,dx + A_y\,dy\bigr].$$


We interpret the integration as a conventional line integral. If the curve is described
parametrically by x(t), y(t) as t increases monotonically from $t_P$ to $t_Q$, our integral takes the
elementary form

$$\int_C \omega = \int_{t_P}^{t_Q}\left[A_x(t)\,\frac{dx}{dt} + A_y(t)\,\frac{dy}{dt}\right] dt,$$


and (at least in principle) the integral can be evaluated by the usual methods. 

Sometimes the integral will have a value that will be independent of the path from P to
Q; in physics this situation arises when a 1-form with coefficients $\mathbf{A} = (A_x, A_y)$ describes
what is known as a conservative force (i.e., one that can be written as the gradient of a
potential). In our present language, we then call $\omega$ exact, meaning that there exists some
function f such that

$$\omega = df(x, y) \tag{4.100}$$

for a region that includes the points P, Q, and all other points through which the path may
pass.
To check the significance of Eq. (4.100), note that it implies

$$\omega = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy,$$

showing that $\omega$ has as coefficients the components of the gradient of f. Given Eq. (4.100),
we also see that

$$\text{if } \omega = df, \qquad \int_P^Q \omega = f(Q) - f(P). \tag{4.101}$$

This admittedly obvious result is independent of the dimension of the space, and is of
importance to the remainder of this section.
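The parametric evaluation of a line integral and the path independence expressed by Eq. (4.101) can be illustrated numerically. The sketch below rests on assumptions introduced here: the potential $f = x^2 y$ (so that $df = 2xy\,dx + x^2\,dy$), the non-exact comparison form $-y\,dx + x\,dy$, the two sample paths from (0, 0) to (1, 2), and the midpoint quadrature rule.

```python
def line_integral(omega, curve, t0, t1, n=2000):
    """Midpoint-rule estimate of the integral of A_x dx + A_y dy along curve(t)."""
    total = 0.0
    dt = (t1 - t0) / n
    for k in range(n):
        ta = t0 + k * dt
        tb = ta + dt
        xa, ya = curve(ta)
        xb, yb = curve(tb)
        xm, ym = curve(0.5 * (ta + tb))
        ax, ay = omega(xm, ym)          # coefficients at the segment midpoint
        total += ax * (xb - xa) + ay * (yb - ya)
    return total

def f(x, y):
    return x * x * y                    # assumed potential

def exact_form(x, y):
    return (2 * x * y, x * x)           # coefficients of df

def inexact_form(x, y):
    return (-y, x)                      # -y dx + x dy, which is not exact

def straight(t):
    return (t, 2 * t)                   # straight line from (0,0) to (1,2)

def parabola(t):
    return (t, 2 * t * t)               # parabolic path between the same points

exact_straight = line_integral(exact_form, straight, 0.0, 1.0)
exact_parabola = line_integral(exact_form, parabola, 0.0, 1.0)
inexact_straight = line_integral(inexact_form, straight, 0.0, 1.0)
inexact_parabola = line_integral(inexact_form, parabola, 0.0, 1.0)
print(exact_straight, exact_parabola, f(1, 2) - f(0, 0))
print(inexact_straight, inexact_parabola)
```

For the exact form both paths reproduce $f(Q) - f(P) = 2$; for the non-exact form the two paths give different values (0 and 2/3).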
Looking next at 2-forms, we have (in 2-D space) integrals such as

$$\int_S \omega = \int_S B(x, y)\, dx\wedge dy. \tag{4.102}$$

We interpret $dx\wedge dy$ as the element of area corresponding to displacements dx and dy
in mutually orthogonal directions, so in the usual notation of integral calculus we would
write dx dy.

Let's now return to the wedge product notation and consider what happens if we make
a change of variables from x, y to u, v, with $x = au + bv$, $y = eu + fv$. Then
$dx = a\,du + b\,dv$, $dy = e\,du + f\,dv$, and

$$dx\wedge dy = (a\,du + b\,dv)\wedge(e\,du + f\,dv) = (af - be)\,du\wedge dv. \tag{4.103}$$


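We note that the coefficient of $du\wedge dv$ is just the Jacobian of the transformation from x, y
to u, v, which becomes clear if we write $a = \partial x/\partial u$, etc., after which we have

$$\begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \begin{vmatrix} a & b \\ e & f \end{vmatrix} = af - be. \tag{4.104}$$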
We now see a fundamental reason why the wedge product has been introduced; it has the
algebraic properties needed to generate in a natural fashion the relations between elements
of area (or its higher-dimension analogs) in different coordinate systems. To emphasize
that observation, note that the Jacobian occurred as a natural consequence of the
transformation; we did not have to take additional steps to insert it, and it was generated simply
by evaluating the relevant differential forms. In addition, the present formulation has one
new feature: because $dx\wedge dy$ and $dy\wedge dx$ are opposite in sign, areas must be assigned
algebraic signs, and it is necessary to retain the sign of the Jacobian if we make a change
of variables. We therefore take as the element of area corresponding to $dx\wedge dy$ the
ordinary product $\pm dx\,dy$, with a choice of sign known as the orientation of the area. Then,
Eq. (4.102) becomes

$$\int_S \omega = \int_S B(x, y)\,(\pm dx\,dy), \tag{4.105}$$

and if elsewhere in the same computation we had $dy\wedge dx$, we must convert it to $dx\,dy$
using the sign opposite to that used for $dx\wedge dy$.

For p-forms with p > 2, a corresponding analysis applies: If we transform from
$(x, y, \ldots)$ to $(u, v, \ldots)$, the wedge product $dx\wedge dy\wedge\cdots$ becomes $J\,du\wedge dv\wedge\cdots$, where
J is the (signed) Jacobian of the transformation. Since the p-space volumes are oriented,
the sign of the Jacobian is relevant and must be retained. Exercise 4.7.1 shows that the
change of variables from the 3-form $dx\wedge dy\wedge dz$ to $du\wedge dv\wedge dw$ yields the determinant
which is the (signed) Jacobian of the transformation.
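The way the Jacobian emerges from the antisymmetry of the wedge product can be made concrete with a few lines of code. The following is a minimal sketch, assuming a representation chosen here for illustration: a form is a dictionary mapping a sorted tuple of basis indices (0 for du, 1 for dv, 2 for dw) to its coefficient. The function names and the numerical transformation coefficients are hypothetical.

```python
def sort_sign(indices):
    """Sort basis indices, returning (sign, sorted tuple); sign is 0 if an index repeats."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):               # bubble sort; each swap flips the sign
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(a, b):
    """Wedge product of two forms stored as {sorted index tuple: coefficient}."""
    out = {}
    for ia, ca in a.items():
        for ib, cb in b.items():
            sign, key = sort_sign(ia + ib)
            if sign:
                out[key] = out.get(key, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def one_form(coeffs):
    """1-form with the given coefficients on du, dv, (dw, ...)."""
    return {(i,): c for i, c in enumerate(coeffs) if c != 0}

# 2-D case: x = a u + b v, y = e u + f v with a, b, e, f = 2, 3, 5, 7
dx2 = one_form([2, 3])
dy2 = one_form([5, 7])
area = wedge(dx2, dy2)                      # coefficient of du^dv is af - be

# 3-D case: rows of M give dx, dy, dz in terms of du, dv, dw
M = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
dx3, dy3, dz3 = (one_form(row) for row in M)
volume = wedge(wedge(dx3, dy3), dz3)        # coefficient of du^dv^dw

det3 = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
        - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
        + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
print(area, volume, det3)
```

No determinant formula is used inside `wedge`; the signed Jacobian appears solely from dropping repeated differentials and tracking the sign of each reordering.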


Stokes’ Theorem 


A key result regarding the integration of differential forms is a formula known as Stokes'
theorem, a restricted form of which we encountered in our study of vector analysis in
Chapter 3. Stokes' theorem, in its simplest form, states that if

• R is a simply connected region (i.e., one with no holes) of a p-dimensional
differentiable manifold in an n-dimensional space ($n \ge p$);

• R has a boundary denoted $\partial R$, of dimension p − 1;

• $\omega$ is a (p − 1)-form defined on R and its boundary, with derivative $d\omega$;

then

$$\int_R d\omega = \int_{\partial R} \omega. \tag{4.106}$$

This is the generalization, to p dimensions, of Eq. (4.101). Note that because $d\omega$ results
from applying the d operator to $\omega$, the differentials in $d\omega$ consist of all those in $\omega$, in
the same order, but preceded by that produced by the differentiation. This observation is
relevant for identifying the signs to be associated with the integrations.
A rigorous proof of Stokes' theorem is somewhat complicated, but an indication of its
validity is not too involved. It is sufficient to consider the case that $\omega$ is a monomial:

$$\omega = A(x_1, \ldots, x_p)\, dx_2\wedge\cdots\wedge dx_p, \qquad d\omega = \frac{\partial A}{\partial x_1}\, dx_1\wedge dx_2\wedge\cdots\wedge dx_p. \tag{4.107}$$

We start by approximating the portion of R adjacent to the boundary by a set of small
p-dimensional parallelepipeds whose thickness in the $x_1$ direction is $\delta$, with $\delta$ having for
each parallelepiped the sign that makes $x_1 \to x_1 - \delta$ in the interior of R. For each such
parallelepiped (symbolically denoted $\Delta$, with faces of constant $x_1$ denoted $\partial\Delta$), we integrate
$d\omega$ in $x_1$ from $x_1 - \delta$ to $x_1$ and over the full range of the other $x_i$, obtaining

$$\int_\Delta d\omega = \int_{\partial\Delta}\left[\int_{x_1-\delta}^{x_1} \frac{\partial A}{\partial x_1}\,dx_1\right] dx_2\wedge\cdots\wedge dx_p
 = \int_{\partial\Delta} A(x_1, x_2, \ldots)\, dx_2\wedge\cdots\wedge dx_p - \int_{\partial\Delta} A(x_1 - \delta, x_2, \ldots)\, dx_2\wedge\cdots\wedge dx_p. \tag{4.108}$$

Equation (4.108) indicates the validity of Stokes' theorem for a laminar region whose
exterior boundary is $\partial R$; if we perform the same process repeatedly, we can collapse the
inner boundary to a region of zero volume, thereby reaching Eq. (4.106).

Stokes’ theorem applies for manifolds of any dimension; different cases of this single 
theorem in two and three dimensions correspond to results originally identified as distinct 
theorems. Some examples follow. 


Example 4.7.1 GREEN’S THEOREM IN THE PLANE 


Consider in a 2-D space the 1-form $\omega$ and its derivative:

$$\omega = P(x, y)\,dx + Q(x, y)\,dy, \tag{4.109}$$

$$d\omega = \frac{\partial P}{\partial y}\,dy\wedge dx + \frac{\partial Q}{\partial x}\,dx\wedge dy = \left[\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right] dx\wedge dy, \tag{4.110}$$

where we have without comment discarded terms containing $dx\wedge dx$ or $dy\wedge dy$.
We apply Stokes' theorem for this $\omega$ to a region S with boundary C, obtaining

$$\int_S \left[\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right] dx\wedge dy = \int_C (P\,dx + Q\,dy).$$

With orientation such that $dx\wedge dy = dS$ (ordinary element of area), we have the formula
usually identified as Green's theorem in the plane:

$$\int_C (P\,dx + Q\,dy) = \int_S \left[\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right] dS. \tag{4.111}$$

Some cases of this theorem: taking P = 0, Q = x, we have the well-known formula

$$\int_C x\,dy = \int_S dS = A,$$

where A is the area enclosed by C with the line integral evaluated in the mathematically
positive (counterclockwise) direction.
If we take P = y, Q = 0, we get instead another familiar formula:

$$\int_C y\,dx = \int_S (-1)\,dS = -A.$$


When working Example 4.7.1, we assumed (without comment) that the line integral on
the closed curve C was to be evaluated for travel in the counterclockwise direction, and
we also related area to the conversion from $dx\wedge dy$ to $\pm dx\,dy$. These are choices that
were not dictated by the theory of differential forms but by our intention to make its results 
correspond to computation in the usual system of planar Cartesian coordinates. What is 
certainly true is that the differential forms calculus gives a different sign for the integral 
of y dx than it gave for the integral of x dy; the user of the calculus has the responsibility 
to make definitions corresponding to the situation for which the results are claimed to be 
relevant. 
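The two special cases of Green's theorem in Example 4.7.1, and their dependence on the direction of travel, can be checked on a polygon, where the line integrals along straight edges can be written down exactly. This is a minimal sketch; the function name and the sample triangle are illustrative choices.

```python
def closed_integrals(vertices):
    """Return (closed integral of x dy, closed integral of y dx) around a polygon.

    On a straight edge from (x0, y0) to (x1, y1) the integral of x dy is
    (x0 + x1)/2 * (y1 - y0), and the integral of y dx is (y0 + y1)/2 * (x1 - x0).
    """
    x_dy = 0.0
    y_dx = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        x_dy += 0.5 * (x0 + x1) * (y1 - y0)
        y_dx += 0.5 * (y0 + y1) * (x1 - x0)
    return x_dy, y_dx

# Right triangle with legs 4 and 3 (area 6), listed counterclockwise
triangle_ccw = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
ccw = closed_integrals(triangle_ccw)          # counterclockwise traversal
cw = closed_integrals(triangle_ccw[::-1])     # same triangle, clockwise traversal
print(ccw, cw)
```

Counterclockwise travel gives (A, −A) = (6, −6); reversing the direction flips both signs, which is the orientation choice discussed above.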


Example 4.7.2 STOKES’ THEOREM (USUAL 3-D CASE) 


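Let the vector potential $\mathbf{A}$ be represented by the differential form $\omega$, with it and its
derivative of the forms

$$\omega = A_x\,dx + A_y\,dy + A_z\,dz, \tag{4.112}$$

$$\begin{aligned}
d\omega &= \left(\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}\right) dy\wedge dz + \left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right) dz\wedge dx + \left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right) dx\wedge dy \\
&= (\nabla\times\mathbf{A})_x\, dy\wedge dz + (\nabla\times\mathbf{A})_y\, dz\wedge dx + (\nabla\times\mathbf{A})_z\, dx\wedge dy.
\end{aligned} \tag{4.113}$$

We apply Stokes' theorem to a region S with boundary C, noting that if the standard
order for orienting the differentials is dx, dy, dz, then $dy\wedge dz \to d\sigma_x$, $dz\wedge dx \to d\sigma_y$,
$dx\wedge dy \to d\sigma_z$. Stokes' theorem then takes the familiar form

$$\int_C (A_x\,dx + A_y\,dy + A_z\,dz) = \int_C \mathbf{A}\cdot d\mathbf{r} = \int_S (\nabla\times\mathbf{A})\cdot d\boldsymbol{\sigma}. \tag{4.114}$$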


Once again we have results whose interpretation depends on how we have chosen to
define the quantities involved. The differential forms calculus does not know whether we
intend to use a right-handed coordinate system, and that choice is implicit in our
identification of the elements of area $d\sigma_i$. In fact, the mathematics does not even tell us that the
quantities we identified as components of $\nabla\times\mathbf{A}$ actually correspond to anything physical
in their indicated directions. So, once again, we emphasize that the mathematics of
differential forms provides a structure appropriate to the physics to which we apply it, but part
of what the physicist brings to the table is the correlation between mathematical objects
and the physical quantities they represent.


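Example 4.7.3 GAUSS' THEOREM


As a final example, consider a 3-D region V with boundary $\partial V$, containing an electric field
given on $\partial V$ as the 2-form $\omega$, with

$$\omega = E_x\, dy\wedge dz + E_y\, dz\wedge dx + E_z\, dx\wedge dy, \tag{4.115}$$

$$d\omega = \left(\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}\right) dx\wedge dy\wedge dz = (\nabla\cdot\mathbf{E})\, dx\wedge dy\wedge dz. \tag{4.116}$$

For this case, Stokes' theorem is

$$\int_V d\omega = \int_V (\nabla\cdot\mathbf{E})\, dx\wedge dy\wedge dz = \int_V \nabla\cdot\mathbf{E}\; d\tau = \int_{\partial V} \mathbf{E}\cdot d\boldsymbol{\sigma}, \tag{4.117}$$

where $dx\wedge dy\wedge dz \to d\tau$ and, just as in Example 4.7.2, $dy\wedge dz \to d\sigma_x$, etc. We have
recovered Gauss' theorem.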
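Equation (4.117) can be checked numerically on a simple region. The sketch below is an illustration under assumptions made here: the field $\mathbf{E} = (x^2, yz, z)$, the unit cube $0 \le x, y, z \le 1$ as the region V, outward-pointing face normals, and a midpoint quadrature rule. The divergence $2x + z + 1$ was worked out by hand for this particular field.

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(k + 0.5) / n for k in range(n)]

def E(x, y, z):
    return (x * x, y * z, z)            # assumed sample field

def div_E(x, y, z):
    return 2 * x + z + 1                # divergence of the sample field

def volume_integral(n=20):
    """Midpoint-rule integral of div E over the unit cube."""
    pts = midpoints(n)
    dv = 1.0 / n ** 3
    return sum(div_E(x, y, z) for x in pts for y in pts for z in pts) * dv

def surface_flux(n=20):
    """Outward flux of E through the six faces of the unit cube."""
    pts = midpoints(n)
    da = 1.0 / n ** 2
    flux = 0.0
    for u in pts:
        for v in pts:
            flux += (E(1.0, u, v)[0] - E(0.0, u, v)[0]) * da   # faces x = 1 and x = 0
            flux += (E(u, 1.0, v)[1] - E(u, 0.0, v)[1]) * da   # faces y = 1 and y = 0
            flux += (E(u, v, 1.0)[2] - E(u, v, 0.0)[2]) * da   # faces z = 1 and z = 0
    return flux

vol = volume_integral()
flux = surface_flux()
print(vol, flux)
```

Both numbers equal 5/2 to within round-off; the minus signs on the faces at 0 encode the outward orientation, which is the orientation choice that the forms calculus leaves to the user.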
Exercises 
4.7.1 Use differential-forms relations to transform the integral $A(x, y, z)\,dx\wedge dy\wedge dz$ to
the equivalent expression in $du\wedge dv\wedge dw$, where u, v, w is a linear transformation of
x, y, z, and thereby find the determinant that can be identified as the Jacobian of the
transformation.


4.7.2 Write Oersted's law,

$$\int_{\partial S} \mathbf{H}\cdot d\mathbf{r} = \int_S \nabla\times\mathbf{H}\cdot d\mathbf{a} \sim I,$$

in differential form notation.


4.7.3 A 1-form $A\,dx + B\,dy$ is defined as closed if $\dfrac{\partial A}{\partial y} = \dfrac{\partial B}{\partial x}$. It is called exact if there is a
function f such that $\dfrac{\partial f}{\partial x} = A$ and $\dfrac{\partial f}{\partial y} = B$. Determine which of the following 1-forms
are closed, or exact, and find the corresponding functions f for those that are exact:


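$$y\,dx + x\,dy, \qquad \frac{y\,dx + x\,dy}{x^2 + y^2}, \qquad [\ln(xy) + 1]\,dx + \frac{x}{y}\,dy,$$

$$-\frac{y\,dx}{x^2 + y^2} + \frac{x\,dy}{x^2 + y^2}, \qquad f(z)\,dz \ \text{ with } z = x + iy.$$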
CHAPTER 5


VECTOR SPACES 


A large body of physical theory can be cast within the mathematical framework of vector 
spaces. Vector spaces are far more general than vectors in ordinary space, and the analogy 
may to the uninitiated seem somewhat strained. Basically, this subject deals with quantities 
that can be represented by expansions in a series of functions, and includes the methods by 
which such expansions can be generated and used for various purposes. A key aspect of 
the subject is the notion that a more or less arbitrary function can be represented by such 
an expansion, and that the coefficients in these expansions have transformation properties 
similar to those exhibited by vector components in ordinary space. Moreover, operators 
can be introduced to describe the application of various processes to a function, thereby 
converting it (and also the coefficients defining it) into other functions within our vector 
space. The concepts presented in this chapter are crucial to an understanding of quan- 
tum mechanics, to classical systems involving oscillatory motion, transport of material or 
energy, even to fundamental particle theory. Indeed, it is not excessive to claim that vector 
spaces are one of the most fundamental mathematical structures in physical theory. 


5.1 VECTORS IN FUNCTION SPACES 


We now seek to extend the concepts of classical vector analysis (from Chapter 3) to more
general situations. Suppose that we have a two-dimensional (2-D) space in which the two
coordinates, which are real (or in the most general case, complex) numbers that we will
call $a_1$ and $a_2$, are, respectively, associated with the two functions $\varphi_1(s)$ and $\varphi_2(s)$. It is
important at the outset to understand that our new 2-D space has nothing whatsoever to do
with the physical xy space. It is a space in which the coordinate point $(a_1, a_2)$ corresponds
to the function

$$f(s) = a_1\varphi_1(s) + a_2\varphi_2(s). \tag{5.1}$$

The analogy with a physical 2-D vector space with vectors $\mathbf{A} = A_1\hat{\mathbf{e}}_1 + A_2\hat{\mathbf{e}}_2$ is that $\varphi_i(s)$
corresponds to $\hat{\mathbf{e}}_i$, while $a_i \longleftrightarrow A_i$, and $f(s) \longleftrightarrow \mathbf{A}$. In other words, the coordinate


Mathematical Methods for Physicists. 
© 2013 Elsevier Inc. All rights reserved. 







values are the coefficients of the $\varphi_i(s)$, so each point in the space identifies a different
function f(s). Both f and $\varphi$ are shown above as dependent on an independent variable we
call s. We choose the name s to emphasize the fact that the formulation is not restricted
to the spatial variables x, y, z, but can be whatever variable, or set of variables, is needed
for the problem at hand. Note further that the variable s is not a continuous analog of the
discrete variables $x_i$ of an ordinary vector space. It is a parameter reminding the reader that
the $\varphi_i$ that correspond to the dimensions of our vector space are usually not just numbers,
but are functions of one or more variables. The variable(s) denoted by s may sometimes
correspond to physical displacements, but that is not always the case. What should be clear
is that s has nothing to do with the coordinates in our vector space; that is the role of the $a_i$.

Equation (5.1) defines a set of functions (a function space) that can be built from the
basis $\varphi_1$, $\varphi_2$; we call this space a linear vector space because its members are linear
combinations of the basis functions and the addition of its members corresponds to component
(coefficient) addition. If f(s) is given by Eq. (5.1) and g(s) is given by another linear
combination of the same basis functions,

$$g(s) = b_1\varphi_1(s) + b_2\varphi_2(s),$$

with $b_1$ and $b_2$ the coefficients defining g(s), then

$$h(s) = f(s) + g(s) = (a_1 + b_1)\varphi_1(s) + (a_2 + b_2)\varphi_2(s) \tag{5.2}$$

defines h(s), the member of our space (i.e., the function), which is the sum of the members
f(s) and g(s). In order for our vector space to be useful, we consider only spaces in which
the sum of any two members of the space is also a member.

In addition, the notion of linearity includes the requirement that if f(s) is a member
of our vector space, then $u(s) = k f(s)$, where k is a real or complex number, is also a
member, and we can write

$$u(s) = k f(s) = k a_1\varphi_1(s) + k a_2\varphi_2(s). \tag{5.3}$$

Vector spaces for which addition of two members or multiplication of a member by a scalar
always produces a result that is also a member are termed closed under these operations.

We can summarize our findings up to this point as follows: addition of two members
of our vector space causes the coefficients of the sum, h(s) in Eq. (5.2), to be the sum
of the coefficients of the addends, namely f(s) and g(s); multiplication of f(s) by an
ordinary number k (which, by analogy with ordinary vectors, we call a scalar), results
in the multiplication of the coefficients by k. These are exactly the operations we would
carry out to form the sum of two ordinary vectors, $\mathbf{A} + \mathbf{B}$, or the multiplication of a vector
by a scalar, as in $k\mathbf{A}$. However, here we have the coefficients $a_i$ and $b_i$, which combine
under vector addition and multiplication by a scalar in exactly the same way that we would
combine the ordinary vector components $A_i$ and $B_i$.

The functions that form the basis of our vector space can be ordinary functions, and may
be as simple as powers of s, or more complicated, as for example $\varphi_1 = (1 + 3s + 3s^2)e^{s}$,
$\varphi_2 = (1 - 3s + 3s^2)e^{-2s}$, or compound quantities such as the Pauli matrices $\sigma_i$, or even
completely abstract quantities that are defined only by certain properties they may possess.
The number of basis functions (i.e., the dimension of our basis) may be a small number
such as 2 or 3, a larger but finite integer, or even denumerably infinite (as would arise in an
untruncated power series). The main universal restriction on the form of a basis is that the
basis members be linearly independent, so that any function (member) of our vector space
will be described by a unique linear combination of the basis functions. We illustrate the
possibilities with some simple examples.


Example 5.1.1 SOME VECTOR SPACES


1. We consider first a vector space of dimension 3, which is spanned by (meaning that it
has a basis that consists of) the three functions $P_0(s) = 1$, $P_1(s) = s$, $P_2(s) = \tfrac{3}{2}s^2 - \tfrac{1}{2}$.
Some members of this vector space include the functions

$$s + 3 = 3P_0(s) + P_1(s), \qquad s^2 = \tfrac{1}{3}P_0(s) + \tfrac{2}{3}P_2(s), \qquad 4 - 3s = 4P_0(s) - 3P_1(s).$$

In fact, because we can write 1, s, and $s^2$ in terms of our basis, we can see that any
quadratic form in s will be a member of our vector space, and that our space includes
only functions of s that can be written in the form $c_0 + c_1 s + c_2 s^2$.

To illustrate our vector-space operations, we can form

$$s^2 - 2(s + 3) = \left[\tfrac{1}{3}P_0(s) + \tfrac{2}{3}P_2(s)\right] - 2\bigl[3P_0(s) + P_1(s)\bigr] = \left(\tfrac{1}{3} - 6\right)P_0(s) - 2P_1(s) + \tfrac{2}{3}P_2(s).$$

This calculation involves only operations on the coefficients; we do not need to refer
to the definitions of the $P_n$ to carry it out.

Note that we are free to define our basis any way we want, so long as its members
are linearly independent. We could have chosen as our basis for this same vector space
$\varphi_0 = 1$, $\varphi_1 = s$, $\varphi_2 = s^2$, but we chose not to do so.
2. The set of functions $\varphi_n(s) = s^n$ (n = 0, 1, 2, ...) is a basis for a vector space whose
members consist of functions that can be represented by a Maclaurin series. To avoid
difficulties with this infinite-dimensional basis, we will usually need to restrict
consideration to functions and ranges of s for which the Maclaurin series converges.
Convergence and related issues are of great interest in pure mathematics; in physics problems
we usually proceed in ways such that convergence is assured.

The members of our vector space will have representations

$$f(s) = a_0 + a_1 s + a_2 s^2 + \cdots = \sum_{n=0}^{\infty} a_n s^n,$$

and we can (at least in principle) use the rules for making power series expansions to
find the coefficients that correspond to a given f(s).

3. The spin space of an electron is spanned by a basis that consists of a linearly
independent set of possible spin states. It is well known that an electron can have two linearly
independent spin states, and they are often denoted by the symbols $\alpha$ and $\beta$. One
possible spin state is $f = a_1\alpha + a_2\beta$, and another is $g = b_1\alpha + b_2\beta$. We do not even need
to know what $\alpha$ and $\beta$ really stand for to discuss the 2-D vector space spanned by
these functions, nor do we need to know the role of any parametric variable such as
s. We can, however, state that the particular spin state corresponding to $f + ig$ must
have the form

$$f + ig = (a_1 + ib_1)\alpha + (a_2 + ib_2)\beta.$$
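The coefficient arithmetic of the first vector space in this example can be mirrored directly in code. The sketch below is illustrative only: a member of the space is stored as a tuple of its coefficients on $P_0$, $P_1$, $P_2$, and the helper names (`from_monomials`, `add`, `scale`) are hypothetical. Exact rational arithmetic is used so that the results can be compared with the fractions in the text.

```python
from fractions import Fraction as Fr

def from_monomials(c0, c1, c2):
    """Convert c0 + c1*s + c2*s**2 to coefficients on (P0, P1, P2).

    Uses 1 = P0, s = P1, s**2 = (1/3) P0 + (2/3) P2.
    """
    return (Fr(c0) + Fr(c2) / 3, Fr(c1), 2 * Fr(c2) / 3)

def add(f, g):
    """Vector addition: add corresponding coefficients."""
    return tuple(a + b for a, b in zip(f, g))

def scale(k, f):
    """Multiplication by a scalar: multiply every coefficient by k."""
    return tuple(k * a for a in f)

s_squared = from_monomials(0, 0, 1)      # (1/3, 0, 2/3)
s_plus_3 = from_monomials(3, 1, 0)       # (3, 1, 0)

# s**2 - 2(s + 3), computed purely from the coefficient tuples
combo = add(s_squared, scale(-2, s_plus_3))

# The same function converted directly from its monomial form s**2 - 2s - 6
direct = from_monomials(-6, -2, 1)
print(combo, direct)
```

The combined coefficients are (−17/3, −2, 2/3), that is $(\tfrac{1}{3} - 6)P_0 - 2P_1 + \tfrac{2}{3}P_2$, and the computation never needs the explicit form of the $P_n$ once the members are expressed in the basis.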


Scalar Product 


To make the vector space concept useful and parallel to that of vector algebra in ordinary
space, we need to introduce the concept of a scalar product in our function space. We shall
write the scalar product of two members of our vector space, f and g, as $\langle f|g\rangle$. This is the
notation that is almost universally used in physics; various other notations can be found in
the mathematics literature; examples include [f, g] and (f, g).

The scalar product has two main features, the full meaning of which may only become
clear as we proceed. They are:


1. The scalar product of a member with itself, e.g., $\langle f|f\rangle$, must evaluate to a numerical
value (not a function) that plays the role of the square of the magnitude of that
member, corresponding to the dot product of an ordinary vector with itself, and

2. The scalar product must be linear in each of the two members.¹


There exists an extremely wide range of possibilities for defining scalar products that
meet these criteria. The situation that arises most often in physics is that the members
of our vector space are ordinary functions of the variable s (as in the first vector space
of Example 5.1.1), and the scalar product of the two members f(s) and g(s) is computed
as an integral of the type

$$\langle f|g\rangle = \int_a^b f^*(s)\,g(s)\,w(s)\,ds, \tag{5.4}$$

with the choice of a, b, and w(s) dependent on the particular definition we wish to adopt
for our scalar product. In the special case $\langle f|f\rangle$, the scalar product is to be interpreted as
the square of a "length," and this scalar product must therefore be positive for any f that is
not itself identically zero. Since the integrand in the scalar product is then $f^*(s) f(s)\, w(s)$
and $f^*(s) f(s) \ge 0$ for all s (even if f(s) is complex), we can see that w(s) must be
positive over the entire range [a, b] except possibly for zeros at isolated points.

¹If the members of the vector space are complex, this statement will need adjustment; see the formal definitions in the next
subsection.

Let's review some of the implications of Eq. (5.4). It is not appropriate to interpret that
equation as a continuum analog of the ordinary dot product, with the variable s thought
of as the continuum limit of an index labeling vector components. The integral actually
arises pursuant to a decision to compute a "squared length" as a possibly weighted average
over the range of values of the parameter s. We can illustrate this point by considering the
other situation that arises occasionally in physics, illustrated by the third vector space
in Example 5.1.1. Here we simply define the scalar products of the individual $\alpha$ and $\beta$ to
have values

$$\langle\alpha|\alpha\rangle = \langle\beta|\beta\rangle = 1, \qquad \langle\alpha|\beta\rangle = \langle\beta|\alpha\rangle = 0,$$

and then, taking the simple one-electron functions

$$f = a_1\alpha + a_2\beta, \qquad g = b_1\alpha + b_2\beta,$$

and assuming $a_i$ and $b_i$ to be real, we expand $\langle f|g\rangle$ (using its linearity property) to reach

$$\langle f|g\rangle = a_1 b_1\langle\alpha|\alpha\rangle + a_1 b_2\langle\alpha|\beta\rangle + a_2 b_1\langle\beta|\alpha\rangle + a_2 b_2\langle\beta|\beta\rangle = a_1 b_1 + a_2 b_2. \tag{5.5}$$

These equations show that the introduction of an integral is not an indispensable step toward
generalization of the scalar product; they also show that the final formula in Eq. (5.5),
which is analogous to ordinary vector algebra, arises from the expansion of $\langle f|g\rangle$ in a
basis whose two members, $\alpha$ and $\beta$, are orthogonal (i.e., have a zero scalar product). Thus,
the analogy to ordinary vector algebra is that the "unit vectors" of this spin system define
an orthogonal "coordinate system" and that the "dot product" then has the expected form.

Vector spaces that are closed under addition and multiplication by a scalar and which
have a scalar product that exists for all pairs of its members are termed Hilbert spaces;
these are the vector spaces of primary importance in physics.


Hilbert Space 


Proceeding now somewhat more formally (but still without complete rigor), and
including the possibility that our function space may require more than two basis functions, we
identify a Hilbert space $\mathcal{H}$ as having the following properties:


• Elements (members) f, g, or h of $\mathcal{H}$ are subject to two operations, addition, and
multiplication by a scalar (here k, $k_1$, or $k_2$). These operations produce quantities that
are also members of the space.

• Addition is commutative and associative:

$$f(s) + g(s) = g(s) + f(s), \qquad [f(s) + g(s)] + h(s) = f(s) + [g(s) + h(s)].$$

• Multiplication by a scalar is commutative, associative, and distributive:

$$k f(s) = f(s)\,k, \qquad k[f(s) + g(s)] = k f(s) + k g(s),$$
$$(k_1 + k_2) f(s) = k_1 f(s) + k_2 f(s), \qquad k_1[k_2 f(s)] = k_1 k_2 f(s).$$

• $\mathcal{H}$ is spanned by a set of basis functions $\varphi_i$, where for the purposes of this book the
number of such basis functions (the range of i) can either be finite or denumerably
infinite (like the positive integers). This means that every function in $\mathcal{H}$ can be represented
by the linear form $f(s) = \sum_n a_n\varphi_n(s)$. This property is also known as completeness.
We require that the basis functions be linearly independent, so that each function in the
space will be a unique linear combination of the basis functions.
• For all functions f(s) and g(s) in $\mathcal{H}$, there exists a scalar product, denoted as $\langle f|g\rangle$,
which evaluates to a finite real or complex numerical value (i.e., does not contain s)
and which has the properties that

1. $\langle f|f\rangle \ge 0$, with the equality holding only if f is identically zero.² The quantity
$\langle f|f\rangle^{1/2}$ is called the norm of f and is written $\|f\|$.

2. $\langle g|f\rangle^* = \langle f|g\rangle$, $\langle f|g + h\rangle = \langle f|g\rangle + \langle f|h\rangle$, and $\langle f|kg\rangle = k\langle f|g\rangle$.

Consequences of these properties are that $\langle f|k_1 g + k_2 h\rangle = k_1\langle f|g\rangle + k_2\langle f|h\rangle$, but
$\langle kf|g\rangle = k^*\langle f|g\rangle$ and $\langle k_1 f + k_2 g|h\rangle = k_1^*\langle f|h\rangle + k_2^*\langle g|h\rangle$.


Example 5.1.2 SOME SCALAR PRODUCTS


Continuing with the first vector space of Example 5.1.1, let's assume that our scalar product
of any two functions f(s) and g(s) takes the form

$$\langle f|g\rangle = \int_{-1}^{1} f^*(s)\,g(s)\,ds, \tag{5.6}$$

i.e., the formula given as Eq. (5.4) with a = −1, b = 1, and w(s) = 1. Since all the
members of this vector space are quadratic forms and the integral in Eq. (5.6) is over the finite
range from −1 to +1, the scalar product will always exist and our three basis functions
indeed define a Hilbert space. Before we make a few sample computations, let's note that
the brackets in the left member of Eq. (5.6) do not show the detailed form of the scalar
product, thereby concealing information about the integration limits, the number of
variables (here we have only one, s), the nature of the space involved, the presence or absence
of a weight factor w(s), and even the exact operation that forms the product. All these
features must be inferred from the context or by a previously provided definition.
Now let's evaluate two scalar products:

$$\langle P_0|s^2\rangle = \int_{-1}^{1} P_0^*(s)\,s^2\,ds = \int_{-1}^{1} (1)\,s^2\,ds = \left[\frac{s^3}{3}\right]_{-1}^{1} = \frac{2}{3},$$

$$\langle P_0|P_2\rangle = \int_{-1}^{1} (1)\left[\frac{3}{2}s^2 - \frac{1}{2}\right] ds = \left[\frac{s^3}{2} - \frac{s}{2}\right]_{-1}^{1} = 0. \tag{5.7}$$


²To be rigorous, the phrase "identically zero" needs to be replaced by "zero except on a set of measure zero," and other conditions
need to be more tightly specified. These are niceties that are important for a precise formulation of the mathematics but are not
often of practical importance to the working physicist. We note, however, that discontinuous functions do arise in applications
of Fourier series, with consequences that are discussed in Chapter 19.

Looking further at the scalar product definition of the present example, we note that it
is consistent with the general requirements for a scalar product, as (1) $\langle f|f\rangle$ is formed as
the integral of an inherently nonnegative integrand, and will be positive for all nonzero
f; and (2) the placement of the complex-conjugate asterisk makes it obvious that
$\langle g|f\rangle^* = \langle f|g\rangle$.
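The scalar products of this example can be reproduced exactly with a few lines of code. The sketch below assumes real polynomials stored as coefficient lists `[c0, c1, c2, ...]` (a representation chosen here for illustration) and uses the elementary result that $\int_{-1}^{1} s^k\,ds$ is 0 for odd k and $2/(k+1)$ for even k; the function names are hypothetical.

```python
from fractions import Fraction as Fr

def moment(k):
    """Exact value of the integral of s**k over [-1, 1]."""
    return Fr(0) if k % 2 else Fr(2, k + 1)

def inner(f, g):
    """Scalar product of Eq. (5.6) for real polynomials given as coefficient lists."""
    return sum(Fr(a) * Fr(b) * moment(i + j)
               for i, a in enumerate(f)
               for j, b in enumerate(g))

P0 = [1]
P1 = [0, 1]
P2 = [Fr(-1, 2), 0, Fr(3, 2)]    # (3/2) s**2 - 1/2
s2 = [0, 0, 1]                   # s**2

p0_s2 = inner(P0, s2)            # first scalar product of Eq. (5.7)
p0_p2 = inner(P0, P2)            # second scalar product of Eq. (5.7)
norms_squared = [inner(P, P) for P in (P0, P1, P2)]
print(p0_s2, p0_p2, norms_squared)
```

The first two results are 2/3 and 0, in agreement with Eq. (5.7). The last line shows that $\langle P_n|P_n\rangle$ takes the values 2, 2/3, and 2/5, so these basis functions, although mutually orthogonal, do not have unit norm under this scalar product.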


Schwarz Inequality 


Any scalar product that meets the Hilbert space conditions will satisfy the Schwarz
inequality, which can be stated as

$$|\langle f|g\rangle|^2 \le \langle f|f\rangle\langle g|g\rangle. \tag{5.8}$$

Here there is equality only if f and g are proportional. In ordinary vector space, the
equivalent result is, referring to Eq. (1.113),

$$(\mathbf{A}\cdot\mathbf{B})^2 = |\mathbf{A}|^2|\mathbf{B}|^2\cos^2\theta \le |\mathbf{A}|^2|\mathbf{B}|^2, \tag{5.9}$$

where $\theta$ is the angle between the directions of $\mathbf{A}$ and $\mathbf{B}$. As observed previously, the
equality only holds if $\mathbf{A}$ and $\mathbf{B}$ are collinear. If we also require $\mathbf{A}$ to be of unit length, we have the
intuitively obvious result that the projection of $\mathbf{B}$ onto a noncollinear $\mathbf{A}$ direction will have
a magnitude less than that of $\mathbf{B}$. The Schwarz inequality extends this property to functions;
their norms shrink on nontrivial projection.

The Schwarz inequality can be proved by considering 


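$$I = \langle f - \lambda g\,|\,f - \lambda g\rangle \ge 0, \tag{5.10}$$

where $\lambda$ is an as yet undetermined constant. Treating $\lambda$ and $\lambda^*$ as linearly independent,³ we
differentiate $I$ with respect to $\lambda^*$ (remember that the left member of the product is complex
conjugated) and set the result to zero, to find the $\lambda$ value for which $I$ is a minimum:

$$-\langle g|f - \lambda g\rangle = 0 \quad\Longrightarrow\quad \lambda = \frac{\langle g|f\rangle}{\langle g|g\rangle}.$$

Substituting this $\lambda$ value into Eq. (5.10), we get (using properties of the scalar product)

$$\langle f|f\rangle - \frac{\langle f|g\rangle\langle g|f\rangle}{\langle g|g\rangle} \ge 0.$$
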

Noting that (g|g) must be positive, and rewriting (g| f) as (f|g)*, we confirm the Schwarz 
inequality, Eq. (5.8). 
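
As a numerical illustration of the proof, the following is a minimal sketch that checks the Schwarz inequality with exact rational arithmetic. It assumes the unit-weight scalar product on $(-1, 1)$ used earlier for real polynomials; the two test functions and all helper names are hypothetical choices made for this example only.

```python
from fractions import Fraction

def scalar_product(f, g):
    """<f|g> = integral of f(s) g(s) over [-1, 1] for real polynomials.

    Polynomials are coefficient lists, lowest power first.
    """
    total = Fraction(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if (i + j) % 2 == 0:          # odd powers integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

# Hypothetical test functions: f(s) = 1 + 2s + s^3, g(s) = s + 3s^2
f = [1, 2, 0, 1]
g = [0, 1, 3, 0]

lhs = scalar_product(f, g) ** 2                       # |<f|g>|^2 (real case)
rhs = scalar_product(f, f) * scalar_product(g, g)     # <f|f><g|g>

# Minimizing value of lambda and the corresponding minimum of I in Eq. (5.10)
lam = scalar_product(g, f) / scalar_product(g, g)
h = [Fraction(a) - lam * Fraction(b) for a, b in zip(f, g)]
I_min = scalar_product(h, h)

print(lhs <= rhs, I_min)
```

Because the arithmetic is exact, the minimum of $I$ can be compared term by term with $\langle f|f\rangle - \langle f|g\rangle\langle g|f\rangle/\langle g|g\rangle$; the two agree, and both are nonnegative.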


Orthogonal Expansions 


With now a well-behaved scalar product in hand, we can make the definition that two functions
$f$ and $g$ are orthogonal if $\langle f|g\rangle = 0$, which means that $\langle g|f\rangle$ will also vanish. An
example of two functions that are orthogonal under the then-applicable definition of the
scalar product is the pair $P_0(s)$ and $P_2(s)$, where the scalar product is that defined in Eq. (5.6)
and $P_0$, $P_2$ are the functions from Example 5.1.1; the orthogonality is shown by Eq. (5.7).
We further define a function $f$ as normalized if the scalar product $\langle f|f\rangle = 1$; this is the
function-space equivalent of a unit vector. We will find that great convenience results if
the basis functions for our function space are normalized and mutually orthogonal, corresponding
to the description of a 2-D or three-dimensional (3-D) physical vector space
based on orthogonal unit vectors. A set of functions that is both normalized and mutually
orthogonal is called an orthonormal set. If a member $f$ of an orthogonal set is not normalized,
it can be made so without disturbing the orthogonality: we simply rescale it to
$\bar{f} = f/\langle f|f\rangle^{1/2}$, so any orthogonal set can easily be made orthonormal if desired.

³It is not obvious that one can do this, but consider $\lambda = \mu + i\nu$, $\lambda^* = \mu - i\nu$, with $\mu$ and $\nu$ real. Then $\frac{1}{2}[\partial/\partial\mu + i\,\partial/\partial\nu]$ is
equivalent to taking $\partial/\partial\lambda^*$ keeping $\lambda$ constant.

If our basis is orthonormal, the coefficients for the expansion of an arbitrary function in
that basis take a simple form. We return to our 2-D example, with the assumption that the
$\varphi_i$ are orthonormal, and consider the result of taking the scalar product of $f(s)$, as given
by Eq. (5.1), with $\varphi_1(s)$:

$$\langle\varphi_1|f\rangle = \langle\varphi_1|(a_1\varphi_1 + a_2\varphi_2)\rangle = a_1\langle\varphi_1|\varphi_1\rangle + a_2\langle\varphi_1|\varphi_2\rangle. \tag{5.11}$$

The orthonormality of the $\varphi$ now comes into play; the scalar product multiplying $a_1$ is
unity, while that multiplying $a_2$ is zero, so we have the simple and useful result $\langle\varphi_1|f\rangle = a_1$.
Thus, we have a rather mechanical means of identifying the components of $f$. The
general result corresponding to Eq. (5.11) follows:

$$\text{If } \langle\varphi_i|\varphi_j\rangle = \delta_{ij} \text{ and } f = \sum_{i=1}^{n} a_i\varphi_i, \text{ then } a_i = \langle\varphi_i|f\rangle. \tag{5.12}$$

Here the Kronecker delta, $\delta_{ij}$, is unity if $i = j$ and zero otherwise. Looking once again
at Eq. (5.11), we consider what happens if the $\varphi_i$ are orthogonal but not normalized. Then
instead of Eq. (5.12) we would have:

$$\text{If the } \varphi_i \text{ are orthogonal and } f = \sum_{i=1}^{n} a_i\varphi_i, \text{ then } a_i = \frac{\langle\varphi_i|f\rangle}{\langle\varphi_i|\varphi_i\rangle}. \tag{5.13}$$

This form of the expansion will be convenient when normalization of the basis introduces 
unpleasant factors. 
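
The rule of Eq. (5.13) can be tried out on a small case. The sketch below assumes the unit-weight scalar product on $(-1, 1)$ and takes as orthogonal (but unnormalized) basis the first three Legendre polynomials, $P_0 = 1$, $P_1 = s$, $P_2 = (3s^2 - 1)/2$; the function being expanded, $f(s) = s^2$, and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def scalar_product(f, g):
    """<f|g> on [-1, 1] with unit weight; real coefficient lists, lowest power first."""
    total = Fraction(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if (i + j) % 2 == 0:               # odd powers integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

# Orthogonal but unnormalized basis: P0 = 1, P1 = s, P2 = (3 s^2 - 1)/2
basis = [
    [1],
    [0, 1],
    [Fraction(-1, 2), 0, Fraction(3, 2)],
]
f = [0, 0, 1]                                   # f(s) = s^2

# Eq. (5.13): a_i = <phi_i|f> / <phi_i|phi_i>
coeffs = [scalar_product(p, f) / scalar_product(p, p) for p in basis]
print(coeffs)   # [Fraction(1, 3), Fraction(0, 1), Fraction(2, 3)]
```

The result says $s^2 = \frac{1}{3}P_0 + \frac{2}{3}P_2$, which can be confirmed by direct substitution. No square roots appear, which is the convenience the text attributes to working with an unnormalized orthogonal basis.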


Example 5.1.3 EXPANSION IN ORTHONORMAL FUNCTIONS 


Consider the set of functions $\chi_n(x) = \sin nx$, for $n = 1, 2, \ldots$, to be used for $x$ in the
interval $0 \le x \le \pi$ with scalar product

$$\langle f|g\rangle = \int_0^{\pi} f^*(x)\,g(x)\,dx. \tag{5.14}$$

We wish to use these functions for the expansion of the function $x^2(\pi - x)$.
First, we check that they are orthogonal:

$$S_{nm} = \int_0^{\pi} \chi_n^*(x)\,\chi_m(x)\,dx = \int_0^{\pi} \sin nx\,\sin mx\,dx.$$

For $n \ne m$ this integral can be shown to vanish, either by symmetry considerations or by
consulting a table of integrals. To determine normalization, we need $S_{nn}$; from symmetry
considerations, the integrand, $\sin^2 nx = \frac{1}{2}(1 - \cos 2nx)$, can be seen to have average value
1/2 over the range $(0, \pi)$, leading to $S_{nn} = \pi/2$ for all integer $n$. This means the $\chi_n$ are not
normalized, but can be made so if we multiply by $\sqrt{2/\pi}$. So our orthonormal basis will be

$$\varphi_n(x) = \left(\frac{2}{\pi}\right)^{1/2} \sin nx, \quad n = 1, 2, 3, \ldots. \tag{5.15}$$

To expand $x^2(\pi - x)$, we apply Eq. (5.2), which requires the evaluation of

$$a_n = \langle\varphi_n|x^2(\pi - x)\rangle = \left(\frac{2}{\pi}\right)^{1/2} \int_0^{\pi} (\sin nx)\,x^2(\pi - x)\,dx, \tag{5.16}$$

for use in the expansion

$$x^2(\pi - x) = \left(\frac{2}{\pi}\right)^{1/2} \sum_{n=1}^{\infty} a_n \sin nx. \tag{5.17}$$

Evaluating cases of Eq. (5.16) by hand or using a computer for symbolic computation, we
have for the first few $a_n$: $a_1 = 5.0132$, $a_2 = -1.8800$, $a_3 = 0.1857$, $a_4 = -0.2350$. The
convergence is not very fast. ■
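
The coefficients of Eq. (5.16) can also be obtained by numerical quadrature. The following is a sketch only: it assumes a composite Simpson rule (the helper `simpson` and the number of subintervals are choices made here, not part of the example) and evaluates $a_n$ for $n = 1, \ldots, 4$, together with a partial sum of Eq. (5.17).

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def coefficient(n):
    """a_n of Eq. (5.16) for f(x) = x^2 (pi - x) in the basis sqrt(2/pi) sin(nx)."""
    integrand = lambda x: math.sin(n * x) * x**2 * (math.pi - x)
    return math.sqrt(2 / math.pi) * simpson(integrand, 0.0, math.pi)

def partial_sum(x, n_terms):
    """Partial sum of Eq. (5.17) with n_terms terms."""
    return math.sqrt(2 / math.pi) * sum(
        coefficient(n) * math.sin(n * x) for n in range(1, n_terms + 1))

a = [coefficient(n) for n in range(1, 5)]
for n, value in enumerate(a, start=1):
    print(n, round(value, 4))
```

Comparing `partial_sum(x, N)` with $x^2(\pi - x)$ at a few interior points for increasing `N` gives a direct feel for the rate of convergence.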


Example 5.1.4 SPIN SPACE


A system of four spin-$\frac{1}{2}$ particles in a triplet state has the following three linearly independent
spin functions:

$$\chi_1 = \alpha\beta\alpha\alpha - \beta\alpha\alpha\alpha, \quad \chi_2 = \alpha\alpha\alpha\beta - \alpha\alpha\beta\alpha, \quad \chi_3 = \alpha\alpha\alpha\beta + \alpha\alpha\beta\alpha - \alpha\beta\alpha\alpha - \beta\alpha\alpha\alpha.$$

The four symbols in each term of these expressions refer to the spin assignments of the
four particles, in numerical order.
The scalar product in the spin space has the form, for monomials,

$$\langle abcd|wxyz\rangle = \delta_{aw}\delta_{bx}\delta_{cy}\delta_{dz},$$

meaning that the scalar product is unity if the two monomials are identical, and is zero if
they are not. Scalar products involving polynomials can be evaluated by expanding them
into sums of monomial products. It is easy to confirm that this definition meets the requirements
for a valid scalar product.

Our mission will be (1) to verify that the $\chi_i$ are orthogonal; (2) to convert them, if necessary,
to normalized form to make an orthonormal basis for the spin space; and (3) to expand
the following triplet spin function as a linear combination of the orthonormal spin basis
functions:

$$\chi_0 = \alpha\alpha\beta\alpha - \alpha\beta\alpha\alpha.$$


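The functions $\chi_1$ and $\chi_2$ are orthogonal, as they have no terms in common. Although $\chi_1$
and $\chi_3$ have two terms in common, they occur in sign combinations leading to a vanishing
scalar product. The same observation applies to $\langle\chi_2|\chi_3\rangle$. However, none of the $\chi_i$ are normalized.
We find $\langle\chi_1|\chi_1\rangle = \langle\chi_2|\chi_2\rangle = 2$, $\langle\chi_3|\chi_3\rangle = 4$, so an orthonormal basis would be

$$\varphi_1 = 2^{-1/2}\chi_1, \quad \varphi_2 = 2^{-1/2}\chi_2, \quad \varphi_3 = \tfrac{1}{2}\chi_3.$$

Finally, we obtain the coefficients for the expansion of $\chi_0$ by forming $a_1 = \langle\varphi_1|\chi_0\rangle = -1/\sqrt{2}$,
$a_2 = \langle\varphi_2|\chi_0\rangle = -1/\sqrt{2}$, and $a_3 = \langle\varphi_3|\chi_0\rangle = 1$. Thus, the desired expansion is

$$\chi_0 = -\frac{1}{\sqrt{2}}\,\varphi_1 - \frac{1}{\sqrt{2}}\,\varphi_2 + \varphi_3.$$
■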
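
The bookkeeping in this example is easy to mechanize. The sketch below is one possible encoding, not anything prescribed by the example: each spin function is stored as a dictionary mapping a monomial (written with the hypothetical letters `a` for $\alpha$ and `b` for $\beta$) to its real coefficient, so the monomial scalar product reduces to matching dictionary keys.

```python
import math

def scalar_product(u, v):
    """<u|v>: monomials are orthonormal, so only identical monomials contribute.

    Coefficients are real here, so no complex conjugation is needed.
    """
    return sum(c * v.get(m, 0) for m, c in u.items())

def scale(u, k):
    """Multiply a spin function by the constant k."""
    return {m: k * c for m, c in u.items()}

chi1 = {"abaa": 1, "baaa": -1}
chi2 = {"aaab": 1, "aaba": -1}
chi3 = {"aaab": 1, "aaba": 1, "abaa": -1, "baaa": -1}
chi0 = {"aaba": 1, "abaa": -1}

chis = [chi1, chi2, chi3]
gram = [[scalar_product(u, v) for v in chis] for u in chis]   # off-diagonals vanish
phis = [scale(u, 1 / math.sqrt(scalar_product(u, u))) for u in chis]
coeffs = [scalar_product(p, chi0) for p in phis]

# Rebuild chi0 from its expansion as a consistency check
rebuilt = {}
for a_i, p in zip(coeffs, phis):
    for m, c in p.items():
        rebuilt[m] = rebuilt.get(m, 0.0) + a_i * c

print(gram, coeffs)
```

The matrix `gram` is diagonal, confirming orthogonality, and `rebuilt` reproduces the two monomials of $\chi_0$ while the other two monomials cancel.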


Expansions and Scalar Products 


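If we have found the expansions of two functions,

$$f = \sum_{\mu} a_{\mu}\varphi_{\mu} \quad\text{and}\quad g = \sum_{\nu} b_{\nu}\varphi_{\nu},$$

then their scalar product can be written

$$\langle f|g\rangle = \sum_{\mu\nu} a_{\mu}^* b_{\nu}\langle\varphi_{\mu}|\varphi_{\nu}\rangle.$$

If the $\varphi$ set is orthonormal, the above reduces to

$$\langle f|g\rangle = \sum_{\mu} a_{\mu}^* b_{\mu}. \tag{5.18}$$

In the special case $g = f$, this reduces to

$$\langle f|f\rangle = \sum_{\mu} |a_{\mu}|^2, \tag{5.19}$$

consistent with the requirement that $\langle f|f\rangle \ge 0$, with equality only if $f$ is zero “almost
everywhere.”

If we regard the set of expansion coefficients $a_{\mu}$ as the elements of a column vector
$\mathbf{a}$ representing $f$, with column vector $\mathbf{b}$ similarly representing $g$, Eqs. (5.18) and (5.19)
correspond to the matrix equations

$$\langle f|g\rangle = \mathbf{a}^{\dagger}\mathbf{b}, \quad \langle f|f\rangle = \mathbf{a}^{\dagger}\mathbf{a}. \tag{5.20}$$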


Note that by taking the adjoint of a, we both complex conjugate it and convert it into a row 
vector, so that the matrix products in Eq. (5.20) collapse to scalars, as required. 





Example 5.1.5 COEFFICIENT VECTORS 


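A set of functions that is orthonormal on $0 \le x \le \pi$ is

$$\varphi_n(x) = \sqrt{\frac{2 - \delta_{n0}}{\pi}}\,\cos nx, \quad n = 0, 1, 2, \ldots.$$

First, let us expand in terms of this basis the two functions

$$\psi_1 = \cos^3 x + \sin^2 x + \cos x + 1 \quad\text{and}\quad \psi_2 = \cos^2 x - \cos x.$$

We write the expansions as vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ with components $n = 0, \ldots, 3$:

$$\mathbf{a}_1 = \begin{pmatrix} \langle\varphi_0|\psi_1\rangle \\ \langle\varphi_1|\psi_1\rangle \\ \langle\varphi_2|\psi_1\rangle \\ \langle\varphi_3|\psi_1\rangle \end{pmatrix}, \quad \mathbf{a}_2 = \begin{pmatrix} \langle\varphi_0|\psi_2\rangle \\ \langle\varphi_1|\psi_2\rangle \\ \langle\varphi_2|\psi_2\rangle \\ \langle\varphi_3|\psi_2\rangle \end{pmatrix}.$$

All components beyond $n = 3$ vanish and need not be shown. It is straightforward to
evaluate these scalar products. Alternatively, we can rewrite the $\psi_i$ using trigonometric
identities, reaching the forms

$$\psi_1 = \frac{\cos 3x}{4} - \frac{\cos 2x}{2} + \frac{7}{4}\cos x + \frac{3}{2}, \quad \psi_2 = \frac{\cos 2x}{2} - \cos x + \frac{1}{2}.$$

These expressions are now easily recognized as equivalent to

$$\psi_1 = \sqrt{\frac{\pi}{2}}\left(\frac{1}{4}\varphi_3 - \frac{1}{2}\varphi_2 + \frac{7}{4}\varphi_1 + \frac{3\sqrt{2}}{2}\varphi_0\right), \quad \psi_2 = \sqrt{\frac{\pi}{2}}\left(\frac{1}{2}\varphi_2 - \varphi_1 + \frac{\sqrt{2}}{2}\varphi_0\right),$$

so

$$\mathbf{a}_1 = \sqrt{\frac{\pi}{2}}\begin{pmatrix} 3\sqrt{2}/2 \\ 7/4 \\ -1/2 \\ 1/4 \end{pmatrix}, \quad \mathbf{a}_2 = \sqrt{\frac{\pi}{2}}\begin{pmatrix} \sqrt{2}/2 \\ -1 \\ 1/2 \\ 0 \end{pmatrix}.$$

We see from the above that the general formula for finding the coefficients in an orthonormal
expansion, Eq. (5.12), is a systematic way of doing what sometimes can be carried out
in other ways.

We can now evaluate the scalar products $\langle\psi_i|\psi_j\rangle$. Identifying these first as matrix products
that we then evaluate,

$$\langle\psi_1|\psi_1\rangle = \mathbf{a}_1^{\dagger}\mathbf{a}_1 = \frac{63\pi}{16}, \quad \langle\psi_1|\psi_2\rangle = \mathbf{a}_1^{\dagger}\mathbf{a}_2 = -\frac{\pi}{4}, \quad \langle\psi_2|\psi_2\rangle = \mathbf{a}_2^{\dagger}\mathbf{a}_2 = \frac{7\pi}{8}.$$
■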
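
The statement that scalar products of functions reduce to products of coefficient vectors can be checked numerically for this example. The sketch below assumes a composite Simpson rule for the integrals on $(0, \pi)$; the helper names are hypothetical, and the coefficient vectors are computed from Eq. (5.12) rather than from the trigonometric identities.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

psi1 = lambda x: math.cos(x)**3 + math.sin(x)**2 + math.cos(x) + 1
psi2 = lambda x: math.cos(x)**2 - math.cos(x)

def phi(n):
    """Orthonormal cosine basis on (0, pi): sqrt((2 - delta_n0)/pi) cos(nx)."""
    norm = math.sqrt((2 - (1 if n == 0 else 0)) / math.pi)
    return lambda x: norm * math.cos(n * x)

def coefficient_vector(psi, size=4):
    """Components <phi_n|psi>, n = 0..size-1 (real functions, so no conjugation)."""
    return [simpson(lambda x, n=n: phi(n)(x) * psi(x), 0.0, math.pi)
            for n in range(size)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a1 = coefficient_vector(psi1)
a2 = coefficient_vector(psi2)
direct_12 = simpson(lambda x: psi1(x) * psi2(x), 0.0, math.pi)   # <psi1|psi2> as an integral

print(dot(a1, a1), dot(a1, a2), dot(a2, a2), direct_12)
```

The vector product `dot(a1, a2)` and the directly integrated `direct_12` agree to quadrature accuracy, which is the content of Eq. (5.20) for real functions.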
Bessel’s Inequality


Given a set of basis functions and the definition of a space, it is not necessarily assured that 
the basis functions span the space (a property sometimes referred to as completeness). For 
example, we might have a space defined to be that containing all functions possessing a 
scalar product of a given definition, while the basis functions have been specified by giving 
their functional form. This issue is of some importance, because we need to know whether 
an attempt to expand a function in a given basis can be guaranteed to converge to the correct 
result. Totally general criteria are not available, but useful results have been obtained if 
the function being expanded has, at worst, a finite number of finite discontinuities, and 
results are accepted as “accurate” if deviations from the correct value occur only at isolated 
points. Power series and trigonometric series have been proved complete for the expansion 
of square integrable functions $f$ (those for which $\langle f|f\rangle$ as defined in Eq. (5.7) exists;
mathematicians identify such spaces by the designation $\mathcal{L}^2$). Also proved complete are the
orthonormal sets of functions that arise as the solutions to Hermitian eigenvalue problems.⁴

A not too practical test for completeness is provided by Bessel’s inequality, which states
that if a function $f$ has been expanded in an orthonormal basis as $\sum_n a_n\varphi_n$, then

$$\langle f|f\rangle \ge \sum_n |a_n|^2, \tag{5.21}$$

with the inequality occurring if the expansion of $f$ is incomplete. The impracticality of
this as a completeness test is that one needs to apply it for all $f$ before using it to claim
completeness of the space.

We establish Bessel’s inequality by considering

$$I = \Big\langle f - \sum_i a_i\varphi_i \,\Big|\, f - \sum_j a_j\varphi_j \Big\rangle \ge 0, \tag{5.22}$$



where $I = 0$ represents what is termed convergence in the mean, a criterion that permits
the integrand to deviate from zero at isolated points. Expanding the scalar product,
and eliminating terms that vanish because the $\varphi$ are orthonormal, we arrive at Eq. (5.21),
with equality only resulting if the expansion converges to f. We note in passing that con- 
vergence in the mean is a less stringent requirement than uniform convergence, but is 
adequate for almost all physical applications of basis-set expansions. 
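
Bessel’s inequality can be watched at work numerically. The sketch below makes several assumptions of its own: it uses the function $f(x) = x^2(\pi - x)$ and the orthonormal basis $\sqrt{2/\pi}\,\sin nx$ on $(0, \pi)$, evaluates integrals with a composite Simpson rule, and compares $\sum |a_n|^2$ with $\langle f|f\rangle$ for the basis truncated at $n = 10$ and for the same basis with the $n = 1$ member deliberately left out.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

f = lambda x: x**2 * (math.pi - x)
norm_sq = simpson(lambda x: f(x) ** 2, 0.0, math.pi)          # <f|f>

def a_n(n):
    """Coefficient <phi_n|f> with phi_n(x) = sqrt(2/pi) sin(nx)."""
    return math.sqrt(2 / math.pi) * simpson(
        lambda x: math.sin(n * x) * f(x), 0.0, math.pi)

coeffs = [a_n(n) for n in range(1, 11)]
complete_sum = sum(c * c for c in coeffs)        # n = 1..10
truncated_sum = sum(c * c for c in coeffs[1:])   # n = 1 omitted from the basis

print(norm_sq, complete_sum, truncated_sum)
```

With all members present the sum approaches $\langle f|f\rangle$ from below as more terms are added; with one member missing the sum stays well short of $\langle f|f\rangle$ no matter how many further terms are included, which is the strict inequality of Eq. (5.21) for an incomplete set.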


Example 5.1.6 EXPANSION OF A DISCONTINUOUS FUNCTION 





The functions $\cos nx$ ($n = 0, 1, 2, \ldots$) and $\sin nx$ ($n = 1, 2, \ldots$) have (together) been
shown to form a complete set on the interval $-\pi < x < \pi$. Since this determination is
obtained subject to convergence in the mean, there is the possibility of deviation at isolated
points, thereby permitting the description of functions with isolated discontinuities.


4See R. Courant and D. Hilbert, Methods of Mathematical Physics (English translation), Vol. 1, New York: Interscience (1953), 
reprinting, Wiley (1989), chapter 6, section 3. 
We illustrate with the square-wave function

$$f(x) = \begin{cases} \dfrac{h}{2}, & 0 < x < \pi, \\[1ex] -\dfrac{h}{2}, & -\pi < x < 0. \end{cases} \tag{5.23}$$

The functions $\cos nx$ and $\sin nx$ are orthogonal on the expansion interval (with unit weight
in the scalar product), and the expansion of $f(x)$ takes the form

$$f(x) = a_0 + \sum_{n=1}^{\infty} (a_n\cos nx + b_n\sin nx).$$

Because $f(x)$ is an odd function of $x$, all the $a_n$ vanish, and we only need to compute

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin nt\,dt.$$

The factor $1/\pi$ preceding the integral arises because the expansion functions are not
normalized.
Upon substitution of $\pm h/2$ for $f(t)$, we find

$$b_n = \frac{h}{n\pi}(1 - \cos n\pi) = \begin{cases} 0, & n \text{ even}, \\[1ex] \dfrac{2h}{n\pi}, & n \text{ odd}. \end{cases}$$

Thus, the expansion of the square wave is

$$f(x) = \frac{2h}{\pi}\sum_{n=0}^{\infty} \frac{\sin(2n+1)x}{2n+1}. \tag{5.24}$$

To give an idea of the rate at which the series in Eq. (5.24) converges, some of its partial
sums are plotted in Fig. 5.1.
■
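
The slow convergence of Eq. (5.24) is easy to observe by evaluating partial sums directly. The sketch below is illustrative only: the function names are hypothetical, $h = 1$ is an arbitrary choice, and `n_terms` is taken to mean the number of terms kept in the sum (the caption of Fig. 5.1 may count terms differently).

```python
import math

def square_wave_partial_sum(x, n_terms, h=1.0):
    """Partial sum of Eq. (5.24): (2h/pi) * sum_{n=0}^{n_terms-1} sin((2n+1)x)/(2n+1)."""
    return (2 * h / math.pi) * sum(
        math.sin((2 * n + 1) * x) / (2 * n + 1) for n in range(n_terms))

def b_coefficient(n, h=1.0):
    """Closed form b_n = h (1 - cos(n pi)) / (n pi) from the example."""
    return h * (1 - math.cos(n * math.pi)) / (n * math.pi)

# Values at the middle of the upper plateau, where f = h/2
for terms in (4, 8, 12, 20):
    print(terms, square_wave_partial_sum(math.pi / 2, terms))
```

At $x = \pi/2$ the partial sums oscillate about $h/2$ with an error that decreases only like the reciprocal of the number of terms, while at $x = 0$ every partial sum vanishes, the midpoint of the jump.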


Expansions of Dirac Delta Function 


Orthogonal expansions provide opportunities to develop additional representations of the
Dirac delta function. In fact, such a representation can be built from any complete set of
functions $\varphi_n(x)$. For simplicity we assume the $\varphi_n$ to be orthonormal with unit weight on
the interval $(a, b)$, and consider the expansion

$$\delta(x - t) = \sum_{n=0}^{\infty} c_n(t)\,\varphi_n(x), \tag{5.25}$$

where, as indicated, the coefficients must be functions of $t$. From the rule for determining
the coefficients, we have, for $t$ also in the interval $(a, b)$,

$$c_n(t) = \int_a^b \varphi_n^*(x)\,\delta(x - t)\,dx = \varphi_n^*(t), \tag{5.26}$$

FIGURE 5.1 Expansion of square wave. Computed using Eq. (5.24) with summation
terminated after n = 4, 8, 12, and 20. Curves are at different vertical scales to
enhance visibility.


where the evaluation has used the defining property of the delta function. Substituting this
result back into Eq. (5.25), we have

$$\delta(x - t) = \sum_{n=0}^{\infty} \varphi_n^*(t)\,\varphi_n(x). \tag{5.27}$$

This result is clearly not uniformly convergent at $x = t$. However, remember that it is not
to be used by itself, but has meaning only when it appears as part of an integrand. Note
also that Eq. (5.27) is only valid when $x$ and $t$ are within the range $(a, b)$.

Equation (5.27) is called the closure relation for the Dirac delta function (with respect
to the $\varphi_n$) and obviously depends on the completeness of the $\varphi$ set. If we apply Eq. (5.27)
to an arbitrary function $F(t)$ that we assume to have the expansion $F(t) = \sum_p c_p\varphi_p(t)$,
we have

$$\int_a^b F(t)\,\delta(x - t)\,dt = \int_a^b dt \sum_{p=0}^{\infty} c_p\varphi_p(t) \sum_{n=0}^{\infty} \varphi_n^*(t)\,\varphi_n(x) = \sum_{p=0}^{\infty} c_p\varphi_p(x) = F(x), \tag{5.28}$$

which is the expected result. However, if we replace the integration limits $(a, b)$ by $(t_1, t_2)$
such that $a \le t_1 < t_2 \le b$, we get a more general result that reflects the fact that our
representation of $\delta(x - t)$ is negligible except when $x \approx t$:

$$\int_{t_1}^{t_2} F(t)\,\delta(x - t)\,dt = \begin{cases} F(x), & t_1 < x < t_2, \\ 0, & x < t_1 \text{ or } x > t_2. \end{cases} \tag{5.29}$$

FIGURE 5.2 Approximation at $N = 80$ to $\delta(t - x)$, Eq. (5.30), for $t = 0.4$.

Example 5.1.7 DELTA FUNCTION REPRESENTATION 


To illustrate an expansion of the Dirac delta function in an orthonormal basis, take $\varphi_n(x) = \sqrt{2}\,\sin n\pi x$,
which are orthonormal and complete on $x = (0, 1)$ for $n = 1, 2, \ldots$. Then the
Dirac delta function has representation, valid for $0 < x < 1$, $0 < t < 1$,

$$\delta(x - t) = \lim_{N\to\infty} \sum_{n=1}^{N} 2\sin n\pi t\,\sin n\pi x. \tag{5.30}$$

Plotting this with $N = 80$ for $t = 0.4$ and $0 < x < 1$ gives the result shown in Fig. 5.2. ■
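
Since the truncated sum has meaning only inside an integrand, a useful check is to integrate it against a smooth function and see whether the function value at $t$ is recovered. The sketch below assumes a midpoint-rule quadrature and a hypothetical test function $F(x) = x(1 - x)$, chosen because it vanishes at both ends of the interval as the sine basis does.

```python
import math

def delta_N(x, t, N=80):
    """Truncated sum of Eq. (5.30): sum_{n=1}^{N} 2 sin(n pi t) sin(n pi x)."""
    return sum(2 * math.sin(n * math.pi * t) * math.sin(n * math.pi * x)
               for n in range(1, N + 1))

def sift(F, t, N=80, points=4000):
    """Midpoint-rule estimate of the integral of F(x) delta_N(x, t) over 0 < x < 1."""
    dx = 1.0 / points
    return sum(F((k + 0.5) * dx) * delta_N((k + 0.5) * dx, t, N)
               for k in range(points)) * dx

F = lambda x: x * (1 - x)       # hypothetical smooth test function
approx = sift(F, 0.4)
print(approx, F(0.4))
```

The integral reproduces $F(0.4)$ closely even though `delta_N` itself is a rapidly oscillating function with a tall, narrow peak at $x = t$.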


Dirac Notation 


Much of what we have discussed can be brought to a form that promotes clarity and 
suggests possibilities for additional analysis by using a notational device invented by 
P. A. M. Dirac. Dirac suggested that instead of just writing a function $f$, it be written
enclosed in the right half of an angle-bracket pair, which he named a ket. Thus $f \to |f\rangle$,
$\varphi_i \to |\varphi_i\rangle$, etc. Then he suggested that the complex conjugates of functions be enclosed
in left half-brackets, which he named bras. An example of a bra is $\varphi_i^* \to \langle\varphi_i|$. Finally, he
suggested that when the sequence (bra followed by ket = bra+ket ∼ bracket) is encountered,
the pair should be interpreted as a scalar product (with the dropping of one of the two
adjacent vertical lines). As an initial example of the use of this notation, take Eq. (5.12),
which we now write as

$$|f\rangle = \sum_j a_j|\varphi_j\rangle = \sum_j |\varphi_j\rangle\langle\varphi_j|f\rangle = \Big(\sum_j |\varphi_j\rangle\langle\varphi_j|\Big)|f\rangle. \tag{5.31}$$

This notational rearrangement shows that we can view the expansion in the $\varphi$ basis as the
insertion of a set of basis members in a way which, in sum, has no effect. If the sum is over
a complete set of $\varphi_j$, the ket-bra sum in Eq. (5.31) will have no net effect when inserted
before any ket in the space, and therefore we can view the sum as a resolution of the
identity. To emphasize this, we write

$$1 = \sum_j |\varphi_j\rangle\langle\varphi_j|. \tag{5.32}$$


Many expressions involving expansions in orthonormal sets can be derived by the insertion 
of resolutions of the identity. 

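Dirac notation can also be applied to expressions involving vectors and matrices, where
it illuminates the parallelism between physical vector spaces and the function spaces here
under study. If $\mathbf{a}$ and $\mathbf{b}$ are column vectors and $\mathsf{M}$ is a matrix, then we can write $|\mathbf{b}\rangle$ as a
synonym for $\mathbf{b}$, we can write $\langle\mathbf{a}|$ to mean $\mathbf{a}^{\dagger}$, and then $\langle\mathbf{a}|\mathbf{b}\rangle$ is interpreted as equivalent to
$\mathbf{a}^{\dagger}\mathbf{b}$, which (when the vectors are real) is matrix notation for the (scalar) dot product $\mathbf{a}\cdot\mathbf{b}$.
Other examples are expressions such as

$$\mathbf{a} = \mathsf{M}\mathbf{b} \;\leftrightarrow\; |\mathbf{a}\rangle = |\mathsf{M}\mathbf{b}\rangle = \mathsf{M}|\mathbf{b}\rangle \quad\text{or}\quad \mathbf{a}^{\dagger}\mathsf{M}\mathbf{b} = (\mathsf{M}^{\dagger}\mathbf{a})^{\dagger}\mathbf{b} \;\leftrightarrow\; \langle\mathbf{a}|\mathsf{M}\mathbf{b}\rangle = \langle\mathsf{M}^{\dagger}\mathbf{a}|\mathbf{b}\rangle.$$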
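
Both the resolution of the identity and the adjoint rule for matrices can be verified on a small complex example. In the sketch below the two-dimensional orthonormal basis, the vectors, and the matrix are arbitrary illustrative data, and the helper names are hypothetical; the bra is formed by complex conjugating the components of the left vector.

```python
import math

def bra_ket(a, b):
    """<a|b> = a-dagger b for column vectors stored as lists of complex numbers."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

# A hypothetical orthonormal basis of C^2 and a test ket
s = 1 / math.sqrt(2)
basis = [[s, 1j * s], [s, -1j * s]]
f = [2 + 1j, -1 + 3j]

# Resolution of the identity, Eq. (5.32): sum_j |phi_j><phi_j|f> reproduces |f>
rebuilt = [sum(phi[i] * bra_ket(phi, f) for phi in basis) for i in range(2)]

# <a|Mb> = <M^dagger a|b> for arbitrary test data
M = [[1 + 2j, 0.5j], [3, -1j]]
a = [1j, 2]
b = [1 - 1j, 4]
lhs = bra_ket(a, mat_vec(M, b))
rhs = bra_ket(mat_vec(adjoint(M), a), b)

print(rebuilt, lhs, rhs)
```

If one basis member is dropped from `basis`, `rebuilt` no longer equals `f`, which illustrates why Eq. (5.32) requires a complete set.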


Exercises 


5.1.1 A function $f(x)$ is expanded in a series of orthonormal functions

$$f(x) = \sum_{n=0}^{\infty} a_n\varphi_n(x).$$

Show that the series expansion is unique for a given set of $\varphi_n(x)$. The functions $\varphi_n(x)$
are being taken here as the basis vectors in an infinite-dimensional Hilbert space.


5.1.2 A function $f(x)$ is represented by a finite set of basis functions $\varphi_i(x)$,

$$f(x) = \sum_{i=1}^{N} c_i\varphi_i(x).$$

Show that the components $c_i$ are unique, that no different set $c_i'$ exists.

Note. Your basis functions are automatically linearly independent. They are not necessarily
orthogonal.


5.1.3 A function $f(x)$ is approximated by a power series $\sum_{i=0}^{n-1} c_i x^i$ over the interval [0, 1].
Show that minimizing the mean square error leads to a set of linear equations

$$\mathsf{A}\mathbf{c} = \mathbf{b},$$

where

$$A_{ij} = \int_0^1 x^{i+j}\,dx = \frac{1}{i + j + 1}, \quad i, j = 0, 1, 2, \ldots, n - 1$$

and

$$b_i = \int_0^1 x^i f(x)\,dx, \quad i = 0, 1, 2, \ldots, n - 1.$$

Note. The $A_{ij}$ are the elements of the Hilbert matrix of order $n$. The determinant of this
Hilbert matrix is a rapidly decreasing function of $n$. For $n = 5$, $\det\mathsf{A} = 3.7 \times 10^{-12}$ and
the set of equations $\mathsf{A}\mathbf{c} = \mathbf{b}$ is becoming ill-conditioned and unstable.


5.1.4 In place of the expansion of a function $F(x)$ given by

$$F(x) = \sum_{n=0}^{\infty} a_n\varphi_n(x),$$

with

$$a_n = \int_a^b F(x)\,\varphi_n(x)\,w(x)\,dx,$$

take the finite series approximation

$$F(x) \approx \sum_{n=0}^{m} c_n\varphi_n(x).$$

Show that the mean square error

$$\int_a^b \Big[F(x) - \sum_{n=0}^{m} c_n\varphi_n(x)\Big]^2 w(x)\,dx$$

is minimized by taking $c_n = a_n$.
Note. The values of the coefficients are independent of the number of terms in the finite
series. This independence is a consequence of orthogonality and would not hold for a
least-squares fit using powers of $x$.


5.1.5 From Example 5.1.6,

$$f(x) = \begin{cases} \dfrac{h}{2}, & 0 < x < \pi \\[1ex] -\dfrac{h}{2}, & -\pi < x < 0 \end{cases} \;=\; \frac{2h}{\pi}\sum_{n=0}^{\infty} \frac{\sin(2n+1)x}{2n+1}.$$

(a) Show that

$$\int_{-\pi}^{\pi} \big[f(x)\big]^2 dx = \frac{\pi}{2}h^2 = \frac{4h^2}{\pi}\sum_{n=0}^{\infty} (2n+1)^{-2}.$$

For a finite upper limit this would be Bessel’s inequality. For the upper limit $\infty$,
this is Parseval’s identity.

(b) Verify that

$$\frac{\pi}{2}h^2 = \frac{4h^2}{\pi}\sum_{n=0}^{\infty} (2n+1)^{-2}$$

by evaluating the series.

Hint. The series can be expressed in terms of the Riemann zeta function $\zeta(2) = \pi^2/6$.

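5.1.6 Derive the Schwarz inequality from the identity

$$\left[\int_a^b f(x)g(x)\,dx\right]^2 = \int_a^b \big[f(x)\big]^2 dx \int_a^b \big[g(x)\big]^2 dx - \frac{1}{2}\int_a^b dx \int_a^b dy\,\big[f(x)g(y) - f(y)g(x)\big]^2.$$


5.1.7 Starting from

$$I = \Big\langle f - \sum_i a_i\varphi_i \,\Big|\, f - \sum_j a_j\varphi_j \Big\rangle \ge 0,$$

derive Bessel’s inequality, $\langle f|f\rangle \ge \sum_n |a_n|^2$.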


5.1.8 Expand the function $\sin\pi x$ in a series of functions $\varphi_i$ that are orthogonal (but not normalized)
on the range $0 \le x \le 1$ when the scalar product has definition

$$\langle f|g\rangle = \int_0^1 f^*(x)\,g(x)\,dx.$$

Keep the first four terms of the expansion. The first four $\varphi_i$ are:

$$\varphi_0 = 1, \quad \varphi_1 = 2x - 1, \quad \varphi_2 = 6x^2 - 6x + 1, \quad \varphi_3 = 20x^3 - 30x^2 + 12x - 1.$$

Note. The integrals that are needed are the subject of Example 1.10.5.


5.1.9 Expand the function $e^{-x}$ in Laguerre polynomials $L_n(x)$, which are orthonormal on the
range $0 \le x < \infty$ with scalar product

$$\langle f|g\rangle = \int_0^{\infty} f^*(x)\,g(x)\,e^{-x}\,dx.$$

Keep the first four terms of the expansion. The first four $L_n(x)$ are

$$L_0 = 1, \quad L_1 = 1 - x, \quad L_2 = \frac{2 - 4x + x^2}{2}, \quad L_3 = \frac{6 - 18x + 9x^2 - x^3}{6}.$$

5.1.10 The explicit form of a function $f$ is not known, but the coefficients $a_n$ of its expansion
in the orthonormal set $\varphi_n$ are available. Assuming that the $\varphi_n$ and the members of
another orthonormal set, $\chi_n$, are available, use Dirac notation to obtain a formula for
the coefficients for the expansion of $f$ in the $\chi_n$ set.


5.1.11 Using conventional vector notation, evaluate $\sum_j |\hat{\mathbf{e}}_j\rangle\langle\hat{\mathbf{e}}_j|\mathbf{a}\rangle$, where $\mathbf{a}$ is an arbitrary vector
in the space spanned by the $\hat{\mathbf{e}}_j$.


5.1.12 Letting $\mathbf{a} = a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2$ and $\mathbf{b} = b_1\hat{\mathbf{e}}_1 + b_2\hat{\mathbf{e}}_2$ be vectors in $R^2$, for what values of $k$, if
any, is

$$\langle\mathbf{a}|\mathbf{b}\rangle = a_1b_1 - a_1b_2 - a_2b_1 + k\,a_2b_2$$

a valid definition of a scalar product?


5.2 GRAM-SCHMIDT ORTHOGONALIZATION 


Crucial to carrying out the expansions and transformations under discussion is the availability
of useful orthonormal sets of functions. We therefore proceed to the description of
a process whereby a set of functions that is neither orthogonal nor normalized can be used
to construct an orthonormal set that spans the same function space. There are many ways
to accomplish this task. We present here the method called the Gram-Schmidt orthogonalization
process.

The Gram-Schmidt process assumes the availability of a set of functions $\chi_{\mu}$ and an
appropriately defined scalar product $\langle f|g\rangle$. We orthonormalize sequentially to form the
orthonormal functions $\varphi_{\nu}$, meaning we make the first orthonormal function, $\varphi_0$, from $\chi_0$,
the next, $\varphi_1$, from $\chi_0$ and $\chi_1$, etc. If, for example, the $\chi_{\mu}$ are powers $x^{\mu}$, the orthonormal
function $\varphi_{\nu}$ will be a polynomial of degree $\nu$ in $x$. Because the Gram-Schmidt process is
often applied to powers, we have chosen to number both the $\chi$ and the $\varphi$ sets starting from
zero (rather than 1).

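Thus, our first orthonormal function will simply be a normalized version of $\chi_0$.
Specifically,

$$\varphi_0 = \frac{\chi_0}{\langle\chi_0|\chi_0\rangle^{1/2}}. \tag{5.33}$$

To check that Eq. (5.33) is correct, we form

$$\langle\varphi_0|\varphi_0\rangle = \left\langle \frac{\chi_0}{\langle\chi_0|\chi_0\rangle^{1/2}} \,\middle|\, \frac{\chi_0}{\langle\chi_0|\chi_0\rangle^{1/2}} \right\rangle = 1.$$

Next, starting from $\varphi_0$ and $\chi_1$, we form a function that is orthogonal to $\varphi_0$. We use $\varphi_0$ rather
than $\chi_0$ to be consistent with what we will do in later steps of the process. Thus, we write

$$\psi_1 = \chi_1 - a_{1,0}\,\varphi_0. \tag{5.34}$$
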
What we are doing here is the removal from $\chi_1$ of its projection onto $\varphi_0$, leaving a remainder
that will be orthogonal to $\varphi_0$. Remembering that $\varphi_0$ is normalized (of “unit length”),
that projection is identified as $\langle\varphi_0|\chi_1\rangle\varphi_0$, so that

$$a_{1,0} = \langle\varphi_0|\chi_1\rangle. \tag{5.35}$$

In case Eq. (5.35) is not intuitively obvious, we can confirm it by writing the requirement
that $\psi_1$ be orthogonal to $\varphi_0$:

$$\langle\varphi_0|\psi_1\rangle = \langle\varphi_0|(\chi_1 - a_{1,0}\varphi_0)\rangle = \langle\varphi_0|\chi_1\rangle - a_{1,0}\langle\varphi_0|\varphi_0\rangle = 0,$$

which, because $\varphi_0$ is normalized, reduces to Eq. (5.35). The function $\psi_1$ is not in general
normalized. To normalize it and thereby obtain $\varphi_1$, we form

$$\varphi_1 = \frac{\psi_1}{\langle\psi_1|\psi_1\rangle^{1/2}}. \tag{5.36}$$

To continue further, we need to make, from $\varphi_0$, $\varphi_1$, and $\chi_2$, a function that is orthogonal
to both $\varphi_0$ and $\varphi_1$. It will have the form

$$\psi_2 = \chi_2 - a_{0,2}\,\varphi_0 - a_{1,2}\,\varphi_1. \tag{5.37}$$

The last two terms of Eq. (5.37), respectively, remove from $\chi_2$ its projections on $\varphi_0$ and $\varphi_1$;
these projections are independent because $\varphi_0$ and $\varphi_1$ are orthogonal. Thus, either from our
knowledge of projections or by setting to zero the scalar products $\langle\varphi_i|\psi_2\rangle$ ($i = 0$ and 1),
we establish

$$a_{0,2} = \langle\varphi_0|\chi_2\rangle, \quad a_{1,2} = \langle\varphi_1|\chi_2\rangle. \tag{5.38}$$

Finally, we make $\varphi_2 = \psi_2/\langle\psi_2|\psi_2\rangle^{1/2}$.

The generalization for which the above is the first few terms is that, given the prior
formation of $\varphi_i$, $i = 0, \ldots, n - 1$, the orthonormal function $\varphi_n$ is obtained from $\chi_n$ by the
following two steps:

$$\psi_n = \chi_n - \sum_{\mu=0}^{n-1} \langle\varphi_{\mu}|\chi_n\rangle\,\varphi_{\mu}, \qquad \varphi_n = \frac{\psi_n}{\langle\psi_n|\psi_n\rangle^{1/2}}. \tag{5.39}$$

Reviewing the above process, we note that different results would have been obtained if
we used the same set of $\chi_i$, but simply took them in a different order. For example, if we
had started with $\chi_3$, one of our orthonormal functions would have been a multiple of $\chi_3$,
while the set we constructed yielded $\varphi_3$ as a linear combination of $\chi_{\mu}$, $\mu = 0, 1, 2, 3$.
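
The two-step recipe can be sketched in a few lines for polynomial inputs. The code below makes several assumptions that are not part of the text: real polynomials stored as coefficient lists, the unit-weight scalar product on $(-1, 1)$, and exact rational arithmetic. To avoid square roots until the end, it carries the unnormalized $\psi_n$ together with $\langle\psi_n|\psi_n\rangle$ and subtracts projections in the equivalent form $\langle\psi_{\mu}|\chi_n\rangle\psi_{\mu}/\langle\psi_{\mu}|\psi_{\mu}\rangle$, which equals $\langle\varphi_{\mu}|\chi_n\rangle\varphi_{\mu}$.

```python
from fractions import Fraction
import math

def scalar_product(f, g):
    """<f|g> = integral of f(x) g(x) over [-1, 1]; real coefficient lists, lowest power first."""
    total = Fraction(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if (i + j) % 2 == 0:               # odd powers integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt(chis):
    """Return (psi_n, <psi_n|psi_n>) pairs; phi_n = psi_n / sqrt(<psi_n|psi_n>)."""
    result = []
    for chi in chis:
        psi = [Fraction(c) for c in chi]
        for prev, norm_sq in result:
            # subtract the projection of chi on the earlier orthogonal function
            k = scalar_product(prev, chi) / norm_sq
            for i, c in enumerate(prev):
                psi[i] -= k * c
        result.append((psi, scalar_product(psi, psi)))
    return result

powers = [[0] * n + [1] for n in range(4)]      # chi_n = x^n, n = 0..3
gs = gram_schmidt(powers)
for psi, norm_sq in gs:
    phi = [float(c) / math.sqrt(norm_sq) for c in psi]   # normalized coefficients
    print(psi, norm_sq, phi)
```

Feeding the same powers in a different order changes the output, as noted above; reversing `powers` (after padding every list to a common length so the in-place subtraction still fits) makes the first function a multiple of $x^3$.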


Example 5.2.1 LEGENDRE POLYNOMIALS 


Let us form an orthonormal set, taking the $\chi_{\mu}$ as $x^{\mu}$, and making the definition

$$\langle f|g\rangle = \int_{-1}^{1} f^*(x)\,g(x)\,dx. \tag{5.40}$$

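This scalar product definition will cause the members of our set to be orthogonal, with
unit weight, on the range $(-1, 1)$. Moreover, since the $\chi_{\mu}$ are real, the complex conjugate
asterisk has no operational significance here.

The first orthonormal function, $\varphi_0$, is

$$\varphi_0(x) = \frac{1}{\langle 1|1\rangle^{1/2}} = \frac{1}{\left[\int_{-1}^{1} dx\right]^{1/2}} = \frac{1}{\sqrt{2}}.$$

To obtain $\varphi_1$, we first obtain $\psi_1$ by evaluating

$$\psi_1(x) = x - \langle\varphi_0|x\rangle\,\varphi_0(x) = x,$$

where the scalar product vanishes because $\varphi_0$ is an even function of $x$, whereas $x$ is odd,
and the range of integration is even. We then find

$$\varphi_1(x) = \frac{x}{\left[\int_{-1}^{1} x^2\,dx\right]^{1/2}} = \sqrt{\frac{3}{2}}\,x.$$

The next step is less trivial. We form

$$\psi_2(x) = x^2 - \langle\varphi_0|x^2\rangle\,\varphi_0(x) - \langle\varphi_1|x^2\rangle\,\varphi_1(x) = x^2 - \left\langle \frac{1}{\sqrt{2}} \,\middle|\, x^2 \right\rangle \frac{1}{\sqrt{2}} = x^2 - \frac{1}{3},$$

where we have used symmetry to set $\langle\varphi_1|x^2\rangle$ to zero and evaluated the scalar product

$$\left\langle \frac{1}{\sqrt{2}} \,\middle|\, x^2 \right\rangle = \frac{1}{\sqrt{2}}\int_{-1}^{1} x^2\,dx = \frac{\sqrt{2}}{3}.$$

Then,

$$\varphi_2(x) = \frac{x^2 - \frac{1}{3}}{\left[\int_{-1}^{1} \left(x^2 - \frac{1}{3}\right)^2 dx\right]^{1/2}} = \sqrt{\frac{5}{2}}\left(\frac{3}{2}x^2 - \frac{1}{2}\right).$$

Continuation to one more orthonormal function yields

$$\varphi_3(x) = \sqrt{\frac{7}{2}}\left(\frac{5}{2}x^3 - \frac{3}{2}x\right).$$

Reference to Chapter 15 will show that

$$\varphi_n(x) = \sqrt{\frac{2n+1}{2}}\,P_n(x), \tag{5.41}$$

where $P_n(x)$ is the $n$th degree Legendre polynomial. Our Gram-Schmidt process provides
a possible but very cumbersome method of generating the Legendre polynomials; other,
more efficient approaches exist. ■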
Table 5.1 Orthogonal Polynomials Generated by Gram-Schmidt Orthogonalization of
$u_n(x) = x^n$, $n = 0, 1, 2, \ldots$.

Polynomials            Scalar Products                                                                     Table
Legendre               $\int_{-1}^{1} P_n(x)P_m(x)\,dx = 2\delta_{mn}/(2n+1)$                              Table 15.1
Shifted Legendre       $\int_{0}^{1} P_n^*(x)P_m^*(x)\,dx = \delta_{mn}/(2n+1)$                            Table 15.2
Chebyshev I            $\int_{-1}^{1} T_n(x)T_m(x)(1-x^2)^{-1/2}\,dx = \delta_{mn}\pi/(2-\delta_{n0})$     Table 18.4
Shifted Chebyshev I    $\int_{0}^{1} T_n^*(x)T_m^*(x)[x(1-x)]^{-1/2}\,dx = \delta_{mn}\pi/(2-\delta_{n0})$ Table 18.5
Chebyshev II           $\int_{-1}^{1} U_n(x)U_m(x)(1-x^2)^{1/2}\,dx = \delta_{mn}\pi/2$                    Table 18.4
Laguerre               $\int_{0}^{\infty} L_n(x)L_m(x)e^{-x}\,dx = \delta_{mn}$                            Table 18.2
Associated Laguerre    $\int_{0}^{\infty} L_n^k(x)L_m^k(x)x^ke^{-x}\,dx = \delta_{mn}(n+k)!/n!$            Table 18.3
Hermite                $\int_{-\infty}^{\infty} H_n(x)H_m(x)e^{-x^2}\,dx = 2^n\delta_{mn}\pi^{1/2}n!$      Table 18.1

The intervals, weights, and conventional normalization can be deduced from the forms of the scalar products.
Tables of explicit formulas for the first few polynomials of each type are included in the indicated tables
appearing in Chapters 15 and 18 of this book.


The Legendre polynomials are, except for sign and scale, uniquely defined by the Gram- 
Schmidt process, the use of successive powers of x, and the definition adopted for the 
scalar product. By changing the scalar product definition (different weight or range), we 
can generate other useful sets of orthogonal polynomials. A number of these are presented 
in Table 5.1. For various reasons most of these polynomial sets are not normalized to 
unity. The scalar product formulas in the table give the conventional normalizations, and 
are those of the explicit formulas referenced in the table. 


Orthonormalizing Physical Vectors 


The Gram-Schmidt process also works for ordinary vectors that are simply given by their 
components, it being understood that the scalar product is just the ordinary dot product. 
Example 5.2.2 ORTHONORMALIZING A 2-D MANIFOLD


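A 2-D manifold (subspace) in 3-D space is defined by the two vectors $\mathbf{a}_1 = \hat{\mathbf{e}}_1 + \hat{\mathbf{e}}_2 - 2\hat{\mathbf{e}}_3$
and $\mathbf{a}_2 = \hat{\mathbf{e}}_1 + 2\hat{\mathbf{e}}_2 - 3\hat{\mathbf{e}}_3$. In Dirac notation, these vectors (written as column matrices) are

$$|\mathbf{a}_1\rangle = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}, \quad |\mathbf{a}_2\rangle = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}.$$

Our task is to span this manifold with an orthonormal basis.
We proceed exactly as for functions: Our first orthonormal basis vector, which we call
$\mathbf{b}_1$, will be a normalized version of $\mathbf{a}_1$, and therefore formed as

$$|\mathbf{b}_1\rangle = \frac{\mathbf{a}_1}{\langle\mathbf{a}_1|\mathbf{a}_1\rangle^{1/2}} = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}.$$

An unnormalized version of a second orthonormal basis vector will have the form

$$|\mathbf{b}_2'\rangle = |\mathbf{a}_2\rangle - \langle\mathbf{b}_1|\mathbf{a}_2\rangle|\mathbf{b}_1\rangle = |\mathbf{a}_2\rangle - \frac{9}{\sqrt{6}}|\mathbf{b}_1\rangle = \begin{pmatrix} -1/2 \\ 1/2 \\ 0 \end{pmatrix}.$$

Normalizing, we reach

$$|\mathbf{b}_2\rangle = \frac{\mathbf{b}_2'}{\langle\mathbf{b}_2'|\mathbf{b}_2'\rangle^{1/2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}.$$
■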
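
For vectors given by their components, the same procedure takes only a few lines. The following is a minimal sketch assuming real vectors and the ordinary dot product; the function names are hypothetical. It is applied to the two vectors $\mathbf{a}_1 = (1, 1, -2)$ and $\mathbf{a}_2 = (1, 2, -3)$ of this example.

```python
import math

def dot(u, v):
    """Ordinary dot product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize real vectors in the order given."""
    basis = []
    for a in vectors:
        b = list(a)
        for e in basis:
            proj = dot(e, a)                      # <b_i|a>
            b = [x - proj * y for x, y in zip(b, e)]
        norm = math.sqrt(dot(b, b))
        basis.append([x / norm for x in b])
    return basis

b1, b2 = gram_schmidt([[1, 1, -2], [1, 2, -3]])
print(b1, b2)
```

The sketch divides by the norm without checking for zero, so it assumes the input vectors are linearly independent, as they are here.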

Exercises 
For the Gram-Schmidt constructions in Exercises 5.2.1 through 5.2.6, use a scalar product
of the form given in Eq. (5.7) with the specified interval and weight.

5.2.1 Following the Gram-Schmidt procedure, construct a set of polynomials $P_n^*(x)$ orthogonal
(unit weighting factor) over the range [0, 1] from the set $[1, x, x^2, \ldots]$. Scale so
that $P_n^*(1) = 1$.

ANS. $P_0^*(x) = 1$,
     $P_1^*(x) = 2x - 1$,
     $P_2^*(x) = 6x^2 - 6x + 1$,
     $P_3^*(x) = 20x^3 - 30x^2 + 12x - 1$.

These are the first four shifted Legendre polynomials.

Note. The “*” is the standard notation for “shifted”: [0, 1] instead of [−1, 1]. It does not
mean complex conjugate.


5.2.2 Apply the Gram-Schmidt procedure to form the first three Laguerre polynomials:

$$u_n(x) = x^n, \quad n = 0, 1, 2, \ldots, \quad 0 \le x < \infty, \quad w(x) = e^{-x}.$$
The conventional normalization is

$$\int_0^{\infty} L_m(x)\,L_n(x)\,e^{-x}\,dx = \delta_{mn}.$$

ANS. $L_0 = 1$, $L_1 = (1 - x)$, $L_2 = \dfrac{2 - 4x + x^2}{2}$.


5.2.3 You are given
(a) a set of functions $u_n(x) = x^n$, $n = 0, 1, 2, \ldots$,
(b) an interval $(0, \infty)$,
(c) a weighting function $w(x) = xe^{-x}$.
Use the Gram-Schmidt procedure to construct the first three orthonormal functions from
the set $u_n(x)$ for this interval and this weighting function.

ANS. $\varphi_0(x) = 1$, $\varphi_1(x) = (x - 2)/\sqrt{2}$, $\varphi_2(x) = (x^2 - 6x + 6)/2\sqrt{3}$.


5.2.4 Using the Gram-Schmidt orthogonalization procedure, construct the lowest three
Hermite polynomials:

$$u_n(x) = x^n, \quad n = 0, 1, 2, \ldots, \quad -\infty < x < \infty, \quad w(x) = e^{-x^2}.$$

For this set of polynomials the usual normalization is

$$\int_{-\infty}^{\infty} H_m(x)\,H_n(x)\,w(x)\,dx = \delta_{mn}\,2^m\,m!\,\pi^{1/2}.$$

ANS. $H_0 = 1$, $H_1 = 2x$, $H_2 = 4x^2 - 2$.
5.2.5 Use the Gram-Schmidt orthogonalization scheme to construct the first three Chebyshev
polynomials (type I):

$$u_n(x) = x^n, \quad n = 0, 1, 2, \ldots, \quad -1 \le x \le 1, \quad w(x) = (1 - x^2)^{-1/2}.$$

Take the normalization

$$\int_{-1}^{1} T_m(x)\,T_n(x)\,w(x)\,dx = \delta_{mn}\begin{cases} \pi, & m = n = 0, \\[0.5ex] \dfrac{\pi}{2}, & m = n \ge 1. \end{cases}$$

Hint. The needed integrals are given in Exercise 13.3.2.

ANS. $T_0 = 1$, $T_1 = x$, $T_2 = 2x^2 - 1$, ($T_3 = 4x^3 - 3x$).


5.2.6 Use the Gram-Schmidt orthogonalization scheme to construct the first three Chebyshev
polynomials (type II):

$$u_n(x) = x^n, \quad n = 0, 1, 2, \ldots, \quad -1 \le x \le 1, \quad w(x) = (1 - x^2)^{+1/2}.$$

Take the normalization to be

$$\int_{-1}^{1} U_m(x)\,U_n(x)\,w(x)\,dx = \delta_{mn}\,\frac{\pi}{2}.$$

Hint.

$$\int_{-1}^{1} (1 - x^2)^{1/2}x^{2n}\,dx = \begin{cases} \dfrac{\pi}{2} \times \dfrac{1\cdot3\cdot5\cdots(2n-1)}{4\cdot6\cdot8\cdots(2n+2)}, & n = 1, 2, 3, \ldots, \\[1.5ex] \dfrac{\pi}{2}, & n = 0. \end{cases}$$

ANS. $U_0 = 1$, $U_1 = 2x$, $U_2 = 4x^2 - 1$.


5.2.7 As a modification of Exercise 5.2.4, apply the Gram-Schmidt orthogonalization procedure
to the set $u_n(x) = x^n$, $n = 0, 1, 2, \ldots$, $0 \le x < \infty$. Take $w(x)$ to be $\exp(-x^2)$.
Find the first two nonvanishing polynomials. Normalize so that the coefficient of the
highest power of $x$ is unity. In Exercise 5.2.4, the interval $(-\infty, \infty)$ led to the Hermite
polynomials. The functions found here are certainly not the Hermite polynomials.


ANS. $\varphi_0 = 1$, $\varphi_1 = x - \pi^{-1/2}$.

5.2.8 Form a set of three orthonormal vectors by the Gram-Schmidt process using these input 
vectors in the order given: 


1 1 1 
C= 1 > OQ= 1 > O= 0 
1 2 


5.3. OPERATORS 


An operator is a mapping between functions in its domain (those to which it can be 
applied) and functions in its range (those it can produce). While the domain and the range 
need not be in the same space, our concern here is for operators whose domain and range 
are both in all or part of the same Hilbert space. To make this discussion more concrete, 
here are a few examples of operators: 


• Multiplication by 2: Converts $f$ into $2f$;


• For a space containing algebraic functions of a variable $x$, $d/dx$: Converts $f(x)$ into
$df/dx$;

• An integral operator $A$ defined by $Af(x) = \int G(x, x')\,f(x')\,dx'$: A special case of this
is a projection operator $|\varphi_i\rangle\langle\varphi_i|$, which converts $f$ into $\langle\varphi_i|f\rangle\varphi_i$.


In addition to the abovementioned restriction on domain and range, we also for our present 
purposes restrict attention to operators that are linear, meaning that if A and B are linear 
operators, f and g functions, and k a constant, then 


\[
(A + B)f = Af + Bf, \qquad A(f + g) = Af + Ag, \qquad Ak = kA.
\]


For both electromagnetic theory and quantum mechanics, an important class of operators
is that of differential operators, those that include differentiation of the functions to which
they are applied. These operators arise when differential equations are written in operator form;
for example, the operator
\[
\mathcal{L}(x) = (1 - x^2)\frac{d^2}{dx^2} - 2x\frac{d}{dx}
\]
enables us to write Legendre's differential equation,
\[
(1 - x^2)\frac{d^2 y}{dx^2} - 2x\frac{dy}{dx} + \lambda y = 0,
\]
in the form $\mathcal{L}(x)y = -\lambda y$. When no confusion thereby results, this can be shortened to
$\mathcal{L}y = -\lambda y$.


Commutation of Operators 


Because differential operators act on the function(s) to their right, they do not necessarily 
commute with other operators containing the same independent variable. This fact makes 
it useful to consider the commutator of operators A and B, 


\[
[A, B] = AB - BA. \tag{5.42}
\]


We can often reduce AB — BA to a simpler operator expression. When we write an 
operator equation, its meaning is that the operator on the left-hand side of the equation 
produces the same effect on every function in its domain as is produced by the opera- 
tor on the right-hand side. Let’s illustrate this point by evaluating the commutator [x, p], 
where $p = -i\,d/dx$. The imaginary unit $i$ and the name $p$ appear because this operator
is that corresponding in quantum mechanics to momentum (in a system of units such that
$\hbar = h/2\pi = 1$). The operator $x$ stands for multiplication by $x$.

To carry out the evaluation, we apply $[x, p]$ to an arbitrary function $f(x)$. Inserting the
explicit form of $p$, we have
\[
\begin{aligned}
[x, p]\,f(x) = (xp - px)\,f(x) &= -ix\,\frac{df(x)}{dx} - (-i)\,\frac{d}{dx}\bigl(x f(x)\bigr) \\
&= -ix f'(x) + i\bigl(f(x) + x f'(x)\bigr) = i f(x),
\end{aligned}
\]
indicating that
\[
[x, p] = i. \tag{5.43}
\]

As indicated before, this means $[x, p]\,f(x) = i f(x)$ for all $f$.
We can carry out various algebraic manipulations on commutators. In general, if A, B, 
C are operators and k is a constant, 


\[
[A, B] = -[B, A], \qquad [A, B + C] = [A, B] + [A, C], \qquad k[A, B] = [kA, B] = [A, kB]. \tag{5.44}
\]

Example 5.3.1 OPERATOR MANIPULATION


Given $[x, p]$, we can simplify the commutator $[x, p^2]$. We write, being careful about the
operator ordering and using Eq. (5.43),
\[
[x, p^2] = xp^2 - pxp + pxp - p^2 x = [x, p]\,p + p\,[x, p] = 2ip, \tag{5.45}
\]
a result also obtainable from
\[
x\left(-\frac{d^2}{dx^2}\right) f(x) - \left(-\frac{d^2}{dx^2}\right)\bigl(x f(x)\bigr) = 2f'(x) = 2i\left(-i\frac{d}{dx}\right) f(x).
\]


However, note that Eq. (5.45) follows solely from the validity of Eq. (5.43), and will apply 
to any quantities x and p that satisfy that commutation relation, whether or not we are 
operating with ordinary functions and their derivatives. Put another way, if x and p are 
operators in some abstract Hilbert space and all we know about them is Eq. (5.43), we may 
still conclude that Eq. (5.45) is also valid.
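The two commutators above can be checked mechanically on polynomials. The following is a minimal sketch, not part of the original text: polynomials are stored as coefficient lists (lowest power first), and the helper names `x_op`, `p_op`, `p2_op`, `sub`, and `commutator` are illustrative choices, as is the test polynomial.

```python
def x_op(f):
    """Multiply a polynomial (coefficients, lowest power first) by x."""
    return [0] + list(f)

def p_op(f):
    """Apply p = -i d/dx to a polynomial."""
    return [-1j * k * c for k, c in enumerate(f)][1:] or [0]

def p2_op(f):
    """Apply p^2 = -d^2/dx^2 by applying p twice."""
    return p_op(p_op(f))

def sub(f, g):
    """Subtract polynomial g from polynomial f, padding with zeros."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return [a - b for a, b in zip(f, g)]

def commutator(a, b):
    """Return [a, b] = ab - ba as a function acting on polynomials."""
    return lambda f: sub(a(b(f)), b(a(f)))

f = [5, -2, 0, 3]                       # f(x) = 5 - 2x + 3x^3 (arbitrary choice)
xp_f = commutator(x_op, p_op)(f)        # should equal i f, Eq. (5.43)
xp2_f = commutator(x_op, p2_op)(f)      # should equal 2i p f, Eq. (5.45)

print(xp_f, [1j * c for c in f])
print(xp2_f, [2j * c for c in p_op(f)])
```

Because only the operator ordering matters, the same two helper functions verify both relations; a check on one polynomial is of course an illustration, not a proof.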





Identity, Inverse, Adjoint 


An operator that is generally available is the identity operator, namely one that leaves
functions unchanged. Depending on the context, this operator will be denoted either $I$ or
simply 1. Some, but not all, operators will have an inverse, namely an operator that will
"undo" its effect. Letting $A^{-1}$ denote the inverse of $A$, if $A^{-1}$ exists, it will have the
property
\[
A^{-1}A = AA^{-1} = 1. \tag{5.46}
\]


Associated with many operators will be another operator, called its adjoint and denoted
$A^\dagger$, which will be such that for all functions $f$ and $g$ in the Hilbert space,
\[
\langle f|Ag\rangle = \langle A^\dagger f|g\rangle. \tag{5.47}
\]
Thus, we see that $A^\dagger$ is an operator that, applied to the left member of any scalar product,
produces the same result as is obtained if $A$ is applied to the right member of the same
scalar product. Equation (5.47) is, in essence, the defining equation for $A^\dagger$.

Depending on the specific operator $A$, and the definitions in use of the Hilbert space
and the scalar product, $A^\dagger$ may or may not be equal to $A$. If $A = A^\dagger$, $A$ is referred to as
self-adjoint, or equivalently, Hermitian. If $A^\dagger = -A$, $A$ is called anti-Hermitian. This
definition is worth emphasis:
\[
\text{If } H^\dagger = H, \quad H \text{ is Hermitian.} \tag{5.48}
\]


Another situation of frequent occurrence is that the adjoint of an operator is equal to 
its inverse, in which case the operator is called unitary. A unitary operator U is therefore 
defined by the following statement: 


\[
\text{If } U^\dagger = U^{-1}, \quad U \text{ is unitary.} \tag{5.49}
\]


In the special case that U is both real and unitary, it is called orthogonal. 





The reader will doubtless note that the nomenclature for operators is similar to that
previously introduced for matrices. This is not accidental; we shall shortly develop
correspondences between operator and matrix expressions.


Example 5.3.2 FINDING THE ADJOINT


Consider an operator $A = x(d/dx)$ whose domain is the Hilbert space whose members $f$
have a finite value of $\langle f|f\rangle$ when the scalar product has definition
\[
\langle f|g\rangle = \int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx.
\]
This space is often referred to as $\mathcal{L}^2$ on $(-\infty, \infty)$. Starting from $\langle f|Ag\rangle$, we integrate by
parts as needed to move the operator out of the right half of the scalar product. Because $f$
and $g$ must vanish at $\pm\infty$, the integrated terms vanish, and we get
\[
\langle f|Ag\rangle = \int_{-\infty}^{\infty} f^*\,x\,\frac{dg}{dx}\,dx
= \int_{-\infty}^{\infty} (xf)^*\,\frac{dg}{dx}\,dx
= -\int_{-\infty}^{\infty} \left[\frac{d(xf)}{dx}\right]^* g\,dx.
\]
We see from the above that $A^\dagger = -(d/dx)x$, from which we can find $A^\dagger = -A - 1$.
This $A$ is clearly neither Hermitian nor unitary (with the specified definition of the scalar
product).
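The step from $A^\dagger = -(d/dx)x$ to $A^\dagger = -A - 1$ in Example 5.3.2 follows from the product rule. Spelled out for an arbitrary differentiable function $f$ (a short worked step added here for clarity):
\[
A^\dagger f = -\frac{d}{dx}\bigl(x f\bigr) = -f - x\frac{df}{dx} = -(1 + A)\,f .
\]
Since this holds for every $f$ in the domain, it is an operator identity.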


Example 5.3.3 ADJOINT DEPENDS ON SCALAR PRODUCT


For the Hilbert space and scalar product of Example 5.3.2, an integration by parts easily
establishes that an operator $A = -i(d/dx)$ is self-adjoint, i.e., $A^\dagger = A$. But now let's
consider the same operator $A$, but for the $\mathcal{L}^2$ space with $-1 \le x \le 1$ (and with a scalar
product of the same form, but with integration limits $\pm 1$). In this space, the integrated
terms from the integration by parts do not vanish, but we can incorporate them into an
operator on the left half of the scalar product by adding delta-function terms:
\[
\begin{aligned}
\langle f|Ag\rangle &= \int_{-1}^{1} f^*\left(-i\frac{dg}{dx}\right) dx
= -i f^* g\,\Big|_{-1}^{1} + \int_{-1}^{1} \left(-i\frac{df}{dx}\right)^* g\,dx \\
&= \int_{-1}^{1} \left(\left[i\delta(x - 1) - i\delta(x + 1) - i\frac{d}{dx}\right] f(x)\right)^* g(x)\,dx.
\end{aligned}
\]
In this truncated space the operator $A$ is not self-adjoint.





Basis Expansions of Operators 


Because we are dealing only with linear operators, we can write the effect of an operator on
an arbitrary function if we know the result of its action on all members of a basis spanning
our Hilbert space. In particular, assume that the action of an operator $A$ on member $\varphi_\mu$ of
an orthonormal basis has the result, also expanded in that basis,
\[
A\varphi_\mu = \sum_\nu a_{\nu\mu}\,\varphi_\nu. \tag{5.50}
\]
Assuming this form for the result of operation with $A$ is not a major restriction; all it says
is that the result is in our Hilbert space. Formally, the coefficients $a_{\nu\mu}$ can be obtained by
taking scalar products:
\[
a_{\nu\mu} = \langle\varphi_\nu|A\varphi_\mu\rangle = \langle\varphi_\nu|A|\varphi_\mu\rangle. \tag{5.51}
\]
Following common usage, we have inserted an optional (operationally meaningless) vertical
line between $A$ and $\varphi_\mu$. This notation has the aesthetic effect of separating the operator
from the two functions entering the scalar product, and also emphasizes the possibility
that instead of evaluating the scalar product as written, we can without changing its value
evaluate it using the adjoint of $A$, as $\langle A^\dagger\varphi_\nu|\varphi_\mu\rangle$.

We now apply Eq. (5.50) to a function $\psi$ whose expansion in the $\varphi$ basis is
\[
\psi = \sum_\mu c_\mu\,\varphi_\mu, \qquad c_\mu = \langle\varphi_\mu|\psi\rangle. \tag{5.52}
\]
The result is
\[
A\psi = \sum_\mu c_\mu\,A\varphi_\mu = \sum_\mu c_\mu \sum_\nu a_{\nu\mu}\,\varphi_\nu
= \sum_\nu \Bigl(\sum_\mu a_{\nu\mu}\,c_\mu\Bigr)\varphi_\nu. \tag{5.53}
\]
If we think of $A\psi$ as a function $\chi$ in our Hilbert space, with expansion
\[
\chi = \sum_\nu b_\nu\,\varphi_\nu, \tag{5.54}
\]
we then see from Eq. (5.53) that the coefficients $b_\nu$ are related to $c_\mu$ and $a_{\nu\mu}$ in a way
corresponding to matrix multiplication. To make this more concrete,

• Define $\mathbf{c}$ as a column vector with elements $c_i$, representing the function $\psi$,

• Define $\mathbf{b}$ as a column vector with elements $b_i$, representing the function $\chi$,

• Define $\mathsf{A}$ as a matrix with elements $a_{ij}$, representing the operator $A$,

• The operator equation $\chi = A\psi$ then corresponds to the matrix equation $\mathbf{b} = \mathsf{A}\mathbf{c}$.
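The correspondence listed above can be sketched in a few lines. This is an illustrative example only: the three-function orthonormal basis is abstract, and the operator (specified through its action on the basis functions) and the coefficients in `c` are hypothetical values chosen for the demonstration.

```python
# Hypothetical operator on an orthonormal basis (phi_1, phi_2, phi_3):
#   A phi_1 = 2 phi_2,   A phi_2 = phi_1 - i phi_3,   A phi_3 = 3 phi_3.
# action[mu] lists the expansion coefficients of A phi_mu in the basis.
action = [
    [0, 2, 0],      # A phi_1
    [1, 0, -1j],    # A phi_2
    [0, 0, 3],      # A phi_3
]
n = len(action)

# Matrix of the operator: element A[nu][mu] = a_{nu mu}, so the expansion of
# A phi_mu forms column mu, as in Eq. (5.50).
A = [[action[mu][nu] for mu in range(n)] for nu in range(n)]

c = [1, 1j, -2]     # psi = phi_1 + i phi_2 - 2 phi_3

# Matrix route: b = A c.
b_matrix = [sum(A[nu][mu] * c[mu] for mu in range(n)) for nu in range(n)]

# Direct route: A psi = sum_mu c_mu (A phi_mu), using linearity of A.
b_direct = [0] * n
for mu in range(n):
    for nu in range(n):
        b_direct[nu] += c[mu] * action[mu][nu]

print(b_matrix)   # coefficients of chi = A psi
print(b_direct)   # same coefficients obtained term by term
```

The point of the layout is the index order: the expansion of $A\varphi_\mu$ supplies a column, not a row, of the matrix.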

In other words, the expansion of the result of applying $A$ to any function $\psi$ can be computed
(by matrix multiplication) from the expansions of $A$ and $\psi$. In effect, that means that
the operator $A$ can be thought of as completely defined by its matrix elements, while $\psi$
and $\chi = A\psi$ are completely characterized by their coefficients.

We obtain an interesting expression if we introduce Dirac notation for all the quantities
entering Eq. (5.53). We then have, moving the ket representing $\varphi_\nu$ to the left,
\[
A|\psi\rangle = \sum_{\nu\mu} |\varphi_\nu\rangle\langle\varphi_\nu|A|\varphi_\mu\rangle\langle\varphi_\mu|\psi\rangle, \tag{5.55}
\]
which leads us to identify $A$ as
\[
A = \sum_{\nu\mu} |\varphi_\nu\rangle\langle\varphi_\nu|A|\varphi_\mu\rangle\langle\varphi_\mu|, \tag{5.56}
\]
which we note is nothing other than $A$, multiplied on each side by a resolution of the
identity, of the form given in Eq. (5.32).

Another interesting observation results if we reintroduce into Eq. (5.56) the coefficient
$a_{\nu\mu}$, bringing us to
\[
A = \sum_{\nu\mu} |\varphi_\nu\rangle\,a_{\nu\mu}\,\langle\varphi_\mu|. \tag{5.57}
\]
Here we have the general form for an operator $A$, with a specific behavior that is determined
entirely by the set of coefficients $a_{\nu\mu}$. The special case $A = 1$ has already been seen
to be of the form of Eq. (5.57) with $a_{\nu\mu} = \delta_{\nu\mu}$.


Example 5.3.4 MATRIX ELEMENTS OF AN OPERATOR


Consider the expansion of the operator $x$ in a basis consisting of functions
$\varphi_n(x) = C_n H_n(x)\,e^{-x^2/2}$, $n = 0, 1, \ldots$, where the $H_n$ are Hermite polynomials, with scalar product
\[
\langle f|g\rangle = \int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx.
\]
From Table 5.1, we can see that the $\varphi_n$ are orthogonal and that they will also be normalized
if $C_n = (2^n n!\sqrt{\pi})^{-1/2}$. The matrix elements of $x$, which we denote $x_{\nu\mu}$ and write
collectively as a matrix denoted $\mathsf{x}$, are given by
\[
x_{\nu\mu} = \langle\varphi_\nu|x|\varphi_\mu\rangle = C_\nu C_\mu \int_{-\infty}^{\infty} H_\nu(x)\,x\,H_\mu(x)\,e^{-x^2}\,dx.
\]
The integral leading to $x_{\nu\mu}$ can be evaluated in general by using the properties of the
Hermite polynomials, but our present purposes are adequately served by a straightforward
case-by-case computation. From the table of Hermite polynomials in Table 18.1, we
identify
\[
H_0 = 1, \quad H_1 = 2x, \quad H_2 = 4x^2 - 2, \quad H_3 = 8x^3 - 12x, \quad \ldots,
\]
and we take note of the integration formula
\[
I_{2n} = \int_{-\infty}^{\infty} x^{2n}\,e^{-x^2}\,dx = \frac{(2n - 1)!!\,\sqrt{\pi}}{2^n}.
\]
Making use of the parity (even/odd symmetry) of the $H_n$ and the fact that the matrix $\mathsf{x}$
is symmetric, we note that many matrix elements are either zero or equal to others. We
illustrate with the explicit computation of one matrix element, $x_{12}$:
\[
\begin{aligned}
x_{12} &= C_1 C_2 \int_{-\infty}^{\infty} (2x)\,x\,(4x^2 - 2)\,e^{-x^2}\,dx
= C_1 C_2 \int_{-\infty}^{\infty} (8x^4 - 4x^2)\,e^{-x^2}\,dx \\
&= C_1 C_2 \bigl[8I_4 - 4I_2\bigr] = 1.
\end{aligned}
\]
Evaluating other matrix elements, we find that $\mathsf{x}$, the matrix of $x$, has the form
\[
\mathsf{x} = \begin{pmatrix}
0 & \sqrt{2}/2 & 0 & 0 & \cdots \\
\sqrt{2}/2 & 0 & 1 & 0 & \cdots \\
0 & 1 & 0 & \sqrt{6}/2 & \cdots \\
0 & 0 & \sqrt{6}/2 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}. \tag{5.58}
\]
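The case-by-case computation of Example 5.3.4 can be automated. The sketch below is an illustration under stated assumptions: it builds the $H_n$ from the standard recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$ (not quoted in this section), evaluates the Gaussian moments with the integration formula $I_{2n}$ above, and uses helper names (`hermite`, `gauss_moment`, `poly_mul`, `x_element`) invented for this example.

```python
import math

def hermite(n):
    """Coefficients (lowest power first) of the Hermite polynomial H_n."""
    h_prev, h = [1.0], [0.0, 2.0]
    if n == 0:
        return h_prev
    for k in range(1, n):
        # Recurrence H_{k+1} = 2x H_k - 2k H_{k-1}.
        nxt = [0.0] + [2.0 * c for c in h]
        for j, c in enumerate(h_prev):
            nxt[j] -= 2.0 * k * c
        h_prev, h = h, nxt
    return h

def gauss_moment(m):
    """Integral of x^m exp(-x^2) over the real line (zero for odd m)."""
    if m % 2:
        return 0.0
    n = m // 2
    double_factorial = math.prod(range(1, 2 * n, 2))   # (2n-1)!!, 1 when n = 0
    return double_factorial * math.sqrt(math.pi) / 2 ** n

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def x_element(nu, mu):
    """Matrix element <phi_nu | x | phi_mu> in the normalized Hermite basis."""
    def norm(n):
        return (2 ** n * math.factorial(n) * math.sqrt(math.pi)) ** -0.5
    integrand = poly_mul(poly_mul(hermite(nu), [0.0, 1.0]), hermite(mu))
    integral = sum(c * gauss_moment(m) for m, c in enumerate(integrand))
    return norm(nu) * norm(mu) * integral

x_matrix = [[x_element(nu, mu) for mu in range(4)] for nu in range(4)]
for row in x_matrix:
    print(["%.4f" % v for v in row])
```

The printed 4 x 4 block can be compared entry by entry with Eq. (5.58); only the elements adjacent to the diagonal are nonzero, as the parity argument predicts.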


Basis Expansion of Adjoint 


We now look at the adjoint of our operator $A$ as an expansion in the same basis. Our
starting point is the definition of the adjoint. For arbitrary functions $\psi$ and $\chi$,
\[
\langle\psi|A|\chi\rangle = \langle A^\dagger\psi|\chi\rangle = \langle\chi|A^\dagger|\psi\rangle^*,
\]
where we reached the last member of the equation by using the complex conjugation property
of the scalar product. This is equivalent to
\[
\begin{aligned}
\langle\chi|A^\dagger|\psi\rangle = \langle\psi|A|\chi\rangle^*
&= \Bigl[\sum_{\nu\mu} \langle\psi|\varphi_\nu\rangle\,a_{\nu\mu}\,\langle\varphi_\mu|\chi\rangle\Bigr]^* \\
&= \sum_{\nu\mu} \langle\psi|\varphi_\nu\rangle^*\,a^*_{\nu\mu}\,\langle\varphi_\mu|\chi\rangle^* \\
&= \sum_{\nu\mu} \langle\chi|\varphi_\mu\rangle\,a^*_{\nu\mu}\,\langle\varphi_\nu|\psi\rangle,
\end{aligned} \tag{5.59}
\]
where in the last line we have again used the scalar product complex conjugation property
and have reordered the factors in the sum.
We are now in a position to note that Eq. (5.59) corresponds to
\[
A^\dagger = \sum_{\nu\mu} |\varphi_\nu\rangle\,a^*_{\mu\nu}\,\langle\varphi_\mu|. \tag{5.60}
\]
In writing Eq. (5.60) we have changed the dummy indices to make the formula as similar
as possible to Eq. (5.57). It is important to note the differences: The coefficient $a_{\nu\mu}$ of
Eq. (5.57) has been replaced by $a^*_{\mu\nu}$, so we see that the index order has been reversed and
the complex conjugate taken. This is the general recipe for forming the basis set expansion
of the adjoint of an operator. The relation between the matrix elements of $A$ and of $A^\dagger$
is exactly that which relates a matrix $\mathsf{A}$ to its adjoint $\mathsf{A}^\dagger$, showing that the similarity in
nomenclature is purposeful. We thus have the important and general result:


• If $\mathsf{A}$ is the matrix representing an operator $A$, then the operator $A^\dagger$, the adjoint of $A$, is
represented by the matrix $\mathsf{A}^\dagger$.


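Example 5.3.5 ADJOINT OF SPIN OPERATOR


Consider a spin space spanned by functions we call $\alpha$ and $\beta$, with a scalar product completely
defined by the equations $\langle\alpha|\alpha\rangle = \langle\beta|\beta\rangle = 1$, $\langle\alpha|\beta\rangle = 0$. An operator $B$ is such
that
\[
B\alpha = 0, \qquad B\beta = \alpha.
\]
Taking all possible linearly independent scalar products, this means that
\[
\langle\alpha|B\alpha\rangle = 0, \quad \langle\beta|B\alpha\rangle = 0, \quad \langle\alpha|B\beta\rangle = 1, \quad \langle\beta|B\beta\rangle = 0.
\]
It is therefore necessary that
\[
\langle B^\dagger\alpha|\alpha\rangle = 0, \quad \langle B^\dagger\beta|\alpha\rangle = 0, \quad \langle B^\dagger\alpha|\beta\rangle = 1, \quad \langle B^\dagger\beta|\beta\rangle = 0,
\]
which means that $B^\dagger$ is an operator such that
\[
B^\dagger\alpha = \beta, \qquad B^\dagger\beta = 0.
\]
The above equations correspond to the matrices
\[
\mathsf{B} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
\mathsf{B}^\dagger = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.
\]
We see that $\mathsf{B}^\dagger$ is the adjoint of $\mathsf{B}$, as required.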
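As a quick numerical companion to Example 5.3.5, the sketch below forms the conjugate transpose of the matrix of $B$ and checks the defining relation $\langle f|Bg\rangle = \langle B^\dagger f|g\rangle$ on all pairs of basis functions. The helper names (`dagger`, `apply`, `braket`) are illustrative, and $\alpha$, $\beta$ are represented by the column vectors $(1, 0)$ and $(0, 1)$.

```python
def dagger(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(m), len(m[0])
    return [[complex(m[r][c]).conjugate() for r in range(rows)] for c in range(cols)]

def apply(m, v):
    """Matrix times column vector."""
    return [sum(m[r][k] * v[k] for k in range(len(v))) for r in range(len(m))]

def braket(u, v):
    """Scalar product <u|v>, conjugating the left member."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

alpha, beta = [1, 0], [0, 1]
B = [[0, 1],
     [0, 0]]            # columns hold B alpha = 0 and B beta = alpha
B_dag = dagger(B)

# Defining relation of the adjoint, Eq. (5.47), on every pair of basis functions.
relation_holds = all(
    braket(f, apply(B, g)) == braket(apply(B_dag, f), g)
    for f in (alpha, beta)
    for g in (alpha, beta)
)

print(B_dag)                  # matrix of the adjoint operator
print(apply(B_dag, alpha))    # components of B_dag alpha
print(apply(B_dag, beta))     # components of B_dag beta
print(relation_holds)
```

Because the scalar product is linear in its right member and antilinear in its left, checking the relation on the basis functions is enough to establish it on the whole two-dimensional space.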


Functions of Operators 


Our ability to represent operators by matrices also implies that the observations made in 
Chapter 3 regarding functions of matrices also apply to linear operators. Thus, we have 
definite meanings for quantities such as exp(A), sin(A), or cos(A), and can also apply to 
operators various identities involving matrix commutators. Important examples include the 
Jacobi identity (Exercise 2.2.7), and the Baker-Hausdorff formula, Eq. (2.85). 


Exercises 


5.3.1 Show (without introducing matrix representations) that the adjoint of the adjoint of an
operator restores the original operator, i.e., that $(A^\dagger)^\dagger = A$.

5.3.2 $U$ and $V$ are two arbitrary operators. Without introducing matrix representations of
these operators, show that
\[
(UV)^\dagger = V^\dagger U^\dagger.
\]


Note the resemblance to adjoint matrices. 





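5.3.3 Consider a Hilbert space spanned by the three functions $\varphi_1 = x_1$, $\varphi_2 = x_2$, $\varphi_3 = x_3$, and
a scalar product defined by $\langle x_\nu|x_\mu\rangle = \delta_{\nu\mu}$.


(a) Form the $3 \times 3$ matrix of each of the following operators:
\[
A_1 = \sum_{i=1}^{3} x_i\left(\frac{\partial}{\partial x_i}\right), \qquad
A_2 = x_1\left(\frac{\partial}{\partial x_2}\right) - x_2\left(\frac{\partial}{\partial x_1}\right).
\]

(b) Form the column vector representing $\psi = x_1 - 2x_2 + 3x_3$.


(c) Form the matrix equation corresponding to $\chi = (A_1 - A_2)\psi$ and verify that the
matrix equation reproduces the result obtained by direct application of $A_1 - A_2$
to $\psi$.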


5.3.4 (a) Obtain the matrix representation of A = x(d/dx) in a basis of Legendre polyno- 
mials, keeping terms through P3. Use the orthonormal forms of these polynomials 
as given in 5.2.1 and the scalar product defined there. 


(b) Expand x? in the orthonormal Legendre polynomial basis. 


(c) Verify that Ax? is given correctly by its matrix representation. 


5.4 SELF-ADJOINT OPERATORS 


Operators that are self-adjoint (Hermitian) are of particular importance in quantum 
mechanics because observable quantities are associated with Hermitian operators. In 
particular, the average value of an observable A in a quantum mechanical state described 
by any normalized wave function $\psi$ is given by the expectation value of $A$, defined as
\[
\langle A\rangle = \langle\psi|A|\psi\rangle. \tag{5.61}
\]
This, of course, only makes sense if it can be assured that $\langle A\rangle$ is real, even if $\psi$ and/or
$A$ is complex. Using the fact that $A$ is postulated to be Hermitian, we take the complex
conjugate of $\langle A\rangle$:
\[
\langle A\rangle^* = \langle\psi|A|\psi\rangle^* = \langle A\psi|\psi\rangle,
\]
which reduces to $\langle A\rangle$ because $A$ is self-adjoint.

We have already seen that if $A$ and $A^\dagger$ are expanded in a basis, the matrix $\mathsf{A}^\dagger$ must be
the matrix adjoint of the matrix $\mathsf{A}$. For a self-adjoint $A$, this means that the coefficients
in its expansion must satisfy
\[
a_{\nu\mu} = a^*_{\mu\nu} \quad \text{(coefficients of self-adjoint } A\text{)}. \tag{5.62}
\]
Thus, we have the nearly self-evident result: A matrix representing a Hermitian operator
is a Hermitian matrix. It is also obvious from Eq. (5.62) that the diagonal elements of a
Hermitian matrix (which are expectation values for the basis functions) are real.

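We can easily verify from basis expansions that $\langle A\rangle$ must be real. Letting $\mathbf{c}$ be the vector
of expansion coefficients of $\psi$ in the basis for which $a_{\nu\mu}$ are the matrix elements of $A$, then
\[
\begin{aligned}
\langle A\rangle = \langle\psi|A|\psi\rangle
&= \Bigl\langle \sum_\nu c_\nu\varphi_\nu \Bigm| A \Bigm| \sum_\mu c_\mu\varphi_\mu \Bigr\rangle
= \sum_{\nu\mu} c_\nu^*\,\langle\varphi_\nu|A|\varphi_\mu\rangle\,c_\mu \\
&= \sum_{\nu\mu} c_\nu^*\,a_{\nu\mu}\,c_\mu = \mathbf{c}^\dagger\mathsf{A}\mathbf{c},
\end{aligned}
\]
which reduces, as it must, to a scalar. Because $\mathsf{A}$ is a self-adjoint matrix, $\mathbf{c}^\dagger\mathsf{A}\mathbf{c}$ is easily seen
to be a self-adjoint $1 \times 1$ matrix, i.e., a real scalar (use the facts that $(\mathsf{BAC})^\dagger = \mathsf{C}^\dagger\mathsf{A}^\dagger\mathsf{B}^\dagger$ and
that $\mathsf{A}^\dagger = \mathsf{A}$).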
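The reality of $\mathbf{c}^\dagger\mathsf{A}\mathbf{c}$ is easy to observe numerically. The following sketch uses an arbitrary 3 x 3 Hermitian matrix and an arbitrary normalized coefficient vector, both made up for illustration, and contrasts them with a matrix in which a single off-diagonal element has been altered so that it is no longer Hermitian.

```python
import math

def expectation(matrix, coeffs):
    """Evaluate c^dagger A c = sum over nu, mu of conj(c_nu) a_{nu mu} c_mu."""
    n = len(coeffs)
    return sum(
        complex(coeffs[nu]).conjugate() * matrix[nu][mu] * coeffs[mu]
        for nu in range(n)
        for mu in range(n)
    )

# Arbitrary Hermitian matrix: a_{nu mu} equals the conjugate of a_{mu nu}.
A = [[2, 1 - 1j, 0],
     [1 + 1j, -1, 3j],
     [0, -3j, 0.5]]

# Same matrix with one element changed, which destroys Hermiticity.
N = [row[:] for row in A]
N[0][1] = 1 + 1j

raw = [1 + 2j, -1j, 0.5]                       # arbitrary coefficients
norm = math.sqrt(sum(abs(z) ** 2 for z in raw))
c = [z / norm for z in raw]                    # normalized expansion coefficients

exp_hermitian = expectation(A, c)
exp_non_hermitian = expectation(N, c)

print(exp_hermitian)       # imaginary part vanishes up to rounding
print(exp_non_hermitian)   # imaginary part is nonzero for this choice of c
```

The second value shows why Hermiticity is the relevant condition: a generic non-Hermitian matrix produces a complex "expectation value", which could not represent a measured quantity.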








Example 5.4.1 SOME SELF-ADJOINT OPERATORS


Consider the operators $x$ and $p$ introduced earlier, with a scalar product of definition
\[
\langle f|g\rangle = \int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx, \tag{5.63}
\]
where our Hilbert space is the set of all functions $f$ for which $\langle f|f\rangle$ exists (i.e., $\langle f|f\rangle$ is
finite). This is the $\mathcal{L}^2$ space on the interval $(-\infty, \infty)$. To test whether $x$ is self-adjoint, we
compare $\langle f|xg\rangle$ and $\langle xf|g\rangle$. Writing these out as integrals, we consider
\[
\int_{-\infty}^{\infty} f^*(x)\,x\,g(x)\,dx \quad \text{vs.} \quad \int_{-\infty}^{\infty} [x f(x)]^*\,g(x)\,dx.
\]
Because the order of ordinary functions (including $x$) can be changed without affecting the
value of an integral, and because $x$ is inherently real, these two expressions are equal and
$x$ is self-adjoint.

Turning next to $p = -i(d/dx)$, the comparison we must make is
\[
\int_{-\infty}^{\infty} f^*(x)\left[-i\frac{dg(x)}{dx}\right] dx \quad \text{vs.} \quad
\int_{-\infty}^{\infty} \left[-i\frac{df(x)}{dx}\right]^* g(x)\,dx. \tag{5.64}
\]
We can bring these expressions into better correspondence if we integrate the first by parts,
differentiating $f^*(x)$ and integrating $dg(x)/dx$. Doing so, the first expression above becomes
\[
\int_{-\infty}^{\infty} f^*(x)\left[-i\frac{dg(x)}{dx}\right] dx
= -i f^*(x)\,g(x)\,\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \left[\frac{df(x)}{dx}\right]^* \bigl[-i g(x)\bigr]\,dx.
\]
The boundary terms to be evaluated at $\pm\infty$ must vanish because $\langle f|f\rangle$ and $\langle g|g\rangle$ are finite,
which assures also (from the Schwarz inequality) that $\langle f|g\rangle$ is finite as well. Upon moving
$i$ within the complex conjugate in the remaining integral, we verify agreement with the
second expression in Eq. (5.64). Thus both $x$ and $p$ are self-adjoint. Note that if $p$ had not
contained the factor $i$, it would not have been self-adjoint, as we obtained a needed sign
change when $i$ was moved within the scope of the complex conjugate.


Example 5.4.2 EXPECTATION VALUE OF p


Because $p$, though Hermitian, is also imaginary, consider what happens when we compute
its expectation value for a wave function of the form $\psi(x) = e^{i\theta} f(x)$, where $f(x)$ is a
real $\mathcal{L}^2$ wave function and $\theta$ is a real phase angle. Using the scalar product as defined in
Eq. (5.63), and remembering that $p = -i(d/dx)$, we have
\[
\begin{aligned}
\langle p\rangle &= \int_{-\infty}^{\infty} \bigl[e^{i\theta} f(x)\bigr]^* \left(-i\frac{d}{dx}\right) \bigl[e^{i\theta} f(x)\bigr]\,dx
= -i\int_{-\infty}^{\infty} f(x)\,\frac{df(x)}{dx}\,dx \\
&= -\frac{i}{2}\int_{-\infty}^{\infty} \frac{d}{dx}\bigl[f(x)^2\bigr]\,dx
= -\frac{i}{2}\Bigl[f(+\infty)^2 - f(-\infty)^2\Bigr] = 0.
\end{aligned}
\]
As shown, this integral vanishes because $f(x) = 0$ at $\pm\infty$ (this is fortunate because expectation
values must be real). This result corresponds to the well-known property that wave
functions that describe time-dependent phenomena (nonzero momentum) cannot either be
real or real except for a constant (complex) phase factor.





The relations between operators and their adjoints provide opportunities for rearrange- 
ments of operator expressions that may facilitate their evaluation. Some examples follow. 


Example 5.4.3 OPERATOR EXPRESSIONS 


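(a) Suppose we wish to evaluate $\langle(x^2 + p^2)\psi|\varphi\rangle$, with $\psi$ of a complicated functional
form that might be unpleasant to differentiate (as required to apply $p$), whereas $\varphi$ is
simple. Because $x$ is self-adjoint, so also is $x^2$:
\[
\langle x^2\psi|\varphi\rangle = \langle x\psi|x\varphi\rangle = \langle\psi|x^2\varphi\rangle.
\]
The same is true of $p^2$, so $\langle(x^2 + p^2)\psi|\varphi\rangle = \langle\psi|(x^2 + p^2)\varphi\rangle$.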


(b) Look next at $\langle(x + ip)\psi|(x + ip)\psi\rangle$, which is the expression to be evaluated if we
want the norm of $(x + ip)\psi$. Note that $x + ip$ is not self-adjoint, but has adjoint
$x - ip$. Our norm rearranges to
\[
\begin{aligned}
\langle(x + ip)\psi|(x + ip)\psi\rangle &= \langle\psi|(x - ip)(x + ip)|\psi\rangle \\
&= \langle\psi|x^2 + p^2 + i(xp - px)|\psi\rangle \\
&= \langle\psi|x^2 + p^2 + i[x, p]|\psi\rangle = \langle\psi|x^2 + p^2 - 1|\psi\rangle.
\end{aligned}
\]
To reach the last line of the above equation, we recognized the commutator $[x, p] = i$,
as established in Eq. (5.43).

(c) Suppose that $A$ and $B$ are self-adjoint. What can we say about the self-adjointness of
$AB$? Consider
\[
\langle\psi|AB|\varphi\rangle = \langle A\psi|B|\varphi\rangle = \langle BA\psi|\varphi\rangle.
\]
Note that because we moved $A$ to the left first (with no dagger needed because it is
self-adjoint), it is part of what the subsequently moved $B$ must operate on. So we see
that the adjoint of $AB$ is $BA$. We conclude that $AB$ is only self-adjoint if $A$ and $B$
commute (so that $BA = AB$). Note that if $A$ and $B$ were not individually self-adjoint,
their commutation would not be sufficient to make $AB$ self-adjoint.


Exercises


5.4.1 (a) $A$ is a non-Hermitian operator. Show that the operators $A + A^\dagger$ and $i(A - A^\dagger)$ are
Hermitian.

(b) Using the preceding result, show that every non-Hermitian operator may be
written as a linear combination of two Hermitian operators.


5.4.2 Prove that the product of two Hermitian operators is Hermitian if and only if the two
operators commute.


5.4.3 $A$ and $B$ are noncommuting quantum mechanical operators, and $C$ is given by the
formula
\[
AB - BA = iC.
\]
Show that $C$ is Hermitian. Assume that appropriate boundary conditions are satisfied.


5.4.4 The operator $\mathcal{L}$ is Hermitian. Show that $\langle\mathcal{L}^2\rangle \ge 0$, meaning that for all $\psi$ in the space in
which $\mathcal{L}$ is defined, $\langle\psi|\mathcal{L}^2|\psi\rangle \ge 0$.


5.4.5 Consider a Hilbert space whose members are functions defined on the surface of the
unit sphere, with a scalar product of the form
\[
\langle f|g\rangle = \int d\Omega\, f^* g,
\]
where $d\Omega$ is the element of solid angle. Note that the total solid angle of the sphere is
$4\pi$. We work here with the three functions $\varphi_1 = Cx/r$, $\varphi_2 = Cy/r$, $\varphi_3 = Cz/r$, with $C$
assigned a value that makes the $\varphi_i$ normalized.

(a) Find $C$, and show that the $\varphi_i$ are also mutually orthogonal.

(b) Form the $3 \times 3$ matrices of the angular momentum operators
\[
L_x = -i\left(y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}\right), \quad
L_y = -i\left(z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z}\right), \quad
L_z = -i\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right).
\]

(c) Verify that the matrix representations of the components of $L$ satisfy the angular
momentum commutator $[L_x, L_y] = iL_z$.


5.5 UNITARY OPERATORS 


One of the reasons unitary operators are important in physics is that they can be used to 
describe transformations between orthonormal bases. This property is the generalization 
to the complex domain of the rotational transformations of ordinary (physical) vectors that 
we analyzed in Chapter 3. 


Unitary Transformations 


Suppose we have a function $\psi$ that has been expanded in the orthonormal basis $\varphi$:
\[
\psi = \sum_\mu c_\mu\,\varphi_\mu = \Bigl(\sum_\mu |\varphi_\mu\rangle\langle\varphi_\mu|\Bigr)|\psi\rangle. \tag{5.65}
\]
We now wish to convert this expansion to a different orthonormal basis, with functions
$\varphi'_\nu$. A possible starting point is to recognize that each of the original basis functions can be
expanded in the primed basis. We can obtain the expansion by inserting a resolution of the
identity in the primed basis:
\[
\varphi_\mu = \sum_\nu u_{\nu\mu}\,\varphi'_\nu
= \Bigl(\sum_\nu |\varphi'_\nu\rangle\langle\varphi'_\nu|\Bigr)|\varphi_\mu\rangle
= \sum_\nu \langle\varphi'_\nu|\varphi_\mu\rangle\,\varphi'_\nu. \tag{5.66}
\]
Comparing the second and fourth members of this equation, we identify $u_{\nu\mu}$ as the elements
of a matrix $\mathsf{U}$:
\[
u_{\nu\mu} = \langle\varphi'_\nu|\varphi_\mu\rangle. \tag{5.67}
\]
Note how the use of resolutions of the identity makes these formulas obvious, and that
Eqs. (5.65) to (5.67) are only valid because the $\varphi_\mu$ and the $\varphi'_\nu$ are complete orthonormal
sets.

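Inserting the expansion for $\varphi_\mu$ from Eq. (5.66) into Eq. (5.65), we reach
\[
\psi = \sum_\mu c_\mu \sum_\nu u_{\nu\mu}\,\varphi'_\nu
= \sum_\nu \Bigl(\sum_\mu u_{\nu\mu}\,c_\mu\Bigr)\varphi'_\nu
= \sum_\nu c'_\nu\,\varphi'_\nu, \tag{5.68}
\]
where the coefficients $c'_\nu$ of the expansion in the primed basis form a column vector $\mathbf{c}'$ that
is related to the coefficient vector $\mathbf{c}$ in the unprimed basis by the matrix equation
\[
\mathbf{c}' = \mathsf{U}\mathbf{c}, \tag{5.69}
\]
with $\mathsf{U}$ the matrix whose elements are given in Eq. (5.67).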


If we now consider the reverse transformation, from an expansion in the primed basis
to one in the unprimed basis, starting from
\[
\varphi'_\mu = \sum_\nu v_{\nu\mu}\,\varphi_\nu = \sum_\nu \langle\varphi_\nu|\varphi'_\mu\rangle\,\varphi_\nu, \tag{5.70}
\]
we see that $\mathsf{V}$, the matrix of the transformation inverse to $\mathsf{U}$, has elements
\[
v_{\nu\mu} = \langle\varphi_\nu|\varphi'_\mu\rangle = (\mathsf{U}^*)_{\mu\nu} = (\mathsf{U}^\dagger)_{\nu\mu}. \tag{5.71}
\]
In other words,
\[
\mathsf{V} = \mathsf{U}^\dagger. \tag{5.72}
\]

If we now transform the expansion of $\psi$, given in the unprimed basis by the coefficient
vector $\mathbf{c}$, first to the primed basis and then back to the original unprimed basis, the coefficients
will transform, first to $\mathbf{c}'$ and then back to $\mathbf{c}$, according to
\[
\mathbf{c} = \mathsf{V}\mathsf{U}\mathbf{c} = \mathsf{U}^\dagger\mathsf{U}\mathbf{c}. \tag{5.73}
\]
In order for Eq. (5.73) to be consistent it is necessary that $\mathsf{U}^\dagger\mathsf{U}$ be a unit matrix, meaning
that $\mathsf{U}$ must be unitary. We thus have the important following result:


The transformation that converts the expansion of a vector $\mathbf{c}$ in any orthonormal basis
$\{\varphi_\mu\}$ to its expansion $\mathbf{c}'$ in any other orthonormal basis $\{\varphi'_\nu\}$ is described by the
matrix equation $\mathbf{c}' = \mathsf{U}\mathbf{c}$, where the transformation matrix $\mathsf{U}$ is unitary and has elements
$u_{\nu\mu} = \langle\varphi'_\nu|\varphi_\mu\rangle$. A transformation between orthonormal bases is called a unitary
transformation.
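The result just stated can be tried out in the simplest nontrivial setting, a two-dimensional complex space. In the sketch below the two orthonormal bases and the coefficient vector are hypothetical choices made for illustration, and the helper `braket` is an illustrative name; the code builds $\mathsf{U}$ from Eq. (5.67), transforms the coefficients, and confirms both that the vector itself is unchanged and that $\mathsf{U}^\dagger\mathsf{U}$ is the unit matrix.

```python
import math

def braket(u, v):
    """Scalar product <u|v>, conjugating the left member."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
old_basis = [[1, 0], [0, 1]]                  # phi_1, phi_2
new_basis = [[s, 1j * s], [s, -1j * s]]       # phi'_1, phi'_2 (hypothetical choice)

# Transformation matrix, Eq. (5.67): u_{nu mu} = <phi'_nu | phi_mu>.
U = [[braket(p_new, p_old) for p_old in old_basis] for p_new in new_basis]

c = [2, 1 - 1j]                               # coefficients in the old basis
c_new = [sum(U[n][m] * c[m] for m in range(2)) for n in range(2)]   # c' = U c

# Reassemble the vector from its primed expansion; it must be the same vector.
rebuilt = [sum(c_new[n] * new_basis[n][k] for n in range(2)) for k in range(2)]

# U^dagger U should be the unit matrix.
UdU = [[sum(U[k][i].conjugate() * U[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

print(c_new)
print(rebuilt)
print(UdU)
```

Since the old basis here is the standard one, the components of the rebuilt vector can be compared directly with `c`.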


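Equation (5.69) is a direct generalization of the ordinary 2-D vector rotational transformation
equation, Eq. (3.26),
\[
\mathbf{A}' = \mathsf{S}\mathbf{A}.
\]
For further emphasis, we compare the transformation matrix $\mathsf{U}$ introduced here (at right,
below) with the matrix $\mathsf{S}$ (at left) from Eq. (3.28), for rotations in ordinary 2-D space:
\[
\mathsf{S} = \begin{pmatrix}
\hat{e}'_1 \cdot \hat{e}_1 & \hat{e}'_1 \cdot \hat{e}_2 \\
\hat{e}'_2 \cdot \hat{e}_1 & \hat{e}'_2 \cdot \hat{e}_2
\end{pmatrix}, \qquad
\mathsf{U} = \begin{pmatrix}
\langle\varphi'_1|\varphi_1\rangle & \langle\varphi'_1|\varphi_2\rangle & \cdots \\
\langle\varphi'_2|\varphi_1\rangle & \langle\varphi'_2|\varphi_2\rangle & \cdots \\
\cdots & \cdots & \cdots
\end{pmatrix}.
\]
The resemblance becomes even more striking if we recognize that in Dirac notation, the
quantities $\hat{e}'_i \cdot \hat{e}_j$ assume the form $\langle\hat{e}'_i|\hat{e}_j\rangle$.

As for ordinary vectors (except that the quantities involved here are complex), the $i$th
row of $\mathsf{U}$ contains the (complex conjugated) components (a.k.a. coefficients) of $\varphi'_i$ in terms
of the unprimed basis; the orthonormality of the primed $\varphi$ is consistent with the fact that
$\mathsf{U}\mathsf{U}^\dagger$ is a unit matrix. The columns of $\mathsf{U}$ contain the components of the $\varphi_j$ in terms of the
primed basis; that also is analogous to our earlier observations. The matrix $\mathsf{S}$ is orthogonal;
$\mathsf{U}$ is unitary, which is the generalization to a complex space of the orthogonality condition.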

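Summarizing, we see that unitary transformations are analogous, in vector spaces, to the
orthogonal transformations that describe rotations (or reflections) in ordinary space.

Example 5.5.1 A UNITARY TRANSFORMATION


A Hilbert space is spanned by five functions defined on the surface of a unit sphere and
expressed in spherical polar coordinates $\theta$, $\varphi$:
\[
\begin{aligned}
\chi_1 &= \sqrt{\frac{15}{4\pi}}\,\sin\theta\cos\theta\cos\varphi, &
\chi_2 &= \sqrt{\frac{15}{4\pi}}\,\sin\theta\cos\theta\sin\varphi, \\
\chi_3 &= \sqrt{\frac{15}{4\pi}}\,\sin^2\theta\,\sin\varphi\cos\varphi, &
\chi_4 &= \sqrt{\frac{15}{16\pi}}\,\sin^2\theta\,(\cos^2\varphi - \sin^2\varphi), \\
\chi_5 &= \sqrt{\frac{5}{16\pi}}\,(3\cos^2\theta - 1).
\end{aligned}
\]
These are orthonormal when the scalar product is defined as
\[
\langle f|g\rangle = \int_0^{\pi} \sin\theta\,d\theta \int_0^{2\pi} d\varphi\; f^*(\theta, \varphi)\,g(\theta, \varphi).
\]
This Hilbert space can alternatively be spanned by the orthonormal set of functions
\[
\begin{aligned}
\chi'_1 &= -\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{i\varphi}, &
\chi'_2 &= \sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{-i\varphi}, \\
\chi'_3 &= \sqrt{\frac{15}{32\pi}}\,\sin^2\theta\,e^{2i\varphi}, &
\chi'_4 &= \sqrt{\frac{15}{32\pi}}\,\sin^2\theta\,e^{-2i\varphi}, \\
\chi'_5 &= \chi_5.
\end{aligned}
\]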


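The matrix $\mathsf{U}$ describing the transformation from the unprimed to the primed basis has
elements $u_{\nu\mu} = \langle\chi'_\nu|\chi_\mu\rangle$. Working out a representative matrix element,
\[
\begin{aligned}
u_{22} = \langle\chi'_2|\chi_2\rangle
&= \frac{15}{4\pi\sqrt{2}} \int_0^{\pi} \sin\theta\,d\theta \int_0^{2\pi} d\varphi\;
\sin^2\theta\cos^2\theta\,e^{+i\varphi}\sin\varphi \\
&= \frac{15}{4\pi\sqrt{2}} \int_0^{\pi} \sin^3\theta\cos^2\theta\,d\theta
\int_0^{2\pi} d\varphi\; e^{i\varphi}\,\frac{e^{i\varphi} - e^{-i\varphi}}{2i} \\
&= \frac{15}{4\pi\sqrt{2}} \left(\frac{4}{15}\right) \left(\frac{-2\pi}{2i}\right) = \frac{i}{\sqrt{2}}.
\end{aligned}
\]
We obtained this result by using the formula $\int_0^{2\pi} e^{in\varphi}\,d\varphi = 2\pi\,\delta_{n0}$ and by looking up a
tabulated value for the $\theta$ integral. We evaluate explicitly one more matrix element:
\[
\begin{aligned}
u_{21} = \langle\chi'_2|\chi_1\rangle
&= \frac{15}{4\pi\sqrt{2}} \int_0^{\pi} \sin^3\theta\cos^2\theta\,d\theta
\int_0^{2\pi} d\varphi\; e^{+i\varphi}\,\frac{e^{i\varphi} + e^{-i\varphi}}{2} \\
&= \frac{15}{4\pi\sqrt{2}} \left(\frac{4}{15}\right) \left(\frac{2\pi}{2}\right) = \frac{1}{\sqrt{2}}.
\end{aligned}
\]
Evaluating the remaining elements of $\mathsf{U}$, we reach
\[
\mathsf{U} = \begin{pmatrix}
-1/\sqrt{2} & i/\sqrt{2} & 0 & 0 & 0 \\
1/\sqrt{2} & i/\sqrt{2} & 0 & 0 & 0 \\
0 & 0 & -i/\sqrt{2} & 1/\sqrt{2} & 0 \\
0 & 0 & i/\sqrt{2} & 1/\sqrt{2} & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]
As a check, note that the $i$th column of $\mathsf{U}$ should yield the components of $\chi_i$ in the primed
basis. For the first column, we have
\[
\sqrt{\frac{15}{4\pi}}\,\sin\theta\cos\theta\cos\varphi
= -\frac{1}{\sqrt{2}}\left(-\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{i\varphi}\right)
+ \frac{1}{\sqrt{2}}\left(\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\,e^{-i\varphi}\right),
\]
which simplifies easily to an identity. Further checks are left as Exercise 5.5.1.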
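The matrix elements of this example can also be estimated by direct numerical integration over the sphere, which gives an independent check on signs and phases. The sketch below is illustrative only: it assumes the two function sets in the form $\chi_1, \ldots, \chi_5$ and $\chi'_1, \ldots, \chi'_5$ with $\chi'_1$ carrying a leading minus sign (as the first-column check requires), and it uses a simple midpoint rule in $\theta$ and a uniform grid in $\varphi$; the grid sizes and helper names are arbitrary choices.

```python
import cmath
import math

def chi(k, th, ph):
    """Real basis functions chi_1 ... chi_5."""
    s, c = math.sin(th), math.cos(th)
    if k == 1:
        return math.sqrt(15 / (4 * math.pi)) * s * c * math.cos(ph)
    if k == 2:
        return math.sqrt(15 / (4 * math.pi)) * s * c * math.sin(ph)
    if k == 3:
        return math.sqrt(15 / (4 * math.pi)) * s * s * math.sin(ph) * math.cos(ph)
    if k == 4:
        return math.sqrt(15 / (16 * math.pi)) * s * s * (math.cos(ph) ** 2 - math.sin(ph) ** 2)
    return math.sqrt(5 / (16 * math.pi)) * (3 * c * c - 1)

def chi_prime(k, th, ph):
    """Complex basis functions chi'_1 ... chi'_5."""
    s, c = math.sin(th), math.cos(th)
    if k == 1:
        return -math.sqrt(15 / (8 * math.pi)) * s * c * cmath.exp(1j * ph)
    if k == 2:
        return math.sqrt(15 / (8 * math.pi)) * s * c * cmath.exp(-1j * ph)
    if k == 3:
        return math.sqrt(15 / (32 * math.pi)) * s * s * cmath.exp(2j * ph)
    if k == 4:
        return math.sqrt(15 / (32 * math.pi)) * s * s * cmath.exp(-2j * ph)
    return complex(chi(5, th, ph))

def scalar_product(f, g, n_theta=200, n_phi=32):
    """Approximate <f|g> = int sin(theta) dtheta int dphi f* g on a grid."""
    d_th, d_ph = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        th = (i + 0.5) * d_th                      # midpoint rule in theta
        weight = math.sin(th) * d_th * d_ph
        for j in range(n_phi):
            ph = j * d_ph                          # uniform grid in phi
            total += weight * complex(f(th, ph)).conjugate() * g(th, ph)
    return total

# u_{nu mu} = <chi'_nu | chi_mu>; list indices are zero-based.
U = [[scalar_product(lambda t, p, n=n: chi_prime(n, t, p),
                     lambda t, p, m=m: chi(m, t, p))
      for m in range(1, 6)] for n in range(1, 6)]

print("u_22 ~", U[1][1])   # element worked out first in the text
print("u_21 ~", U[1][0])   # element worked out second in the text
```

The remaining entries of `U` can be printed and compared with the full matrix in the same way; the quadrature error at these grid sizes is well below the third decimal place.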
Successive Transformations 
It is possible to make two or more successive unitary transformations, each of which will 
convert an input orthonormal basis to an output basis that is also orthonormal. Just as for 
ordinary vectors, the successive transformations are applied in right-to-left order, and the 
product of the transformations can be viewed as a resultant unitary transformation. 
Exercises 
5.5.1 Show that the matrix $\mathsf{U}$ of Example 5.5.1 correctly transforms the vector
$f(\theta, \varphi) = 3\chi_1 + 2i\chi_2 - \chi_3 + \chi_5$ to the $\{\chi'_i\}$ basis by

(a) (1) Making a column vector $\mathbf{c}$ that represents $f(\theta, \varphi)$ in the $\{\chi_i\}$ basis,
(2) forming $\mathbf{c}' = \mathsf{U}\mathbf{c}$, and
(3) comparing the expansion $\sum_i c'_i\,\chi'_i(\theta, \varphi)$ with $f(\theta, \varphi)$;

(b) Verifying that $\mathsf{U}$ is unitary.


5.5.2 (a) Given (in $R^3$) the basis $\varphi_1 = x$, $\varphi_2 = y$, $\varphi_3 = z$, consider the basis transformation
$x \to z$, $y \to y$, $z \to -x$. Find the $3 \times 3$ matrix $\mathsf{U}$ for this transformation.

(b) This transformation corresponds to a rotation of the coordinate axes. Identify
the rotation and reconcile your transformation matrix with an appropriate matrix
$\mathsf{S}(\alpha, \beta, \gamma)$ of the form given in Eq. (3.37).

(c) Form the column vector $\mathbf{c}$ representing (in the original basis) $f = 2x - 3y + z$,
find the result of applying $\mathsf{U}$ to $\mathbf{c}$, and show that this is consistent with the basis
transformation of part (a).

Note. You do not need to be able to form scalar products to handle this exercise; a
knowledge of the linear relationship between the original and transformed functions is
sufficient.


5.5.3 Construct the matrix representing the inverse of the transformation in Exercise 5.5.2,
and show that this matrix and the transformation matrix of that exercise are matrix
inverses of each other.


5.5.4 The unitary transformation $U$ that converts an orthonormal basis $\{\varphi_i\}$ into the basis $\{\varphi'_i\}$
and the unitary transformation $V$ that converts the basis $\{\varphi'_i\}$ into the basis $\{\chi_i\}$ have
matrix representations
\[
\mathsf{U} = \begin{pmatrix}
i\sin\theta & \cos\theta & 0 \\
-\cos\theta & i\sin\theta & 0 \\
0 & 0 & 1
\end{pmatrix}, \qquad
\mathsf{V} = \begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\theta & i\sin\theta \\
0 & \cos\theta & -i\sin\theta
\end{pmatrix}.
\]
Given the function $f(x) = 3\varphi_1(x) - \varphi_2(x) - 2\varphi_3(x)$,

(a) By applying $\mathsf{U}$, form the vector representing $f(x)$ in the $\{\varphi'_i\}$ basis and then by
applying $\mathsf{V}$ form the vector representing $f(x)$ in the $\{\chi_i\}$ basis. Use this result to
write $f(x)$ as a linear combination of the $\chi_i$.

(b) Form the matrix products $\mathsf{UV}$ and $\mathsf{VU}$ and then apply each to the vector representing
$f(x)$ in the $\{\varphi_i\}$ basis. Verify that the results of these applications differ and
that only one of them gives the result corresponding to part (a).


5.5.5 Three functions which are orthogonal with unit weight on the range $-1 \le x \le 1$ are
$P_0 = 1$, $P_1 = x$, and $P_2 = \frac{3}{2}x^2 - \frac{1}{2}$. Another set of functions that are orthogonal and
span the same space are $F_0 = x^2$, $F_1 = x$, $F_2 = 5x^2 - 3$. Although much of this exercise
can be done by inspection, write down and evaluate all the integrals that lead to the
results when they are obtained in terms of scalar products.

(a) Normalize each of the $P_i$ and $F_i$.

(b) Find the unitary matrix $\mathsf{U}$ that transforms from the normalized $P_i$ basis to the
normalized $F_i$ basis.

(c) Find the unitary matrix $\mathsf{V}$ that transforms from the normalized $F_i$ basis to the
normalized $P_i$ basis.

(d) Show that $\mathsf{U}$ and $\mathsf{V}$ are unitary, and that $\mathsf{V} = \mathsf{U}^{-1}$.

(e) Expand $f(x) = 5x^2 - 3x + 1$ in terms of the normalized versions of both bases,
and verify that the transformation matrix $\mathsf{U}$ converts the $P$-basis expansion of
$f(x)$ into its $F$-basis expansion.


5.6 TRANSFORMATIONS OF OPERATORS


We have seen how unitary transformations can be used to transform the expansion of a 
function from one orthonormal basis set to another. We now consider the corresponding 
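transformation for operators. Given an operator $A$, which when expanded in the $\varphi$ basis
has the form
\[
A = \sum_{\mu\nu} |\varphi_\mu\rangle\,a_{\mu\nu}\,\langle\varphi_\nu|,
\]
we convert it to the $\varphi'$ basis by the simple expedient of inserting resolutions of the identity
(written in terms of the primed basis) on both sides of the above expression. This is an
excellent example of the benefits of using Dirac notation. Remembering that this does not
change $A$ (but of course does change its appearance), we get
\[
A = \sum_{\mu\nu\sigma\tau} |\varphi'_\sigma\rangle\langle\varphi'_\sigma|\varphi_\mu\rangle\,a_{\mu\nu}\,\langle\varphi_\nu|\varphi'_\tau\rangle\langle\varphi'_\tau|,
\]
which we simplify by identifying $\langle\varphi'_\sigma|\varphi_\mu\rangle = u_{\sigma\mu}$, as defined in Eq. (5.67), and
$\langle\varphi_\nu|\varphi'_\tau\rangle = u^*_{\tau\nu}$. Thus,
\[
A = \sum_{\mu\nu\sigma\tau} |\varphi'_\sigma\rangle\,u_{\sigma\mu}\,a_{\mu\nu}\,u^*_{\tau\nu}\,\langle\varphi'_\tau|
= \sum_{\sigma\tau} |\varphi'_\sigma\rangle\,a'_{\sigma\tau}\,\langle\varphi'_\tau|, \tag{5.74}
\]
where $a'_{\sigma\tau}$ is the $\sigma\tau$ matrix element of $A$ in the primed basis, related to the unprimed
values by
\[
a'_{\sigma\tau} = \sum_{\mu\nu} u_{\sigma\mu}\,a_{\mu\nu}\,u^*_{\tau\nu}. \tag{5.75}
\]
If we now note that $u^*_{\tau\nu} = (\mathsf{U}^\dagger)_{\nu\tau}$, we can write Eq. (5.75) as the matrix equation
\[
\mathsf{A}' = \mathsf{U}\mathsf{A}\mathsf{U}^\dagger = \mathsf{U}\mathsf{A}\mathsf{U}^{-1}, \tag{5.76}
\]
where in the final member of the equation we used the fact that $\mathsf{U}$ is unitary.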

Another way of getting at Eq. (5.76) is to consider the operator equation $A\psi = \chi$, where
initially $A$, $\psi$, and $\chi$ are all regarded as expanded in the orthonormal set $\varphi$, with $A$ having
matrix elements $a_{\mu\nu}$, and with $\psi$ and $\chi$ having the forms $\psi = \sum_\mu c_\mu \varphi_\mu$ and $\chi = \sum_\mu b_\mu \varphi_\mu$.
This state of affairs corresponds to the matrix equation

\[ \mathsf{A}\mathbf{c} = \mathbf{b} . \]

Now we simply insert $\mathsf{U}^{-1}\mathsf{U}$ between A and c, and multiply both sides of the equation on
the left by U. The result is

\[ (\mathsf{U}\mathsf{A}\mathsf{U}^{-1})(\mathsf{U}\mathbf{c}) = \mathsf{U}\mathbf{b} \;\longrightarrow\; \mathsf{A}'\mathbf{c}' = \mathbf{b}' , \tag{5.77} \]

showing that the operator and the functions are properly related when the functions have
been transformed by applying U and the operator has been transformed as required by
Eq. (5.76). Since this relationship is valid for any choice of c and U, it confirms the
transformation equation for A.
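
The consistency expressed by Eqs. (5.76) and (5.77) can be checked numerically. The following is a minimal sketch using only the Python standard library; the particular 2 x 2 matrices U and A and the vector c are arbitrary illustrative choices, not taken from the text, and the helper names (matmul, dagger, matvec) are introduced here for the illustration only.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Adjoint (complex conjugate of the transpose)."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def matvec(X, v):
    """Matrix applied to a column vector."""
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]

s = 1 / math.sqrt(2)
U = [[s, s], [1j * s, -1j * s]]   # illustrative unitary matrix (assumed, not from the text)
A = [[1, 2 - 1j], [0.5j, 3]]      # arbitrary operator matrix; need not be Hermitian
c = [1 + 1j, 2]                   # arbitrary coefficient vector

UUd = matmul(U, dagger(U))                    # should be the unit matrix
A_prime = matmul(matmul(U, A), dagger(U))     # Eq. (5.76): A' = U A U^dagger
b_prime = matvec(U, matvec(A, c))             # b' = U b, with b = A c
lhs = matvec(A_prime, matvec(U, c))           # A' c'

# Largest componentwise difference between A'c' and b'; zero up to roundoff.
residual = max(abs(x - y) for x, y in zip(lhs, b_prime))
print(residual)
```

Replacing U by a nonsingular but nonunitary matrix would require the true inverse in place of dagger(U); that case is the similarity transformation discussed next.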

Nonunitary Transformations


It is possible to consider transformations similar to that illustrated by Eq. (5.77), but using
a transformation matrix G that must be nonsingular, but is not required to be unitary. Such
more general transformations occasionally appear in physics applications, are called
similarity transformations, and lead to an equation deceptively similar to Eq. (5.77):

\[ (\mathsf{G}\mathsf{A}\mathsf{G}^{-1})(\mathsf{G}\mathbf{c}) = \mathsf{G}\mathbf{b} . \tag{5.78} \]

There is one important difference: Although a general similarity transformation preserves
the original operator equation, corresponding items do not describe the same quantity in a
different basis. Instead, they describe quantities that have been systematically (but
consistently) altered by the transformation.

Sometimes we encounter a need for transformations that are not even similarity trans- 
formations. For example, we may have an operator whose matrix elements are given in a 
nonorthogonal basis, and we consider the transformation to an orthonormal basis generated 
by use of the Gram-Schmidt procedure. 


Example 5.6.1 GRAM-SCHMIDT TRANSFORMATION


The Gram-Schmidt process describes the transformation from an initial function set $\chi_i$ to
an orthonormal set $\varphi_\mu$, according to equations that can be brought to the form

\[ \varphi_\mu = \sum_{i=1}^{\mu} t_{i\mu}\, \chi_i , \qquad \mu = 1, 2, \ldots . \]

Because the Gram-Schmidt process only generates coefficients $t_{i\mu}$ with $i \le \mu$, the
transformation matrix T can be described as upper triangular, i.e., a square matrix with nonzero
elements $t_{i\mu}$ only on and above its principal diagonal. Defining S as a matrix with elements
$s_{ij} = \langle\chi_i|\chi_j\rangle$ (often called an overlap matrix), the orthonormality of the $\varphi_\mu$ is evidenced
by the equation

\[ \langle\varphi_\mu|\varphi_\nu\rangle = \sum_{ij} \langle t_{i\mu}\chi_i | t_{j\nu}\chi_j\rangle = \sum_{ij} t^*_{i\mu}\, \langle\chi_i|\chi_j\rangle\, t_{j\nu} = (\mathsf{T}^\dagger \mathsf{S} \mathsf{T})_{\mu\nu} = \delta_{\mu\nu} . \tag{5.79} \]

Note that because T is upper triangular, $\mathsf{T}^\dagger$ must be lower triangular. In writing Eq. (5.79)
we did not have to restrict the i and j summations, as the coefficients outside the
contributing ranges of i and j are present, but set to zero.

From Eq. (5.79) we can obtain a representation of S:

\[ \mathsf{S} = (\mathsf{T}^\dagger)^{-1} \mathsf{T}^{-1} = (\mathsf{T}\mathsf{T}^\dagger)^{-1} . \tag{5.80} \]

Moreover, if we replace S from Eq. (5.79) by the matrix of a general operator A (in the $\chi_i$
basis), we find that in the orthonormal $\varphi$ basis its representation A' is

\[ \mathsf{A}' = \mathsf{T}^\dagger \mathsf{A} \mathsf{T} . \tag{5.81} \]

In general, $\mathsf{T}^\dagger$ will not be equal to $\mathsf{T}^{-1}$, so this equation does not define a similarity
transformation.
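
As an illustration of Eqs. (5.79) and (5.80), the sketch below builds T for a small concrete case. The choice of the monomials 1, x, x^2 on (-1, 1) with unit weight as the nonorthogonal set is an assumption made here for the example, as are all function names; because everything is real in this case, the adjoint of T reduces to its transpose.

```python
import math

def overlap_monomials(n):
    """s_ij = integral of x^(i+j) over [-1, 1], for the assumed set chi_i = x^i."""
    return [[2.0 / (i + j + 1) if (i + j) % 2 == 0 else 0.0
             for j in range(n)] for i in range(n)]

def inner(u, S, v):
    """Scalar product of two functions given by real coefficient vectors over the chi_i."""
    n = len(S)
    return sum(u[i] * S[i][j] * v[j] for i in range(n) for j in range(n))

def gram_schmidt_T(S):
    """Return T (list of rows); column mu holds the chi-coefficients of phi_mu."""
    n = len(S)
    cols = []
    for mu in range(n):
        chi = [1.0 if i == mu else 0.0 for i in range(n)]
        v = chi[:]
        for t in cols:                    # remove projections on the earlier phi_nu
            proj = inner(t, S, chi)       # <phi_nu | chi_mu>
            v = [vi - proj * ti for vi, ti in zip(v, t)]
        norm = math.sqrt(inner(v, S, v))
        cols.append([vi / norm for vi in v])
    return [[cols[mu][i] for mu in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [[X[j][i] for j in range(len(X))] for i in range(len(X[0]))]

S = overlap_monomials(3)
T = gram_schmidt_T(S)
Tt = transpose(T)                   # T is real here, so T^dagger = T^T
G = matmul(matmul(Tt, S), T)        # Eq. (5.79): should be the unit matrix
R = matmul(S, matmul(T, Tt))        # Eq. (5.80): S (T T^dagger) should be the unit matrix

for row in T:
    print(row)                      # entries below the diagonal vanish
```

The same T can then be used in Eq. (5.81) to carry any operator matrix from the monomial basis to the orthonormal one.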

Exercises


5.6.1 (a) Using the two spin functions $\varphi_1 = \alpha$ and $\varphi_2 = \beta$ as an orthonormal basis (so
$\langle\alpha|\alpha\rangle = \langle\beta|\beta\rangle = 1$, $\langle\alpha|\beta\rangle = 0$), and the relations

\[ S_x\alpha = \tfrac{1}{2}\beta, \quad S_x\beta = \tfrac{1}{2}\alpha, \quad S_y\alpha = \tfrac{1}{2}i\beta, \quad S_y\beta = -\tfrac{1}{2}i\alpha, \quad S_z\alpha = \tfrac{1}{2}\alpha, \quad S_z\beta = -\tfrac{1}{2}\beta, \]

construct the 2 x 2 matrices of $S_x$, $S_y$, and $S_z$.
(b) Taking now the basis $\varphi'_1 = C(\alpha + \beta)$, $\varphi'_2 = C(\alpha - \beta)$:
(i) Verify that $\varphi'_1$ and $\varphi'_2$ are orthogonal,
(ii) Assign C a value that makes $\varphi'_1$ and $\varphi'_2$ normalized,
(iii) Find the unitary matrix for the transformation $\{\varphi_i\} \to \{\varphi'_i\}$.
(c) Find the matrices of $S_x$, $S_y$, and $S_z$ in the $\{\varphi'_i\}$ basis.


5.6.2 For the basis $\varphi_1 = Cxe^{-r^2}$, $\varphi_2 = Cye^{-r^2}$, $\varphi_3 = Cze^{-r^2}$, where $r^2 = x^2 + y^2 + z^2$,
with the scalar product defined as an unweighted integral over $\mathbb{R}^3$ and with C chosen
to make the $\varphi_i$ normalized:

(a) Find the 3 x 3 matrix of $L_x = -i\left( y\,\dfrac{\partial}{\partial z} - z\,\dfrac{\partial}{\partial y} \right)$;

(b) Using the transformation matrix
\[ \mathsf{U} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & -i/\sqrt{2} \\ 0 & 1/\sqrt{2} & i/\sqrt{2} \end{pmatrix}, \]
find the transformed matrix of $L_x$;

(c) Find the new basis functions $\varphi'_i$ defined by the transformation U, and write
explicitly (in terms of x, y, and z) the functional forms of $L_x\varphi'_i$, $i = 1, 2, 3$.

Hint. Use $\int e^{-r^2}\, d^3r = \pi^{3/2}$, $\int x^2 e^{-r^2}\, d^3r = \tfrac{1}{2}\pi^{3/2}$; the integrals are over $\mathbb{R}^3$.


5.6.3 The Gram-Schmidt process for converting an arbitrary basis $\chi_\nu$ into an orthonormal
set $\varphi_\nu$ is described in Section 5.2 in a way that introduces coefficients of the form
$\langle\varphi_\mu|\chi_\nu\rangle$. For bases consisting of three functions, convert the formulation so that $\varphi_\nu$
is expressed entirely in terms of the $\chi_\mu$, thereby obtaining an expression for the
upper-triangular matrix T appearing in Eq. (5.81).


5.7 INVARIANTS 


Just as coordinate rotations leave invariant the essential properties of physical vectors, 
we can expect unitary transformations to preserve essential features of our vector spaces. 
These invariances are most directly observed in the basis-set expansions of operators 
and functions. 


Consider first a matrix equation of the form $\mathbf{b} = \mathsf{A}\mathbf{c}$, where all quantities have been
evaluated using a particular orthonormal basis $\varphi_i$. Now suppose that we wish to use a
basis $\chi_i$ which can be reached from the original basis by applying a unitary transformation
such that

\[ \mathbf{c}' = \mathsf{U}\mathbf{c} \quad\text{and}\quad \mathbf{b}' = \mathsf{U}\mathbf{b} . \]

In the new basis, the matrix A becomes $\mathsf{A}' = \mathsf{U}\mathsf{A}\mathsf{U}^{-1}$, and the invariance we seek
corresponds to $\mathbf{b}' = \mathsf{A}'\mathbf{c}'$. In other words, all the quantities must change coherently so that their
relationship is unaltered. It is easy to verify that this is the case. Substituting for the primed
quantities,

\[ \mathsf{U}\mathbf{b} = (\mathsf{U}\mathsf{A}\mathsf{U}^{-1})(\mathsf{U}\mathbf{c}) \;\longrightarrow\; \mathsf{U}\mathbf{b} = \mathsf{U}\mathsf{A}\mathbf{c} , \]

from which we can recover $\mathbf{b} = \mathsf{A}\mathbf{c}$ by multiplying from the left by $\mathsf{U}^{-1}$.

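Scalar quantities should remain invariant under unitary transformation; the prime
example here is the scalar product. If f and g are represented in some orthonormal basis,
respectively, by a and b, their scalar product is given by $\mathbf{a}^\dagger\mathbf{b}$. Under a unitary
transformation whose matrix representation is U, a becomes $\mathbf{a}' = \mathsf{U}\mathbf{a}$ and b becomes $\mathbf{b}' = \mathsf{U}\mathbf{b}$,
and

\[ \langle f|g\rangle = (\mathbf{a}')^\dagger \mathbf{b}' = (\mathsf{U}\mathbf{a})^\dagger(\mathsf{U}\mathbf{b}) = (\mathbf{a}^\dagger\mathsf{U}^\dagger)(\mathsf{U}\mathbf{b}) = \mathbf{a}^\dagger\mathbf{b} . \tag{5.82} \]

The fact that $\mathsf{U}^\dagger = \mathsf{U}^{-1}$ enables us to confirm the invariance.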
Another scalar that should remain invariant under basis transformation is the expectation 
value of an operator. 


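Example 5.7.1 EXPECTATION VALUE IN TRANSFORMED BASIS


Suppose that $\psi = \sum_i c_i\varphi_i$, and that we wish to compute the expectation value of $A$ for
this $\psi$, where A, the matrix corresponding to $A$, has elements $a_{\nu\mu} = \langle\varphi_\nu|A|\varphi_\mu\rangle$. We have

\[ \langle A\rangle = \langle\psi|A|\psi\rangle \;\longrightarrow\; \mathbf{c}^\dagger\mathsf{A}\mathbf{c} . \]

If we now choose to use a basis obtained from the $\varphi_i$ by a unitary transformation U, the
expression for $\langle A\rangle$ becomes

\[ (\mathsf{U}\mathbf{c})^\dagger(\mathsf{U}\mathsf{A}\mathsf{U}^{-1})(\mathsf{U}\mathbf{c}) = \mathbf{c}^\dagger\mathsf{U}^\dagger\mathsf{U}\mathsf{A}\mathsf{U}^{-1}\mathsf{U}\mathbf{c} , \]

which, because U is unitary and therefore $\mathsf{U}^\dagger = \mathsf{U}^{-1}$, reduces, as it must, to the previously
obtained value of $\langle A\rangle$.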


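Vector spaces have additional useful matrix invariants. The trace of a matrix is invariant
under unitary transformation. If $\mathsf{A}' = \mathsf{U}\mathsf{A}\mathsf{U}^{-1}$, then

\[ \operatorname{trace}(\mathsf{A}') = \sum_\nu (\mathsf{U}\mathsf{A}\mathsf{U}^{-1})_{\nu\nu} = \sum_{\nu\mu\tau} u_{\nu\mu}\, a_{\mu\tau}\, (\mathsf{U}^{-1})_{\tau\nu} = \sum_{\mu\tau} \Big( \sum_\nu (\mathsf{U}^{-1})_{\tau\nu}\, u_{\nu\mu} \Big) a_{\mu\tau} \]
\[ = \sum_{\mu\tau} \delta_{\tau\mu}\, a_{\mu\tau} = \sum_\mu a_{\mu\mu} = \operatorname{trace}(\mathsf{A}) . \tag{5.83} \]

Here we simply used the property $\mathsf{U}^{-1}\mathsf{U} = \mathsf{1}$.

Another matrix invariant is the determinant. From the determinant product theorem,
$\det(\mathsf{U}\mathsf{A}\mathsf{U}^{-1}) = \det(\mathsf{U}^{-1}\mathsf{U}\mathsf{A}) = \det(\mathsf{A})$. Further invariants will be identified when we study
matrix eigenvalue problems in Chapter 6.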
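
The invariants of this section (scalar product, trace, determinant) lend themselves to a quick numerical check. The sketch below is one way to do it with the Python standard library; the 2 x 2 unitary matrix U, the matrix A, and the vectors a and b are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def matvec(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def braket(a, b):
    """Scalar product a^dagger b of two column vectors."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

theta = 0.7                                         # arbitrary angle
U = [[math.cos(theta), 1j * math.sin(theta)],
     [1j * math.sin(theta), math.cos(theta)]]       # unitary for any real theta
A = [[2, 1 + 1j], [3 - 2j, -1j]]                    # arbitrary matrix
a = [1, 2j]
b = [1 - 1j, 0.5]

A_prime = matmul(matmul(U, A), dagger(U))           # U A U^{-1}, using U^{-1} = U^dagger
a_prime, b_prime = matvec(U, a), matvec(U, b)

print(abs(trace(A_prime) - trace(A)))               # trace, Eq. (5.83)
print(abs(det2(A_prime) - det2(A)))                 # determinant
print(abs(braket(a_prime, b_prime) - braket(a, b))) # scalar product, Eq. (5.82)
```

All three printed differences should vanish to within roundoff.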

Exercises


5.7.1 Using the formal properties of unitary transformations, show that the commutator
$[x, p] = i$ is invariant under unitary transformation of the matrices representing x and p.


5.7.2 The Pauli matrices

\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

have commutator $[\sigma_1, \sigma_2] = 2i\sigma_3$. Show that this relationship continues to be valid if
these matrices are transformed by

\[ \mathsf{U} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \]


5.7.3 (a) The operator $L_x$ is defined as

\[ L_x = -i\left( y\,\frac{\partial}{\partial z} - z\,\frac{\partial}{\partial y} \right). \]

Verify that the basis $\varphi_1 = Cxe^{-r^2}$, $\varphi_2 = Cye^{-r^2}$, $\varphi_3 = Cze^{-r^2}$, where $r^2 = x^2 +
y^2 + z^2$, forms a closed set under the operation of $L_x$, meaning that when $L_x$ is
applied to any member of this basis the result is a function within the basis space,
and construct the 3 x 3 matrix of $L_x$ in this basis from the result of the application
of $L_x$ to each basis function.

(b) Verify that $L_x\left[ (x + iy)e^{-r^2} \right] = -ze^{-r^2}$, and note that this result, using the $\{\varphi_i\}$
basis, can be written $L_x(\varphi_1 + i\varphi_2) = -\varphi_3$.

(c) Express the equation of part (b) in matrix form, and write the matrix equation that
results when each of the quantities is transformed using the transformation matrix

\[ \mathsf{U} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & -i/\sqrt{2} \\ 0 & 1/\sqrt{2} & i/\sqrt{2} \end{pmatrix}. \]

(d) Regarding the transformation U as producing a new basis $\{\varphi'_i\}$, find the explicit
form (in x, y, z) of the $\varphi'_i$.

(e) Using the operator form of $L_x$ and the explicit forms of the $\varphi'_i$, verify the validity
of the transformed equation found in part (c).
Hint. The results of Exercise 5.6.2 may be useful.


5.8 SUMMARY—VECTOR SPACE NOTATION 


It may be useful to summarize some of the relationships found in this chapter, highlighting 
the essentially complete mathematical parallelism between the properties of vectors and 
those of basis expansions in vector spaces. We do so here, using Dirac notation wherever 
appropriate. 
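1. Scalar product:

\[ \langle\varphi|\psi\rangle = \int_a^b \varphi^*(t)\,\psi(t)\,w(t)\, dt \;\Longleftrightarrow\; \langle \mathbf{u}|\mathbf{v}\rangle = \mathbf{u}^\dagger\mathbf{v} = \mathbf{u}^*\cdot\mathbf{v} . \tag{5.84} \]

The result of the scalar product operation is a scalar (i.e., a real or complex number).
Here $\mathbf{u}^\dagger\mathbf{v}$ represents the product of a row and a column vector; it is equivalent to the
dot-product notation also shown.

2. Expectation value:

\[ \langle\varphi|A|\varphi\rangle = \int_a^b \varphi^*(t)\,A\varphi(t)\,w(t)\, dt \;\Longleftrightarrow\; \langle \mathbf{u}|\mathsf{A}|\mathbf{u}\rangle = \mathbf{u}^\dagger\mathsf{A}\mathbf{u} . \tag{5.85} \]

3. Adjoint:

\[ \langle\varphi|A|\psi\rangle = \langle A^\dagger\varphi|\psi\rangle \;\Longleftrightarrow\; \langle \mathbf{u}|\mathsf{A}|\mathbf{v}\rangle = \langle \mathsf{A}^\dagger\mathbf{u}|\mathbf{v}\rangle = [\mathsf{A}^\dagger\mathbf{u}]^\dagger\mathbf{v} = \mathbf{u}^\dagger\mathsf{A}\mathbf{v} . \tag{5.86} \]

Note that the simplification of $[\mathsf{A}^\dagger\mathbf{u}]^\dagger\mathbf{v}$ shows that the matrix $\mathsf{A}^\dagger$ has the property
expected of an operator adjoint.

4. Unitary transformation:

\[ \psi = A\varphi \;\longrightarrow\; U\psi = (UAU^{-1})(U\varphi) \;\Longleftrightarrow\; \mathbf{w} = \mathsf{A}\mathbf{v} \;\longrightarrow\; \mathsf{U}\mathbf{w} = (\mathsf{U}\mathsf{A}\mathsf{U}^{-1})(\mathsf{U}\mathbf{v}) . \tag{5.87} \]

5. Resolution of identity:

\[ 1 = \sum_i |\varphi_i\rangle\langle\varphi_i| \;\Longleftrightarrow\; \mathsf{1} = \sum_i |\hat{\mathbf{e}}_i\rangle\langle\hat{\mathbf{e}}_i| , \tag{5.88} \]

where the $\varphi_i$ are orthonormal and the $\hat{\mathbf{e}}_i$ are orthogonal unit vectors. Applying
Eq. (5.88) to a function (or vector):

\[ \psi = \sum_i |\varphi_i\rangle\langle\varphi_i|\psi\rangle = \sum_i a_i\varphi_i \;\Longleftrightarrow\; \mathbf{w} = \sum_i |\hat{\mathbf{e}}_i\rangle\langle\hat{\mathbf{e}}_i|\mathbf{w}\rangle = \sum_i w_i\hat{\mathbf{e}}_i , \tag{5.89} \]

where $a_i = \langle\varphi_i|\psi\rangle$ and $w_i = \langle\hat{\mathbf{e}}_i|\mathbf{w}\rangle = \hat{\mathbf{e}}_i\cdot\mathbf{w}$.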

CHAPTER 6


EIGENVALUE PROBLEMS 


6.1 EIGENVALUE EQUATIONS 


Many important problems in physics can be cast as equations of the generic form

\[ A\psi = \lambda\psi , \tag{6.1} \]

where $A$ is a linear operator whose domain and range is a Hilbert space, $\psi$ is a function in
the space, and $\lambda$ is a constant. The operator $A$ is known, but both $\psi$ and $\lambda$ are unknown,
and the task at hand is to solve Eq. (6.1). Because the solutions to an equation of this
type are functions $\psi$ that are unchanged by the operator (except for multiplication by a
scale factor $\lambda$), such equations are termed eigenvalue equations: Eigen is German for “[its] own.” A
function $\psi$ that solves an eigenvalue equation is called an eigenfunction, and the value of
$\lambda$ that goes with an eigenfunction is called an eigenvalue.

The formal definition of an eigenvalue equation may not make its essential content
totally apparent. The requirement that the operator $A$ leaves $\psi$ unchanged except for a
scale factor constitutes a severe restriction upon $\psi$. The possibility that Eq. (6.1) has any
solutions at all is in many cases not intuitively obvious.

To see why eigenvalue equations are common in physics, let’s cite a few examples: 


1. The resonant standing waves of a vibrating string will be those in which the restoring
force on the elements of the string (represented by $A\psi$) is proportional to their
displacements $\psi$ from equilibrium.

2. The angular momentum L and the angular velocity $\boldsymbol{\omega}$ of a rigid body are
three-dimensional (3-D) vectors that are related by the equation

\[ \mathbf{L} = \mathsf{I}\boldsymbol{\omega} , \]

where I is the 3 x 3 moment of inertia matrix. Here the direction of $\boldsymbol{\omega}$ defines the axis
of rotation, while the direction of L defines the axis about which angular momentum is
generated. The condition that these two axes be in the same direction (thereby defining
what are known as the principal axes of inertia) is that $\mathbf{L} = \lambda\boldsymbol{\omega}$, where $\lambda$ is a
proportionality constant. Combining with the formula for L, we obtain

\[ \mathsf{I}\boldsymbol{\omega} = \lambda\boldsymbol{\omega} , \]

which is an eigenvalue equation in which the operator is the matrix I and the
eigenfunction (then usually called an eigenvector) is the vector $\boldsymbol{\omega}$.

3. The time-independent Schrödinger equation in quantum mechanics is an eigenvalue
equation, with $A$ the Hamiltonian operator $H$, $\psi$ a wave function and $\lambda = E$ the
energy of the state represented by $\psi$.


Basis Expansions 


A powerful approach to eigenvalue problems is to express them in terms of an orthonormal
basis whose members we designate $\varphi_i$, using the formulas developed in Chapter 5. Then
the operator $A$ and the function $\psi$ are represented by a matrix A and a vector c whose
elements are obtained, according to Eqs. (5.51) and (5.52), as the scalar products

\[ a_{ij} = \langle\varphi_i|A|\varphi_j\rangle , \qquad c_i = \langle\varphi_i|\psi\rangle . \]

Our original eigenvalue equation has now been reduced to a matrix equation:

\[ \mathsf{A}\mathbf{c} = \lambda\mathbf{c} . \tag{6.2} \]

When an eigenvalue equation is presented in this form, we can call it a matrix eigenvalue
equation and call the vectors c that solve it eigenvectors. As we shall see in later sections
of this chapter, there is a well-developed technology for the solution of matrix eigenvalue
equations, so a route always available for solving eigenvalue equations is to cast them
in matrix form. Once a matrix eigenvalue problem has been solved, we can recover the
eigenfunctions of the original problem from their expansion:

\[ \psi = \sum_i c_i\varphi_i . \]


Sometimes, as in the moment of inertia example mentioned above, our eigenvalue prob- 
lem originates as a matrix problem. Then, of course, we do not have to begin its solution 
process by introducing a basis and converting it into matrix form, and our solutions will be 
vectors that do not need to be interpreted as expansions in a basis. 


Equivalence of Operator and Matrix Forms 


It is important to note that we are dealing with eigenvalue equations in which the operator 
involved is linear and that it operates on elements of a Hilbert space. Once these conditions 
are met, the operator and function involved can always be expanded in a basis, leading to 
a matrix eigenvalue equation that is totally equivalent to our original problem. Among 
other things, this means that any theorems about properties of eigenvectors or eigenvalues 
that are developed from basis-set expansions of an eigenvalue problem must apply also to 
the original problem, and that solution of the matrix eigenvalue equation provides also a 
solution to the original problem. These facts, plus the practical observation that we know
how to solve matrix eigenvalue problems, strongly suggest that the detailed investigation 
of the matrix problems should be on our agenda. 

When we explore matrix eigenvalue problems, we will find that certain properties of the 
matrix influence the nature of the solutions, and that in particular significant simplifications 
become available when the matrix is Hermitian. Many eigenvalue equations of interest in 
physics involve differential operators, so it is of importance to understand whether (or 
under what conditions) these operators are Hermitian. That issue is taken up in Chapter 8. 

Finally, we note that the introduction of a basis-set expansion is not the only possibility 
for solving an eigenvalue equation. Eigenvalue equations involving differential operators 
can also be approached by the general methods for solving differential equations. That 
topic is also discussed in Chapter 8. 


6.2 MATRIX EIGENVALUE PROBLEMS 


While in principle the notion of an eigenvalue problem is already fully defined, we open 
this section with a simple example that may help to make it clearer how such problems are 
set up and solved. 


A Preliminary Example 


We consider here a simple problem of two-dimensional (2-D) motion in which a particle 
slides frictionlessly in an ellipsoidal basin (see Fig. 6.1). If we release the particle (initially 
at rest) at an arbitrary point in the basin, it will start to move downhill in the (negative) 
gradient direction, which in general will not aim directly at the potential minimum at the 
bottom of the basin. The particle’s overall trajectory will then be a complicated path, as 
sketched in the bottom panel of Fig. 6.1. Our objective is to find the positions, if any, 
from which the trajectories will aim at the potential minimum, and will therefore represent 
simple one-dimensional oscillatory motion. 

This problem is sufficiently elementary that we can analyze it without great difficulty.
We take a potential of the form

\[ V(x, y) = ax^2 + bxy + cy^2 , \]

with parameters a, b, c in ranges that describe an ellipsoidal basin with a minimum in V
at x = y = 0. We then calculate the x and y components of the force on our particle when
at (x, y):

\[ F_x = -\frac{\partial V}{\partial x} = -2ax - by , \qquad F_y = -\frac{\partial V}{\partial y} = -bx - 2cy . \]

It is pretty clear that, for most values of x and y, $F_x/F_y \ne x/y$, so the force will not be
directed toward the minimum at x = y = 0.

To search for directions in which the force is directed toward x = y = 0, we begin by
writing the equations for the force in matrix form:

\[ \begin{pmatrix} F_x \\ F_y \end{pmatrix} = \begin{pmatrix} -2a & -b \\ -b & -2c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} , \quad\text{or}\quad \mathbf{f} = \mathsf{H}\mathbf{r} , \]

where f, H, and r are defined as indicated. Now the condition $F_x/F_y = x/y$ is equivalent
to the statement that f and r are proportional, and therefore we can write

\[ \mathsf{H}\mathbf{r} = \lambda\mathbf{r} , \tag{6.3} \]

where, as already suggested, H is a known matrix, while $\lambda$ and r are to be determined. This
is an eigenvalue equation, and the column vectors r that are its solutions are its
eigenvectors, while the corresponding values of $\lambda$ are its eigenvalues.

FIGURE 6.1 Top: Contour lines of basin potential $V = x^2 - \sqrt{5}\,xy + 3y^2$. Bottom:
Trajectory of sliding particle of unit mass starting from rest at (8.0, −1.92).

Equation (6.3) is a homogeneous linear equation system, as becomes more obvious if
written as

\[ (\mathsf{H} - \lambda\mathsf{1})\mathbf{r} = 0 , \tag{6.4} \]

and we know from Chapter 2 that it will have the unique solution r = 0 unless
$\det(\mathsf{H} - \lambda\mathsf{1}) = 0$. However, the value of $\lambda$ is at our disposal, so we can search for values of $\lambda$ that
cause this determinant to vanish. Proceeding symbolically, we look for $\lambda$ such that

\[ \det(\mathsf{H} - \lambda\mathsf{1}) = \begin{vmatrix} h_{11} - \lambda & h_{12} \\ h_{21} & h_{22} - \lambda \end{vmatrix} = 0 . \]

Expanding the determinant, which is sometimes called a secular determinant (the name
arising from early applications in celestial mechanics), we have an algebraic equation, the
secular equation,

\[ (h_{11} - \lambda)(h_{22} - \lambda) - h_{12}h_{21} = 0 , \tag{6.5} \]

which can be solved for $\lambda$. The left-hand side of Eq. (6.5) is also called the characteristic
polynomial (in $\lambda$) of H, and Eq. (6.5) is for that reason also known as the characteristic
equation of H.

Once a value of $\lambda$ that solves Eq. (6.5) has been obtained, we can return to the
homogeneous equation system, Eq. (6.4), and solve it for the vector r. This can be repeated for
all $\lambda$ that are solutions to the secular equation, thereby giving a set of eigenvalues and the
associated eigenvectors.
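
The two-step procedure just described (solve the secular equation, then solve the homogeneous system for each root) can be sketched in a few lines for the 2 x 2 case. The function name eigen_2x2 and the sample matrix are illustrative assumptions; the routine is meant as a teaching sketch and not as a substitute for a library eigensolver.

```python
import cmath

def eigen_2x2(h11, h12, h21, h22):
    """Eigenpairs of a 2 x 2 matrix from the secular equation, Eq. (6.5).

    Expanding Eq. (6.5) gives lambda^2 - (h11 + h22) lambda + (h11 h22 - h12 h21) = 0.
    For each root, an (unnormalized) eigenvector is read off from one row of
    (H - lambda 1) r = 0, Eq. (6.4).
    """
    tr = h11 + h22
    det = h11 * h22 - h12 * h21
    disc = cmath.sqrt(tr * tr - 4 * det)
    pairs = []
    for lam in ((tr + disc) / 2, (tr - disc) / 2):
        if abs(h12) > 1e-12 or abs(h11 - lam) > 1e-12:
            r = (h12, lam - h11)       # satisfies (h11 - lam) x + h12 y = 0
        elif abs(h21) > 1e-12 or abs(h22 - lam) > 1e-12:
            r = (lam - h22, h21)       # satisfies h21 x + (h22 - lam) y = 0
        else:
            r = (1.0, 0.0)             # H - lam 1 vanishes: every vector is an eigenvector
        pairs.append((lam, r))
    return pairs

# Illustrative symmetric matrix (an assumption for this sketch, not from the text).
H = ((2.0, 1.0), (1.0, 2.0))
pairs = eigen_2x2(H[0][0], H[0][1], H[1][0], H[1][1])
for lam, r in pairs:
    print(lam, r)
```

Because the roots are obtained with cmath.sqrt, the eigenvalues are returned as complex numbers even when, as here, their imaginary parts vanish.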


Example 6.2.1 2-D ELLIPSOIDAL BASIN


Let’s continue with our ellipsoidal basin example, with the specific parameter values a = 1,
$b = -\sqrt{5}$, c = 3. Then our matrix H has the form

\[ \mathsf{H} = \begin{pmatrix} -2 & \sqrt{5} \\ \sqrt{5} & -6 \end{pmatrix} , \]

and the secular equation takes the form

\[ \det(\mathsf{H} - \lambda\mathsf{1}) = \begin{vmatrix} -2 - \lambda & \sqrt{5} \\ \sqrt{5} & -6 - \lambda \end{vmatrix} = \lambda^2 + 8\lambda + 7 = 0 . \]

Since $\lambda^2 + 8\lambda + 7 = (\lambda + 1)(\lambda + 7)$, we see that the secular equation has as solutions the
eigenvalues $\lambda = -1$ and $\lambda = -7$.

To get the eigenvector corresponding to $\lambda = -1$, we return to Eq. (6.4), which, written
in great detail, is

\[ (\mathsf{H} - \lambda\mathsf{1})\mathbf{r} = \begin{pmatrix} -2 - (-1) & \sqrt{5} \\ \sqrt{5} & -6 - (-1) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -1 & \sqrt{5} \\ \sqrt{5} & -5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0 , \]

which expands into a linearly dependent pair of equations:

\[ -x + \sqrt{5}\,y = 0 , \qquad \sqrt{5}\,x - 5y = 0 . \]

This is, of course, the intention associated with the secular equation, because if these
equations were linearly independent they would inexorably lead to the solution x = y = 0.
Instead, from either equation, we have $x = \sqrt{5}\,y$, so we have the eigenvalue/eigenvector
pair

\[ \lambda_1 = -1 , \qquad \mathbf{r}_1 = C \begin{pmatrix} \sqrt{5} \\ 1 \end{pmatrix} , \]

where C is a constant that can assume any value. Thus, there is an infinite number of x, y
pairs that define a direction in the 2-D space, with the magnitude of the displacement in
that direction arbitrary. The arbitrariness of scale is a natural consequence of the fact that
the equation system was homogeneous; any multiple of a solution of a linear homogeneous
equation set will also be a solution. This eigenvector corresponds to trajectories that start
from the particle at rest anywhere on the line defined by $\mathbf{r}_1$. A trajectory of this sort is
illustrated in the top panel of Fig. 6.2.

FIGURE 6.2 Trajectories starting at rest. Top: At a point on the line $x = y\sqrt{5}$.
Bottom: At a point on the line $y = -x\sqrt{5}$.


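We have not yet considered the possibility that $\lambda = -7$. This leads to a different
eigenvector, obtained by solving

\[ (\mathsf{H} - \lambda\mathsf{1})\mathbf{r} = \begin{pmatrix} -2 + 7 & \sqrt{5} \\ \sqrt{5} & -6 + 7 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 & \sqrt{5} \\ \sqrt{5} & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0 , \]

corresponding to $y = -x\sqrt{5}$. This defines the eigenvalue/eigenvector pair

\[ \lambda_2 = -7 , \qquad \mathbf{r}_2 = C' \begin{pmatrix} -1 \\ \sqrt{5} \end{pmatrix} . \]

A trajectory of this sort is shown in the bottom panel of Fig. 6.2.

We thus have two directions in which the force is directed toward the minimum, and
they are mutually perpendicular: the first direction has $dy/dx = 1/\sqrt{5}$; for the second,
$dy/dx = -\sqrt{5}$.

We can easily check our eigenvectors and eigenvalues. For $\lambda_1$ and $\mathbf{r}_1$,

\[ \mathsf{H}\mathbf{r}_1 = C \begin{pmatrix} -2 & \sqrt{5} \\ \sqrt{5} & -6 \end{pmatrix} \begin{pmatrix} \sqrt{5} \\ 1 \end{pmatrix} = C \begin{pmatrix} -\sqrt{5} \\ -1 \end{pmatrix} = -\mathbf{r}_1 = \lambda_1\mathbf{r}_1 . \]

It is often useful to normalize eigenvectors, which we can do by choosing the constant
(C or C') to make r of magnitude unity. In the present example,

\[ \mathbf{r}_1 = \begin{pmatrix} \sqrt{5/6} \\ \sqrt{1/6} \end{pmatrix} , \qquad \mathbf{r}_2 = \begin{pmatrix} -\sqrt{1/6} \\ \sqrt{5/6} \end{pmatrix} . \tag{6.6} \]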
Each of these normalized eigenvectors is still arbitrary as to overall sign (or if we accept
complex coefficients, as to an arbitrary complex factor of magnitude unity).

Before leaving this example, we make three further observations: (1) the number of
eigenvalues was equal to the dimension of the matrix H. This is a consequence of the
fundamental theorem of algebra, namely that an equation of degree n will have n roots;
(2) although the secular equation was of degree 2 and quadratic equations can have
complex roots, our eigenvalues were real; and (3) our two eigenvectors are orthogonal.


Our 2-D example is easily understood physically. The directions in which the
displacement and the force are collinear are the symmetry directions of the elliptical potential field,
and they are associated with different eigenvalues (the proportionality constant between
position and force) because the ellipses have axes of different lengths. We have, in fact,
identified the principal axes of our basin. With the parameters of Example 6.2.1, the
potential could have been written (using the normalized eigenvectors)

\[ V = \frac{1}{2} \left( \frac{\sqrt{5}\,x + y}{\sqrt{6}} \right)^2 + \frac{7}{2} \left( \frac{-x + \sqrt{5}\,y}{\sqrt{6}} \right)^2 = \frac{1}{2}(x')^2 + \frac{7}{2}(y')^2 , \]

which shows that V divides into two quadratic terms, each dependent on a parenthesized
quantity (a new coordinate) proportional to one of our eigenvectors. The new coordinates
are related to the original x, y by a rotation with unitary transformation U:

\[ \mathsf{U}\mathbf{r} = \begin{pmatrix} \sqrt{5/6} & \sqrt{1/6} \\ -\sqrt{1/6} & \sqrt{5/6} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} (\sqrt{5}\,x + y)/\sqrt{6} \\ (-x + \sqrt{5}\,y)/\sqrt{6} \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix} . \]

Finally, we note that when we calculate the force in the primed coordinate system, we get

\[ F_{x'} = -x' , \qquad F_{y'} = -7y' , \]

corresponding to the eigenvalues we found.
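
The results of the basin example can be confirmed numerically. The sketch below, using only the standard library, checks the two eigenpairs of Example 6.2.1 and compares the potential in the original coordinates with its principal-axis form at a few sample points; the sample points and helper names are arbitrary choices made for this illustration.

```python
import math

sqrt5, sqrt6 = math.sqrt(5.0), math.sqrt(6.0)
H = [[-2.0, sqrt5], [sqrt5, -6.0]]        # force matrix for a = 1, b = -sqrt(5), c = 3
r1 = (sqrt5 / sqrt6, 1.0 / sqrt6)         # normalized eigenvector for lambda = -1
r2 = (-1.0 / sqrt6, sqrt5 / sqrt6)        # normalized eigenvector for lambda = -7

def apply(M, r):
    """2 x 2 matrix applied to a 2-component vector."""
    return (M[0][0] * r[0] + M[0][1] * r[1], M[1][0] * r[0] + M[1][1] * r[1])

def V(x, y):
    """Basin potential in the original coordinates."""
    return x * x - sqrt5 * x * y + 3.0 * y * y

def V_principal(x, y):
    """Same potential written in the rotated coordinates x', y'."""
    xp = r1[0] * x + r1[1] * y            # x' = (sqrt5 x + y) / sqrt6
    yp = r2[0] * x + r2[1] * y            # y' = (-x + sqrt5 y) / sqrt6
    return 0.5 * xp * xp + 3.5 * yp * yp

# Residuals of H r = lambda r for both eigenpairs.
res1 = max(abs(h + 1.0 * r) for h, r in zip(apply(H, r1), r1))
res2 = max(abs(h + 7.0 * r) for h, r in zip(apply(H, r2), r2))
overlap = r1[0] * r2[0] + r1[1] * r2[1]   # orthogonality of the eigenvectors

points = [(8.0, -1.92), (1.0, 2.0), (-0.3, 0.7)]
diffs = [abs(V(x, y) - V_principal(x, y)) for x, y in points]
print(res1, res2, overlap, max(diffs))
```

All printed quantities should be zero to within roundoff.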


Another Eigenproblem 


Example 6.2.1 is not complicated enough to provide a full illustration of the matrix eigen- 
value problem. Consider next the following example. 


Example 6.2.2 BLOCK-DIAGONAL MATRIX


Find the eigenvalues and eigenvectors of

\[ \mathsf{H} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} . \tag{6.7} \]

Writing the secular equation and expanding in minors using the third row, we have

\[ \begin{vmatrix} -\lambda & 1 & 0 \\ 1 & -\lambda & 0 \\ 0 & 0 & 2 - \lambda \end{vmatrix} = (2 - \lambda) \begin{vmatrix} -\lambda & 1 \\ 1 & -\lambda \end{vmatrix} = (2 - \lambda)(\lambda^2 - 1) = 0 . \tag{6.8} \]

We see that the eigenvalues are 2, +1, and −1.

To obtain the eigenvector corresponding to $\lambda = 2$, we examine the equation set
$[\mathsf{H} - 2(\mathsf{1})]\mathbf{c} = 0$:

\[ -2c_1 + c_2 = 0 , \qquad c_1 - 2c_2 = 0 , \qquad 0 = 0 . \]

The first two equations of this set lead to $c_1 = c_2 = 0$. The third obviously conveys no
information, and we are led to the conclusion that $c_3$ is arbitrary. Thus, at this point we
have

\[ \lambda_1 = 2 , \qquad \mathbf{c}_1 = \begin{pmatrix} 0 \\ 0 \\ C \end{pmatrix} . \tag{6.9} \]

Taking next $\lambda = +1$, our matrix equation is $[\mathsf{H} - 1(\mathsf{1})]\mathbf{c} = 0$, which is equivalent to the
ordinary equations

\[ -c_1 + c_2 = 0 , \qquad c_1 - c_2 = 0 , \qquad c_3 = 0 . \]

We clearly have $c_1 = c_2$ and $c_3 = 0$, so

\[ \lambda_2 = +1 , \qquad \mathbf{c}_2 = \begin{pmatrix} C \\ C \\ 0 \end{pmatrix} . \tag{6.10} \]

Similar operations for $\lambda = -1$ yield

\[ \lambda_3 = -1 , \qquad \mathbf{c}_3 = \begin{pmatrix} C \\ -C \\ 0 \end{pmatrix} . \tag{6.11} \]

Collecting our results, and normalizing the eigenvectors (often useful, but not in general
necessary), we have

\[ \lambda_1 = 2 , \; \mathbf{c}_1 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} ; \qquad \lambda_2 = 1 , \; \mathbf{c}_2 = \begin{pmatrix} 2^{-1/2} \\ 2^{-1/2} \\ 0 \end{pmatrix} ; \qquad \lambda_3 = -1 , \; \mathbf{c}_3 = \begin{pmatrix} 2^{-1/2} \\ -2^{-1/2} \\ 0 \end{pmatrix} . \]

Note that because H was block-diagonal, with an upper-left 2 x 2 block and a
lower-right 1 x 1 block, the secular equation separated into a product of the determinants for the
two blocks, and its solutions corresponded to those of an individual block, with
coefficients of value zero for the other block(s). Thus, $\lambda = 2$ was a solution for the 1 x 1 block
in row/column 3, and its eigenvector involved only the coefficient $c_3$. The $\lambda$ values ±1
came from the 2 x 2 block in rows/columns 1 and 2, with eigenvectors involving only
coefficients $c_1$ and $c_2$.
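
The factorization of the secular determinant in Eq. (6.8) and the three eigenpairs can be checked directly. This is a minimal standard-library sketch; the helper names det3 and secular, and the sample values of lambda, are assumptions made for the illustration.

```python
import math

def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

H = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]     # matrix of Eq. (6.7)

def secular(lam):
    """det(H - lam 1), the left-hand side of the secular equation."""
    return det3([[H[i][j] - (lam if i == j else 0) for j in range(3)]
                 for i in range(3)])

# The determinant should equal (2 - lam)(lam^2 - 1) for every lam, Eq. (6.8).
samples = [-2.0, -0.5, 0.0, 0.3, 1.7, 3.0]
factor_err = max(abs(secular(lam) - (2 - lam) * (lam * lam - 1)) for lam in samples)

s = 1 / math.sqrt(2)
eigenpairs = [(2, (0.0, 0.0, 1.0)), (1, (s, s, 0.0)), (-1, (s, -s, 0.0))]

def residual(lam, c):
    """Largest component of H c - lam c."""
    Hc = [sum(H[i][k] * c[k] for k in range(3)) for i in range(3)]
    return max(abs(Hc[i] - lam * c[i]) for i in range(3))

pair_err = max(residual(lam, c) for lam, c in eigenpairs)
print(factor_err, pair_err)
```

Each eigenvector has nonzero entries only in the rows belonging to the block from which its eigenvalue arises.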





In the case of a 1 x 1 block in row/column i, we saw, for i = 3 in Example 6.2.2, that its
only element was the eigenvalue, and that the corresponding eigenvector is proportional
to $\hat{\mathbf{e}}_i$ (a unit vector whose only nonzero element is $c_i = 1$). A generalization of this
observation is that if a matrix H is diagonal, its diagonal elements $h_{ii}$ will be the eigenvalues $\lambda_i$,
and that the eigenvectors $\mathbf{c}_i$ will be the unit vectors $\hat{\mathbf{e}}_i$.


Degeneracy 


If the secular equation has a multiple root, the eigensystem is said to be degenerate or to 
exhibit degeneracy. Here is an example. 


Example 6.2.3 DEGENERATE EIGENPROBLEM


Let’s find the eigenvalues and eigenvectors of

\[ \mathsf{H} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} . \tag{6.12} \]

The secular equation for this problem is

\[ \begin{vmatrix} -\lambda & 0 & 1 \\ 0 & 1 - \lambda & 0 \\ 1 & 0 & -\lambda \end{vmatrix} = \lambda^2(1 - \lambda) - (1 - \lambda) = (\lambda^2 - 1)(1 - \lambda) = 0 , \tag{6.13} \]

with the three roots +1, +1, and −1. Let’s consider first $\lambda = -1$. Then we have

\[ c_1 + c_3 = 0 , \qquad 2c_2 = 0 , \qquad c_1 + c_3 = 0 . \]

Thus,

\[ \lambda_1 = -1 , \qquad \mathbf{c}_1 = C \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} . \tag{6.14} \]

For the double root $\lambda = +1$,

\[ -c_1 + c_3 = 0 , \qquad 0 = 0 , \qquad c_1 - c_3 = 0 . \]

Note that of the three equations, only one is now linearly independent; the double root
signals two linear dependencies, and we have solutions for any values of $c_1$ and $c_2$, with only
the condition that $c_3 = c_1$. The eigenvectors for $\lambda = +1$ thus span a 2-D manifold (= subspace),
in contrast to the trivial one-dimensional manifold characteristic of nondegenerate
solutions. The general form for these eigenvectors is

\[ \lambda = +1 , \qquad \mathbf{c} = \begin{pmatrix} C \\ C' \\ C \end{pmatrix} . \tag{6.15} \]

It is convenient to describe the degenerate eigenspace for $\lambda = 1$ by identifying two
mutually orthogonal vectors that span it. We can pick the first vector by choosing arbitrary
values of C and C' (an obvious choice is to set one of these, say C', to zero). Then, using the
Gram-Schmidt process (or in this case by simple inspection), we find a second eigenvector
orthogonal to the first. Here, this leads to

\[ \lambda_2 = \lambda_3 = +1 , \qquad \mathbf{c}_2 = C \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} , \qquad \mathbf{c}_3 = C' \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} . \tag{6.16} \]

Normalizing, our eigenvalues and eigenvectors become

\[ \lambda_1 = -1 , \; \mathbf{c}_1 = \begin{pmatrix} 2^{-1/2} \\ 0 \\ -2^{-1/2} \end{pmatrix} ; \qquad \lambda_2 = \lambda_3 = 1 , \; \mathbf{c}_2 = \begin{pmatrix} 2^{-1/2} \\ 0 \\ 2^{-1/2} \end{pmatrix} , \; \mathbf{c}_3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} . \]

The eigenvalue problems we have used as examples all led to secular equations with 
simple solutions; realistic applications frequently involve matrices of large dimension and 
secular equations of high degree. The solution of matrix eigenvalue problems has been 
an active field in numerical analysis and very sophisticated computer programs for this 
purpose are now available. Discussion of the details of such programs is outside the scope 
of this book, but the ability to use such programs should be part of the technology available 
to the working physicist. 


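Exercises

Find the eigenvalues and corresponding normalized eigenvectors of the matrices in
Exercises 6.2.1 through 6.2.14. Orthogonalize any degenerate eigenvectors.

6.2.1 $\mathsf{A} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$.
ANS. $\lambda = 0, 1, 2$.

6.2.2 $\mathsf{A} = \begin{pmatrix} 1 & \sqrt{2} & 0 \\ \sqrt{2} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.
ANS. $\lambda = -1, 0, 2$.

6.2.3 $\mathsf{A} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$.
ANS. $\lambda = -1, 1, 2$.

6.2.4 $\mathsf{A} = \begin{pmatrix} 1 & \sqrt{8} & 0 \\ \sqrt{8} & 1 & \sqrt{8} \\ 0 & \sqrt{8} & 1 \end{pmatrix}$.
ANS. $\lambda = -3, 1, 5$.

6.2.5 $\mathsf{A} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$.
ANS. $\lambda = 0, 1, 2$.

6.2.6 $\mathsf{A} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & \sqrt{2} \\ 0 & \sqrt{2} & 0 \end{pmatrix}$.
ANS. $\lambda = -1, 1, 2$.

6.2.7 $\mathsf{A} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.
ANS. $\lambda = -\sqrt{2}, 0, \sqrt{2}$.

6.2.8 $\mathsf{A} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$.
ANS. $\lambda = 0, 2, 2$.

6.2.9 $\mathsf{A} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$.
ANS. $\lambda = -1, -1, 2$.

6.2.10 $\mathsf{A} = \begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{pmatrix}$.
ANS. $\lambda = -1, 2, 2$.

6.2.11 $\mathsf{A} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$.
ANS. $\lambda = 0, 0, 3$.

6.2.12 $\mathsf{A} = \begin{pmatrix} 5 & 0 & 2 \\ 0 & 1 & 0 \\ 2 & 0 & 2 \end{pmatrix}$.
ANS. $\lambda = 1, 1, 6$.

6.2.13 $\mathsf{A} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.
ANS. $\lambda = 0, 0, 2$.

6.2.14 $\mathsf{A} = \begin{pmatrix} 5 & 0 & \sqrt{3} \\ 0 & 3 & 0 \\ \sqrt{3} & 0 & 3 \end{pmatrix}$.
ANS. $\lambda = 2, 3, 6$.
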
6.2.15 Describe the geometric properties of the surface

\[ x^2 + 2xy + 2y^2 + 2yz + z^2 = 1 . \]

How is it oriented in 3-D space? Is it a conic section? If so, which kind?


6.3. HERMITIAN EIGENVALUE PROBLEMS 


All the illustrative problems we have thus far examined have turned out to have real eigen- 
values; this was also true of all the exercises at the end of Section 6.2. We also found, 
whenever we bothered to check, that the eigenvectors corresponding to different eigenval- 
ues were orthogonal. The purpose of the present section is to show that these properties 
are consequences of the fact that all the eigenvalue problems we have considered were for 
Hermitian matrices. 

We remind the reader that the check for Hermiticity is simple: We verify that H
is equal to its adjoint, $\mathsf{H}^\dagger$; if a matrix is real, this condition is just that it be symmetric.
All the matrices to which we referred are clearly Hermitian.

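We now proceed to characterize the eigenvalues and eigenvectors of Hermitian
matrices. Let H be a Hermitian matrix, with $\mathbf{c}_i$ and $\mathbf{c}_j$ two of its eigenvectors corresponding,
respectively, to the eigenvalues $\lambda_i$ and $\lambda_j$. Then, using Dirac notation,

\[ \mathsf{H}|\mathbf{c}_i\rangle = \lambda_i|\mathbf{c}_i\rangle , \qquad \mathsf{H}|\mathbf{c}_j\rangle = \lambda_j|\mathbf{c}_j\rangle . \tag{6.17} \]

Multiplying on the left the first of these by $\mathbf{c}_j^\dagger$, which in Dirac notation is $\langle\mathbf{c}_j|$, and the
second by $\langle\mathbf{c}_i|$,

\[ \langle\mathbf{c}_j|\mathsf{H}|\mathbf{c}_i\rangle = \lambda_i\langle\mathbf{c}_j|\mathbf{c}_i\rangle , \qquad \langle\mathbf{c}_i|\mathsf{H}|\mathbf{c}_j\rangle = \lambda_j\langle\mathbf{c}_i|\mathbf{c}_j\rangle . \tag{6.18} \]

We next take the complex conjugate of the second of these equations, noting that $\langle\mathbf{c}_i|\mathbf{c}_j\rangle^* =
\langle\mathbf{c}_j|\mathbf{c}_i\rangle$, that we must complex conjugate the occurrence of $\lambda_j$, and that

\[ \langle\mathbf{c}_i|\mathsf{H}|\mathbf{c}_j\rangle^* = \langle\mathsf{H}\mathbf{c}_j|\mathbf{c}_i\rangle = \langle\mathbf{c}_j|\mathsf{H}|\mathbf{c}_i\rangle . \tag{6.19} \]

Note that the first member of Eq. (6.19) contains the scalar product of $\mathbf{c}_i$ with $\mathsf{H}\mathbf{c}_j$.
Complex conjugating this scalar product yields the second member of that equation. The final
member of the equation follows because H is Hermitian.

The complex conjugation therefore converts Eqs. (6.18) into

\[ \langle\mathbf{c}_j|\mathsf{H}|\mathbf{c}_i\rangle = \lambda_i\langle\mathbf{c}_j|\mathbf{c}_i\rangle , \qquad \langle\mathbf{c}_j|\mathsf{H}|\mathbf{c}_i\rangle = \lambda_j^*\langle\mathbf{c}_j|\mathbf{c}_i\rangle . \tag{6.20} \]

Equations (6.20) permit us to obtain two important results: First, if i = j, the scalar product
$\langle\mathbf{c}_j|\mathbf{c}_i\rangle$ becomes $\langle\mathbf{c}_i|\mathbf{c}_i\rangle$, which is an inherently positive quantity. This means that the two
equations are only consistent if $\lambda_i = \lambda_i^*$, meaning that $\lambda_i$ must be real. Thus,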


The eigenvalues of a Hermitian matrix are real. 
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Next, if i 4 j, combining the two equations of Eq. (6.20), and remembering that the 4; 
are real, 


(Ai — Aj) (ejlei) = 9, (6.21) 


so that either A; = A; or (¢;|¢;) = 0. This tells us that 


Eigenvectors of a Hermitian matrix corresponding to different eigenvalues are 
orthogonal. 
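
Both boxed results can be checked numerically on a small complex example. The following Python
sketch is illustrative only: the 2 × 2 matrix and the helper names `inner` and `eigen_2x2` are
assumptions introduced here, not part of the text, and the closed-form eigenvector it uses
requires a nonzero off-diagonal element.

```python
import cmath

def inner(a, b):
    """Scalar product <a|b> = sum_k conj(a_k) * b_k."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def eigen_2x2(h):
    """Eigenpairs of a 2x2 matrix obtained from its secular (quadratic) equation."""
    (a, b), (c, d) = h
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # (b, lam - a) satisfies the first row of (H - lam 1) v = 0 when b != 0
        pairs.append((lam, [b, lam - a]))
    return pairs

# Hypothetical Hermitian matrix: it equals its own conjugate transpose
H = [[2, 1 - 1j], [1 + 1j, 3]]
(lam1, c1), (lam2, c2) = eigen_2x2(H)

print(lam1, lam2)           # eigenvalues; their imaginary parts vanish
print(abs(inner(c1, c2)))   # scalar product of the two eigenvectors
```

The scalar product is formed with the complex conjugate of the left-hand vector, as in Dirac
notation; omitting the conjugation would give a nonzero result for this example.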


Note, however, that if $\lambda_i = \lambda_j$, which will occur if i and j refer to two degenerate eigenvectors,
we know nothing about their orthogonality. In fact, in Example 6.2.3 we examined
a pair of degenerate eigenvectors, noting that they spanned a two-dimensional manifold 
and were not required to be orthogonal. However, we also noted in that context that we 
could choose them to be orthogonal. Sometimes (as in Example 6.2.3), it is obvious how 
to choose orthogonal degenerate eigenvectors. When it is not obvious, we can start from 
any linearly independent set of degenerate eigenvectors and orthonormalize them by the 
Gram-Schmidt process. 
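
As a minimal sketch of that last step, the following Python code orthonormalizes two degenerate
eigenvectors. The matrix H and the starting vectors are hypothetical choices made for this
illustration (H has the doubly degenerate eigenvalue 1), and the function names are ours. The
projections are subtracted one at a time from the running vector, a common variant of the
Gram-Schmidt process that gives the same result in exact arithmetic.

```python
import math

def inner(a, b):
    """Scalar product <a|b> = sum_k conj(a_k) * b_k."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        w = [complex(x) for x in v]
        for u in basis:
            coeff = inner(u, w)                       # component of w along u
            w = [wk - coeff * uk for wk, uk in zip(w, u)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wk / norm for wk in w])
    return basis

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m[r][k] * v[k] for k in range(len(v))) for r in range(len(m))]

# Hypothetical real symmetric matrix with eigenvalues 4, 1, 1
H = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
# Two eigenvectors for the eigenvalue 1; independent but not orthogonal
degenerate = [[1, -1, 0], [1, 0, -1]]

u1, u2 = gram_schmidt(degenerate)
print(abs(inner(u1, u2)))   # overlap after orthonormalization
print(matvec(H, u2))        # still an eigenvector with eigenvalue 1
```

Because any linear combination of degenerate eigenvectors is again an eigenvector with the same
eigenvalue, the orthonormalized vectors remain eigenvectors of H.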

Since the total number of eigenvectors of a Hermitian matrix is equal to its dimension, 
and since (whether or not there is degeneracy) we can make from them an orthonormal set 
of eigenvectors, we have the following important result: 


It is possible to choose the eigenvectors of a Hermitian matrix in such a way that they 
form an orthonormal set that spans the space of the matrix basis. This situation is often 
referred to by the statement, “The eigenvectors of a Hermitian matrix form a complete
set.” This means that if the matrix is of order n, any vector of dimension n can be written
as a linear combination of the orthonormal eigenvectors, with coefficients determined 
by the rules for orthogonal expansions. 


We close this section by reminding the reader that theorems which have been established 
for an arbitrary basis-set expansion of a Hermitian eigenvalue equation apply also to that 
eigenvalue equation in its original form. Therefore, this section has also shown that: 


If H is a linear Hermitian operator on an arbitrary Hilbert space,


1. The eigenvalues of H are real.
2. Eigenfunctions corresponding to different eigenvalues of H are orthogonal.
3. It is possible to choose the eigenfunctions of H in a way such that they form an
orthonormal basis for the Hilbert space. In general, the eigenfunctions of a Hermitian
operator form a complete set (i.e., a complete basis for the Hilbert space).


6.4 HERMITIAN MATRIX DIAGONALIZATION 
In Section 6.2 we observed that if a matrix is diagonal, the diagonal elements are its eigenvalues.
This observation opens an alternative approach to the matrix eigenvalue problem.
Given the matrix eigenvalue equation

$$Hc = \lambda c, \tag{6.22}$$

where H is a Hermitian matrix, consider what happens if we insert unity between H and c,
as follows, with U a unitary matrix, and then left-multiply the resulting equation by U:

$$HU^{-1}Uc = \lambda c \quad\longrightarrow\quad UHU^{-1}(Uc) = \lambda(Uc). \tag{6.23}$$


Equation (6.23) shows that our original eigenvalue equation has been converted into one
in which H has been replaced by its unitary transformation (by U) and the eigenvector c
has also been transformed by U, but the value of $\lambda$ remains unchanged. We thus have the
important result:


The eigenvalues of a matrix remain unchanged when the matrix is subjected to a unitary 
transformation. 


Next, suppose that we choose U in such a way that the transformed matrix $UHU^{-1}$ is
in the eigenvector basis. While we may or may not know how to construct this U, we
know that such a unitary matrix exists because the eigenvectors form a complete orthogonal
set, and can be specified to be normalized. If we transform with the chosen U, the
matrix $UHU^{-1}$ will be diagonal, with the eigenvalues as diagonal elements. Moreover, the
eigenvector $Uc$ of $UHU^{-1}$ corresponding to the eigenvalue $\lambda_i = (UHU^{-1})_{ii}$ is $\hat{e}_i$ (a column
vector with all elements zero except for unity in the ith row). We may find the eigenvector
$c_i$ of Eq. (6.22) by solving the equation $Uc_i = \hat{e}_i$, obtaining $c_i = U^{-1}\hat{e}_i$.

These observations correspond to the following: 


For any Hermitian matrix H, there exists a unitary transformation U that will cause
$UHU^{-1}$ to be diagonal, with the eigenvalues of H as its diagonal elements.


This is an extremely important result. Another way of stating it is: 


A Hermitian matrix can be diagonalized by a unitary transformation, with its eigenval- 
ues as the diagonal elements. 


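Looking next at the ith eigenvector $U^{-1}\hat{e}_i$, we have

$$
U^{-1}\hat{e}_i =
\begin{pmatrix}
(U^{-1})_{11} & \cdots & (U^{-1})_{1i} & \cdots & (U^{-1})_{1n} \\
(U^{-1})_{21} & \cdots & (U^{-1})_{2i} & \cdots & (U^{-1})_{2n} \\
\vdots & & \vdots & & \vdots \\
(U^{-1})_{n1} & \cdots & (U^{-1})_{ni} & \cdots & (U^{-1})_{nn}
\end{pmatrix}
\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}
=
\begin{pmatrix} (U^{-1})_{1i} \\ (U^{-1})_{2i} \\ \vdots \\ (U^{-1})_{ni} \end{pmatrix}. \tag{6.24}
$$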


We see that the columns of $U^{-1}$ are the eigenvectors of H, normalized because $U^{-1}$ is
a unitary matrix. It is also clear from Eq. (6.24) that $U^{-1}$ is not entirely unique; if its
columns are permuted, all that will happen is that the order of the eigenvectors is changed,
with a corresponding permutation of the diagonal elements of the diagonal matrix $UHU^{-1}$.
Summarizing,


If a unitary matrix U is such that, for a Hermitian matrix H, $UHU^{-1}$ is diagonal, the
normalized eigenvector of H corresponding to the eigenvalue $(UHU^{-1})_{ii}$ will be the
ith column of $U^{-1}$.


If H is not degenerate, $U^{-1}$ (and also U) will be unique except for a possible permutation
of the columns of $U^{-1}$ (and a corresponding permutation of the rows of U). However, if H is
degenerate (has a repeated eigenvalue), then the columns of $U^{-1}$ corresponding to the same
eigenvalue can be transformed among themselves, thereby giving additional flexibility to
U and $U^{-1}$.

Finally, calling on the fact that both the determinant and the trace of a matrix are 
unchanged when the matrix is subjected to a unitary transformation (shown in Section 5.7), 
we see that the determinant of a Hermitian matrix can be identified as the product of its 
eigenvalues, and its trace will be their sum. Apart from the individual eigenvalues them- 
selves, these are the most useful of the invariants that a matrix has with respect to unitary 
transformation. 

We illustrate the ideas thus far introduced in this section in the next example. 


Example 6.4.1 TRANSFORMING A MATRIX TO DIAGONAL FORM


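We return to the matrix H of Example 6.2.2:

$$H = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$

We note that it is Hermitian, so there exists a unitary transformation U that will diagonalize
it. Since we already know the eigenvectors of H, we can use them to construct U. Noting
that we need normalized eigenvectors, and consulting Eqs. (6.9) to (6.11), we have

$$
\lambda = 2,\ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}; \qquad
\lambda = 1,\ \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}; \qquad
\lambda = -1,\ \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{pmatrix}.
$$

Combining these as columns into $U^{-1}$,

$$U^{-1} = \begin{pmatrix} 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ 1 & 0 & 0 \end{pmatrix}.$$

Since $U = (U^{-1})^\dagger$, we easily form

$$
U = \begin{pmatrix} 0 & 0 & 1 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{pmatrix}
\quad\text{and}\quad
UHU^{-1} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
$$

The trace of H is 2, as is the sum of the eigenvalues; det(H) is −2, equal to 2 × 1 × (−1).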
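
The arithmetic of this example is easily reproduced by machine. The short Python sketch below
uses only the standard library; the helper names `matmul` and `dagger` are our own, and the
matrices are those of the example, with the eigenvectors for eigenvalues 2, 1, and −1 stored
as the columns of $U^{-1}$.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def dagger(a):
    """Adjoint (conjugate transpose) of a matrix."""
    return [[a[r][c].conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

s = 1 / math.sqrt(2)
H = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]
U_inv = [[0, s, s], [0, s, -s], [1, 0, 0]]   # columns: eigenvectors for 2, 1, -1
U = dagger(U_inv)

D = matmul(matmul(U, H), U_inv)              # should be diagonal
for row in D:
    print([round(x, 12) for x in row])

trace_H = sum(H[k][k] for k in range(3))
trace_D = sum(D[k][k] for k in range(3))
print(trace_H, trace_D)                      # trace is unchanged by the transformation
```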


Finding a Diagonalizing Transformation 


As Example 6.4.1 shows, a knowledge of the eigenvectors of a Hermitian matrix H enables 
the direct construction of a unitary matrix U that transforms H into diagonal form. But we 
are interested in diagonalizing matrices for the purpose of finding their eigenvectors and 
eigenvalues, so the construction illustrated in Example 6.4.1 does not meet our present 
needs. Applied mathematicians (and even theoretical chemists!) have over many years 







given attention to numerical methods for diagonalizing matrices of order large enough that 
direct, exact solution of the secular equation is not possible, and computer programs for 
carrying out these methods have reached a high degree of sophistication and efficiency. 
In varying ways, such programs involve processes that approach diagonalization via suc- 
cessive approximations. That is to be expected, since explicit formulas for the solution of 
high-degree algebraic equations (including, of course, secular equations) do not exist. To 
give the reader a sense of the level that has been reached in matrix diagonalization technology,
we identify a computation¹ that determined some of the eigenvalues and eigenvectors
of a matrix whose dimension exceeded $10^9$.

One of the older techniques for diagonalizing a matrix is due to Jacobi. It has now been 
supplanted by more efficient (but less transparent) methods, but we discuss it briefly here 
to illustrate the ideas involved. The essence of the Jacobi method is that if a Hermitian
matrix H has a nonzero value of some off-diagonal $h_{ij}$ (and thus also $h_{ji}$), a unitary
transformation that alters only rows/columns i and j can reduce $h_{ij}$ and $h_{ji}$ to zero. While
this transformation may cause other, previously zeroed elements to become nonzero, it can 
be shown that the resulting matrix is closer to being diagonal (meaning that the sum of 
the squared magnitudes of its off-diagonal elements has been reduced). One may therefore 
apply Jacobi-type transformations repeatedly to reduce individual off-diagonal elements to 
zero, continuing until there is no off-diagonal element larger than an acceptable tolerance. 
If one constructs the unitary matrix that is the product of the individual transformations, 
one obtains thereby the overall diagonalizing transformation. Alternatively, one can use 
the Jacobi method only for retrieval of the eigenvalues, after which the method presented 
previously can be used to obtain the eigenvectors. 
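
To make the procedure concrete, here is a minimal Python sketch of the Jacobi method for a real
symmetric matrix. It is an illustration under stated assumptions, not production code: the
function name `jacobi_eigen`, the tolerance, and the choice of always rotating away the largest
off-diagonal element are ours. The rotation angle is taken from $\tan 2\theta = 2a_{ij}/(a_{jj} - a_{ii})$,
the condition that makes the transformed $a_{ij}$ vanish for the plane rotation
$\varphi_i' = \varphi_i\cos\theta - \varphi_j\sin\theta$, $\varphi_j' = \varphi_i\sin\theta + \varphi_j\cos\theta$.

```python
import math

def jacobi_eigen(matrix, tol=1e-12, max_rotations=10000):
    """Diagonalize a real symmetric matrix (order >= 2) by Jacobi rotations.

    Returns (eigenvalues, v), where column k of v is the eigenvector
    belonging to eigenvalues[k].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(max_rotations):
        # Locate the off-diagonal element of largest magnitude
        i, j = max(((r, c) for r in range(n) for c in range(r + 1, n)),
                   key=lambda rc: abs(a[rc[0]][rc[1]]))
        if abs(a[i][j]) < tol:
            break
        theta = 0.5 * math.atan2(2.0 * a[i][j], a[j][j] - a[i][i])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):                  # a <- a R (columns i and j)
            aki, akj = a[k][i], a[k][j]
            a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
        for k in range(n):                  # a <- R^T a (rows i and j)
            aik, ajk = a[i][k], a[j][k]
            a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
        for k in range(n):                  # accumulate v <- v R
            vki, vkj = v[k][i], v[k][j]
            v[k][i], v[k][j] = c * vki - s * vkj, s * vki + c * vkj
    return [a[k][k] for k in range(n)], v

eigenvalues, vectors = jacobi_eigen([[0, 1, 0], [1, 0, 0], [0, 0, 2]])
print(sorted(eigenvalues))
```

For the matrix of Example 6.4.1 a single rotation (through $\pi/4$ in the 1-2 plane) suffices;
larger matrices generally need many rotations, since each one can disturb elements
that were zeroed earlier.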


Simultaneous Diagonalization 


It is of interest to know whether two Hermitian matrices A and B can have a common set 
of eigenvectors. It turns out that this is possible if and only if they commute. The proof is 
simple if the eigenvectors of either A or B are nondegenerate. 

Assume that $c_i$ are a set of eigenvectors of both A and B with respective eigenvalues $a_i$
and $b_i$. Then form, for any i,

$$
\begin{aligned}
BAc_i &= Ba_i c_i = b_i a_i c_i, \\
ABc_i &= Ab_i c_i = a_i b_i c_i.
\end{aligned}
$$

These equations show that $BAc_i = ABc_i$ for every $c_i$. Since any vector v can be written as
a linear combination of the $c_i$, we find that $(BA - AB)v = 0$ for all v, which means that
BA = AB. We have found that the existence of a common set of eigenvectors implies commutation.
It remains to prove the converse, namely that commutation permits construction
of a common set of eigenvectors.

For the converse, we assume that A and B commute, that $c_i$ is an eigenvector of A with
eigenvalue $a_i$, and that this eigenvector of A is nondegenerate. Then we form

$$ABc_i = BAc_i = Ba_i c_i, \quad\text{or}\quad A(Bc_i) = a_i(Bc_i).$$


¹ J. Olsen, P. Jørgensen, and J. Simons, Passing the one-billion limit in full configuration-interaction calculations, Chem. Phys.
Lett. 169: 463 (1990).







This equation shows that $Bc_i$ is also an eigenvector of A with eigenvalue $a_i$. Since the
eigenvector of A was assumed nondegenerate, $Bc_i$ must be proportional to $c_i$, meaning
that $c_i$ is also an eigenvector of B. This completes the proof that if A and B commute, they
have common eigenvectors.

The proof of this theorem can be extended to include the case in which both opera- 
tors have degenerate eigenvectors. Including that extension, we summarize by stating the 
general result: 


Hermitian matrices have a complete set of eigenvectors in common if and only if they 
commute. 
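
A small numerical illustration may help fix the idea. In the Python sketch below, the three
2 × 2 real symmetric matrices are hypothetical examples chosen for this purpose, and the helper
names are ours: A and B commute and share the eigenvectors (1, 1) and (1, −1), whereas C does
not commute with A and does not share them.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def commutator(x, y):
    """[x, y] = xy - yx."""
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[r][c] - yx[r][c] for c in range(len(x))] for r in range(len(x))]

def is_eigenvector(m, v):
    """True if m v is parallel to the nonzero 2-component vector v."""
    w = [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
    return w[0] * v[1] - w[1] * v[0] == 0

A = [[2, 1], [1, 2]]    # nondegenerate eigenvalues 3 and 1
B = [[0, 1], [1, 0]]    # commutes with A
C = [[1, 0], [0, -1]]   # does not commute with A

print(commutator(A, B))   # zero matrix
print(commutator(A, C))   # nonzero
for v in ([1, 1], [1, -1]):
    print(v, is_eigenvector(A, v), is_eigenvector(B, v), is_eigenvector(C, v))
```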


It may happen that we have three matrices A, B, and C, and that [A, B] = 0 and [A, C] = 0,
but [B, C] ≠ 0. In that case, which is actually quite common in atomic physics, we
have a choice. We can insist upon a set of $c_i$ that are simultaneous eigenvectors of A and
B, in which case not all the $c_i$ can be eigenvectors of C, or we can have simultaneous
eigenvectors of A and C, but not B. In atomic physics these choices typically correspond
to descriptions in which different angular momenta are required to have definite values.


Spectral Decomposition 


Once the eigenvalues and eigenvectors of a Hermitian matrix H have been found, we 
can express H in terms of these quantities. Since mathematicians call the set of eigen- 
values of H its spectrum, the expression we now derive for H is referred to as its spectral 
decomposition. 

As previously noted, in the orthonormal eigenvector basis the matrix H will be diagonal.
Then, instead of the general form for the basis expansion of an operator, H will be of the
diagonal form

$$H = \sum_\mu |c_\mu\rangle \lambda_\mu \langle c_\mu|, \qquad \text{each } c_\mu \text{ satisfies } Hc_\mu = \lambda_\mu c_\mu \text{ and } \langle c_\mu|c_\mu\rangle = 1. \tag{6.25}$$

This result, the spectral decomposition of H, is easily checked by applying it to any eigenvector
$c_\nu$.

Another result related to the spectral decomposition of H can be obtained if we multiply
both sides of the equation $Hc_\mu = \lambda_\mu c_\mu$ on the left by H, reaching

$$H^2 c_\mu = (\lambda_\mu)^2 c_\mu;$$

further applications of H show that all positive powers of H have the same eigenvectors
as H, so if f(H) is any function of H that has a power-series expansion, it has spectral
decomposition

$$f(H) = \sum_\mu |c_\mu\rangle f(\lambda_\mu) \langle c_\mu|. \tag{6.26}$$


Equation (6.26) can be extended to include negative powers if H is nonsingular; to do so,
multiply $Hc_\mu = \lambda_\mu c_\mu$ on the left by $H^{-1}$ and rearrange, to obtain

$$H^{-1} c_\mu = \frac{1}{\lambda_\mu}\, c_\mu,$$

showing that negative powers of H also have the same eigenvectors as H.

Finally, we can now easily prove the trace formula, Eq. (2.84). In the eigenvector basis,

$$\det\bigl(\exp(A)\bigr) = \prod_\mu e^{\lambda_\mu} = \exp\Bigl(\sum_\mu \lambda_\mu\Bigr) = \exp\bigl(\operatorname{trace}(A)\bigr). \tag{6.27}$$


Since the determinant and trace are basis-independent, this proves the trace formula. 
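
The spectral decomposition lends itself to a direct numerical check. The Python sketch below
rebuilds the matrix H of Example 6.4.1 from its eigenvalues and orthonormal eigenvectors, forms
$H^{-1}$ and $\exp(H)$ from Eq. (6.26), and compares $\det(\exp H)$ with $\exp(\operatorname{trace} H)$. The
function names are ours, and because the eigenvectors here are real the complex conjugation in
$\langle c_\mu|$ is omitted; a complex example would need it.

```python
import math

s = 1 / math.sqrt(2)
# Eigenvalues and orthonormal eigenvectors of H from Example 6.4.1
eigenpairs = [(2.0, [0.0, 0.0, 1.0]),
              (1.0, [s, s, 0.0]),
              (-1.0, [s, -s, 0.0])]

def spectral(f):
    """Matrix of sum_mu |c_mu> f(lambda_mu) <c_mu| for real eigenvectors."""
    n = len(eigenpairs[0][1])
    return [[sum(f(lam) * c[r] * c[k] for lam, c in eigenpairs)
             for k in range(n)] for r in range(n)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

H = spectral(lambda lam: lam)             # Eq. (6.25): reproduces H itself
H_inv = spectral(lambda lam: 1.0 / lam)   # negative power; H is nonsingular
exp_H = spectral(math.exp)                # a function with a power-series expansion

trace_H = sum(H[k][k] for k in range(3))
print(det3(exp_H), math.exp(trace_H))     # the two sides of Eq. (6.27)
```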


Expectation Values 


The expectation value of a Hermitian operator H associated with the normalized function
$\psi$ was defined in Eq. (5.61) as

$$\langle H \rangle = \langle \psi | H | \psi \rangle, \tag{6.28}$$

where it was shown that if an orthonormal basis was introduced, with H then represented
by a matrix H and $\psi$ represented by a vector a, this expectation value assumed the form

$$\langle H \rangle = a^\dagger H a = \langle a|H|a\rangle = \sum_{\nu\mu} a_\nu^* h_{\nu\mu} a_\mu. \tag{6.29}$$

If these quantities are expressed in the orthonormal eigenvector basis, Eq. (6.29) becomes

$$\langle H \rangle = \sum_\mu a_\mu^* \lambda_\mu a_\mu = \sum_\mu |a_\mu|^2 \lambda_\mu, \tag{6.30}$$

where $a_\mu$ is the coefficient of the eigenvector $c_\mu$ (with eigenvalue $\lambda_\mu$) in the expansion
of $\psi$. We note that the expectation value is a weighted sum of the eigenvalues, with the
weights nonnegative, and adding to unity because

$$\langle a|a\rangle = \sum_\mu a_\mu^* a_\mu = \sum_\mu |a_\mu|^2 = 1. \tag{6.31}$$


An obvious implication of Eq. (6.30) is that the expectation value $\langle H \rangle$ cannot be
smaller than the smallest eigenvalue nor larger than the largest eigenvalue. The quantum-mechanical
interpretation of this observation is that if H corresponds to a physical quantity,
measurements of that quantity will yield the values $\lambda_\mu$ with relative probabilities given by
$|a_\mu|^2$, and hence with an average value corresponding to the weighted sum, which is the
expectation value.
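
As a hedged numerical illustration of Eqs. (6.29) to (6.31), the Python sketch below evaluates
the expectation value of the matrix H of Example 6.4.1 in two ways: directly as $a^\dagger H a$, and
as the weighted sum of eigenvalues. The state vector a is a hypothetical normalized vector
chosen for the example, and the helper name `inner` is ours.

```python
import math

def inner(a, b):
    """Scalar product <a|b> = sum_k conj(a_k) * b_k."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

s = 1 / math.sqrt(2)
H = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]
eigenpairs = [(2.0, [0, 0, 1]), (1.0, [s, s, 0]), (-1.0, [s, -s, 0])]

a = [0.6, 0.0, 0.8j]   # hypothetical normalized state vector

# Direct evaluation of a^dagger H a, Eq. (6.29)
Ha = [sum(H[r][k] * a[k] for k in range(3)) for r in range(3)]
direct = inner(a, Ha).real

# Weighted sum of eigenvalues, Eq. (6.30); the weights are |a_mu|^2
weights = [abs(inner(c, a)) ** 2 for _, c in eigenpairs]
weighted = sum(w * lam for w, (lam, _) in zip(weights, eigenpairs))

print(direct, weighted)   # the two evaluations agree
print(sum(weights))       # the weights add to unity, Eq. (6.31)
```

The result necessarily lies between the smallest eigenvalue (−1) and the largest (2).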

Hermitian operators arising in physical problems often have finite smallest eigenvalues. 
This, in turn, means that the expectation value of the physical quantity associated with the 
operator has a finite lower bound. We thus have the frequently useful relation 


If the algebraically smallest eigenvalue of H is finite, then, for any $\psi$, $\langle \psi|H|\psi \rangle$ will
be greater than or equal to this eigenvalue, with the equality occurring only if $\psi$ is an
eigenfunction corresponding to the smallest eigenvalue.


Positive Definite and Singular Operators


If all the eigenvalues of an operator A are positive, it is termed positive definite. If and
only if A is positive definite, its expectation value for any nonzero $\psi$, namely $\langle \psi|A|\psi \rangle$,
will also be positive, since (when $\psi$ is normalized) it must be equal to or larger than the
smallest eigenvalue.


Example 6.4.2 OVERLAP MATRIX


Let S be an overlap matrix of elements $s_{\nu\mu} = \langle \chi_\nu|\chi_\mu \rangle$, where the $\chi_\nu$ are members of a
linearly independent, but nonorthogonal basis. If we assume an arbitrary nonzero function
$\psi$ to be expanded in terms of the $\chi_\nu$, according to $\psi = \sum_\nu b_\nu \chi_\nu$, the scalar product $\langle \psi|\psi \rangle$
will be given by

$$\langle \psi|\psi \rangle = \sum_{\nu\mu} b_\nu^* s_{\nu\mu} b_\mu,$$

which is of the form of an expectation value for the matrix S. Since $\langle \psi|\psi \rangle$ is an inherently
positive quantity, we conclude that S is positive definite.


If, on the other hand, the rows (or the columns) of a square matrix represent linearly 
dependent forms, either as coefficients in a basis-set expansion or as the coefficients of 
a linear expression in a set of variables, the matrix will be singular, and that fact will be 
signaled by the presence of eigenvalues that are zero. The number of zero eigenvalues 
provides an indication of the extent of the linear dependence; if an $n \times n$ matrix has m zero
eigenvalues, its rank will be $n - m$.


Exercises 


6.4.1 Show that the eigenvalues of a matrix are unaltered if the matrix is transformed by a
similarity transformation (a transformation that need not be unitary, but of the form
given in Eq. (5.78)).

This property is not limited to symmetric or Hermitian matrices. It holds for any
matrix satisfying an eigenvalue equation of the type $Ax = \lambda x$. If our matrix can be
brought into diagonal form by a similarity transformation, then two immediate consequences
are that:

1. The trace (sum of eigenvalues) is invariant under a similarity transformation.

2. The determinant (product of eigenvalues) is invariant under a similarity
transformation.

Note. The invariance of the trace and determinant is often demonstrated by using the
Cayley-Hamilton theorem, which states that a matrix satisfies its own characteristic
(secular) equation.


6.4.2 As a converse of the theorem that Hermitian matrices have real eigenvalues and that
eigenvectors corresponding to distinct eigenvalues are orthogonal, show that if

(a) the eigenvalues of a matrix are real and

(b) the eigenvectors satisfy $x_i^\dagger x_j = \delta_{ij}$,

then the matrix is Hermitian.


6.4.3 Show that a real matrix that is not symmetric cannot be diagonalized by an orthogonal
or unitary transformation.

Hint. Assume that the nonsymmetric real matrix can be diagonalized and develop a
contradiction.


6.4.4 The matrices representing the angular momentum components $L_x$, $L_y$, and $L_z$ are all
Hermitian. Show that the eigenvalues of $L^2$, where $L^2 = L_x^2 + L_y^2 + L_z^2$, are real and
nonnegative.


6.4.5 A has eigenvalues $\lambda_i$ and corresponding eigenvectors $|x_i\rangle$. Show that $A^{-1}$ has the same
eigenvectors but with eigenvalues $\lambda_i^{-1}$.


6.4.6 A square matrix with zero determinant is labeled singular.

(a) If A is singular, show that there is at least one nonzero column vector v such that
$A|v\rangle = 0$.

(b) If there is a nonzero vector $|v\rangle$ such that
$A|v\rangle = 0$,
show that A is a singular matrix. This means that if a matrix (or operator) has
zero as an eigenvalue, the matrix (or operator) has no inverse and its determinant
is zero.


6.4.7 Two Hermitian matrices A and B have the same eigenvalues. Show that A and B are
related by a unitary transformation.


6.4.8 Find the eigenvalues and an orthonormal set of eigenvectors for each of the matrices of
Exercise 2.2.12.


6.4.9 The unit process in the iterative matrix diagonalization procedure known as the Jacobi
method is a unitary transformation that operates on rows/columns i and j of a real
symmetric matrix A to make $a_{ij} = a_{ji} = 0$. If this transformation (from basis functions
$\varphi_i$ and $\varphi_j$ to $\varphi_i'$ and $\varphi_j'$) is written

$$\varphi_i' = \varphi_i \cos\theta - \varphi_j \sin\theta, \qquad \varphi_j' = \varphi_i \sin\theta + \varphi_j \cos\theta,$$

(a) Show that $a_{ij}$ is transformed to zero if $\tan 2\theta = \dfrac{2a_{ij}}{a_{jj} - a_{ii}}$,

(b) Show that $a_{\mu\nu}$ remains unchanged if neither $\mu$ nor $\nu$ is i or j,

(c) Find $a_{ii}'$ and $a_{jj}'$ and show that the trace of A is not changed by the transformation,

(d) Find $a_{i\mu}'$ and $a_{j\mu}'$ (where $\mu$ is neither i nor j) and show that the sum of the squares
of the off-diagonal elements of A is reduced by the amount $2a_{ij}^2$.


6.5 NORMAL MATRICES


Thus far the discussion has been centered on Hermitian eigenvalue problems, which we 
showed to have real eigenvalues and orthogonal eigenvectors, and therefore capable of 
being diagonalized by a unitary transformation. However, the class of matrices which can 
be diagonalized by a unitary transformation contains, in addition to Hermitian matrices, all 
other matrices that commute with their adjoints; a matrix A with this property, namely 


$$[A, A^\dagger] = 0,$$


is termed normal.² Clearly Hermitian matrices are normal, as $H^\dagger = H$. Unitary matrices
are also normal, as U commutes with its inverse. Anti-Hermitian matrices (with $A^\dagger = -A$)
are also normal. And there exist normal matrices that are not in any of these categories.

To show that normal matrices can be diagonalized by a unitary transformation, it suf- 
fices to prove that their eigenvectors can form an orthonormal set, which reduces to the 
requirement that eigenvectors of different eigenvalues be orthogonal. The proof proceeds 
in two steps, of which the first is to demonstrate that a normal matrix A and its adjoint have 
the same eigenvectors. 

Assuming $|x\rangle$ to be an eigenvector of A with eigenvalue $\lambda$, we have the equation

$$(A - \lambda\mathbf{1})|x\rangle = 0.$$

Multiplying this equation on its left by $\langle x|(A^\dagger - \lambda^*\mathbf{1})$, we have

$$\langle x|(A^\dagger - \lambda^*\mathbf{1})(A - \lambda\mathbf{1})|x\rangle = 0,$$

after which we use the normal property to interchange the two parenthesized quantities,
bringing us to

$$\langle x|(A - \lambda\mathbf{1})(A^\dagger - \lambda^*\mathbf{1})|x\rangle = 0.$$

Moving the first parenthesized quantity into the left half-bracket, we have

$$\langle (A^\dagger - \lambda^*\mathbf{1})x|(A^\dagger - \lambda^*\mathbf{1})|x\rangle = 0,$$

which we identify as a scalar product of the form $\langle f|f\rangle$. The only way this scalar product
can vanish is if

$$(A^\dagger - \lambda^*\mathbf{1})|x\rangle = 0,$$

showing that $|x\rangle$ is an eigenvector of $A^\dagger$ in addition to being an eigenvector of A. However,
the eigenvalues of A and $A^\dagger$ are complex conjugates; for general normal matrices $\lambda$ need
not be real.

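A demonstration that the eigenvectors are orthogonal proceeds along the same lines as
for Hermitian matrices. Letting $|x_i\rangle$ and $|x_j\rangle$ be two eigenvectors (of both A and $A^\dagger$), we
form

$$\langle x_j|A|x_i\rangle = \lambda_i \langle x_j|x_i\rangle, \qquad \langle x_i|A^\dagger|x_j\rangle = \lambda_j^* \langle x_i|x_j\rangle. \tag{6.32}$$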





² Normal matrices are the largest class of matrices that can be diagonalized by unitary transformations. For an extensive
discussion of normal matrices, see P. A. Macklin, Normal matrices for physicists, Am. J. Phys. 52: 513 (1984).

We now take the complex conjugate of the second of these equations, noting that
$\langle x_i|x_j\rangle^* = \langle x_j|x_i\rangle$. To form the complex conjugate of $\langle x_i|A^\dagger|x_j\rangle$, we convert it first to
$\langle Ax_i|x_j\rangle$ and then interchange the two half-brackets. Equations (6.32) then become

$$\langle x_j|A|x_i\rangle = \lambda_i \langle x_j|x_i\rangle, \qquad \langle x_j|A|x_i\rangle = \lambda_j \langle x_j|x_i\rangle. \tag{6.33}$$

These equations indicate that if $\lambda_i \neq \lambda_j$, we must have $\langle x_j|x_i\rangle = 0$, thus proving
orthogonality.

The fact that the eigenvalues of $A^\dagger$, for a normal matrix A, are complex conjugates of the
eigenvalues of A enables us to conclude that


• the eigenvalues of an anti-Hermitian matrix are pure imaginary (because $A^\dagger = -A$,
$\lambda^* = -\lambda$), and


• the eigenvalues of a unitary matrix are of unit magnitude (because $\lambda^* = 1/\lambda$, equivalent
to $\lambda^*\lambda = 1$).


Example 6.5.1 A NORMAL EIGENSYSTEM


Consider the unitary matrix

$$U = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

This matrix describes a rotational transformation in which z → x, x → y, and y → z.
Because it is unitary, it is also normal, and we may find its eigenvalues from the secular
equation

$$\det(U - \lambda\mathbf{1}) = \begin{vmatrix} -\lambda & 0 & 1 \\ 1 & -\lambda & 0 \\ 0 & 1 & -\lambda \end{vmatrix} = -\lambda^3 + 1 = 0,$$

which has solutions $\lambda = 1$, $\omega$, and $\omega^*$, where $\omega = e^{2\pi i/3}$. (Note that $\omega^3 = 1$, so $\omega^* =
1/\omega = \omega^2$.) Because U is real, unitary and describes a rotation, its eigenvalues must fall on
the unit circle, their sum (the trace) must be real, and their product (the determinant) must
be +1. This means that one of the eigenvalues must be +1, and the remaining two may be
real (both +1 or both −1) or form a complex conjugate pair. We see that the eigenvalues
we have found satisfy these criteria. The trace of U is zero, as is the sum $1 + \omega + \omega^*$ (this
may be verified graphically; see Fig. 6.3).
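Proceeding to the eigenvectors, substitution into the equation

$$(U - \lambda\mathbf{1})c = 0$$

yields (in unnormalized form)

$$
\lambda_1 = 1,\ c_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \qquad
\lambda_2 = \omega,\ c_2 = \begin{pmatrix} 1 \\ \omega^* \\ \omega \end{pmatrix}; \qquad
\lambda_3 = \omega^2,\ c_3 = \begin{pmatrix} 1 \\ \omega \\ \omega^* \end{pmatrix}.
$$

FIGURE 6.3 Eigenvalues of the matrix U, Example 6.5.1.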


The interpretation of this result is interesting. The eigenvector $c_1$ is unchanged by U
(application of U multiplies it by unity), so it must lie in the direction of the axis of the
rotation described by U. The other two eigenvectors are complex linear combinations of
the coordinates that are invariant in “direction,” but not in phase under application of U. We
write “direction” in quotes, since the complex coefficients in the eigenvectors cause them
not to identify directions in physical space. Nevertheless, they do form quantities that are
invariant except for multiplication by the eigenvalue (which we identify as a phase, since
it is of magnitude unity). The argument of $\omega$, $2\pi/3$, identifies the amount of the rotation
about the $c_1$ axis. Coming back to physical reality, we note that we have found that U
corresponds to a rotation of amount $2\pi/3$ about an axis in the (1,1,1) direction; the reader
can verify that this indeed takes x into y, y into z, and z into x.

Because U is normal, its eigenvectors must be orthogonal. Since we now have complex
quantities, in order to check this we must compute the scalar product of two vectors a and
b from the formula $a^\dagger b$. Our eigenvectors pass this test.
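
These checks are quickly carried out with complex arithmetic. The Python sketch below takes U to
be the cyclic permutation matrix carrying (x, y, z) into (z, x, y), with $\omega = e^{2\pi i/3}$ and the
unnormalized eigenvectors (1, 1, 1), (1, ω*, ω), and (1, ω, ω*); the helper names are ours. It
reports the largest residual of $Uc = \lambda c$ for each eigenpair and the magnitudes of the three
scalar products $a^\dagger b$.

```python
import cmath

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m[r][k] * v[k] for k in range(len(v))) for r in range(len(m))]

def inner(a, b):
    """Scalar product a^dagger b = sum_k conj(a_k) * b_k."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

U = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
w = cmath.exp(2j * cmath.pi / 3)   # omega
wc = w.conjugate()                 # omega* = omega**2

eigenpairs = [(1, [1, 1, 1]), (w, [1, wc, w]), (wc, [1, w, wc])]

# Largest component of U c - lambda c for each eigenpair
residuals = [max(abs(x - lam * y) for x, y in zip(matvec(U, c), c))
             for lam, c in eigenpairs]

# Magnitudes of the scalar products between distinct eigenvectors
overlaps = [abs(inner(eigenpairs[i][1], eigenpairs[j][1]))
            for i in range(3) for j in range(i + 1, 3)]

print(residuals)
print(overlaps)
print(abs(1 + w + wc))   # the eigenvalues sum to the trace of U, which is zero
```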

Finally, let’s verify that U and $U^\dagger$ have the same eigenvectors, and that corresponding
eigenvalues are complex conjugates. Taking the adjoint of U, we have

$$U^\dagger = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

Using the eigenvectors we have already found to form $U^\dagger c_i$, the verification is easily
established. We illustrate with $c_2$:

$$
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ \omega^* \\ \omega \end{pmatrix}
= \begin{pmatrix} \omega^* \\ \omega \\ 1 \end{pmatrix}
= \omega^* \begin{pmatrix} 1 \\ \omega^* \\ \omega \end{pmatrix},
$$

as required.


Nonnormal Matrices


Matrices that are not even normal sometimes enter problems of importance in physics. 
Such a matrix, A, still has the property that the eigenvalues of $A^\dagger$ are the complex conjugates
of the eigenvalues of A, because $\det(A^\dagger) = [\det(A)]^*$, so

$$\det(A - \lambda\mathbf{1}) = 0 \quad\longrightarrow\quad \det(A^\dagger - \lambda^*\mathbf{1}) = 0,$$

for the same $\lambda$, but it is no longer true that the eigenvectors are orthogonal or that A and $A^\dagger$
have common eigenvectors.

Here is an example arising from the analysis of vibrations in mechanical systems. We 
consider the vibrations of a classical model of the CO2 molecule. Even though the model is 
classical, it is a good representation of the actual quantum-mechanical system, as to good 
approximation the nuclei execute small (classical) oscillations in the Hooke’s-law potential 
generated by the electron distribution. This problem is an illustration of the application of 
matrix techniques to a problem that does not start as a matrix problem. It also provides an 
example of the eigenvalues and eigenvectors of an asymmetric real matrix. 


Example 6.5.2 NORMAL MODES





Consider three masses on the x-axis joined by springs as shown in Fig. 6.4. The spring 
forces are assumed to be linear in the displacements from equilibrium (small displace- 
ments, Hooke’s law), and the masses are constrained to stay on the x-axis. 

Using a different coordinate for the displacement of each mass from its equilibrium
position, Newton’s second law yields the set of equations

$$
\begin{aligned}
\ddot{x}_1 &= -\frac{k}{M}(x_1 - x_2), \\
\ddot{x}_2 &= -\frac{k}{m}(x_2 - x_1) - \frac{k}{m}(x_2 - x_3), \\
\ddot{x}_3 &= -\frac{k}{M}(x_3 - x_2),
\end{aligned}
\tag{6.34}
$$

where $\ddot{x}$ stands for $d^2x/dt^2$. We seek the frequencies, $\omega$, such that all the masses vibrate
at the same frequency. These are called the normal modes of vibration,³ and are solutions
to Eqs. (6.34) with

$$x_i(t) = x_i \sin \omega t, \qquad i = 1, 2, 3.$$





FIGURE 6.4 The three-mass spring system representing the CO$_2$ molecule.


³ For detailed discussion of normal modes of vibration, see E. B. Wilson, Jr., J. C. Decius, and P. C. Cross, Molecular
Vibrations: The Theory of Infrared and Raman Vibrational Spectra. New York: Dover (1980).

Substituting this solution set into Eqs. (6.34), these equations, after cancellation of the
common factor $\sin \omega t$, become equivalent to the matrix equation

$$
Ax \equiv
\begin{pmatrix}
\dfrac{k}{M} & -\dfrac{k}{M} & 0 \\
-\dfrac{k}{m} & \dfrac{2k}{m} & -\dfrac{k}{m} \\
0 & -\dfrac{k}{M} & \dfrac{k}{M}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= +\omega^2 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \tag{6.35}
$$

The eigenvalues $\omega^2$ of A follow from the secular equation

$$
\begin{vmatrix}
\dfrac{k}{M} - \omega^2 & -\dfrac{k}{M} & 0 \\
-\dfrac{k}{m} & \dfrac{2k}{m} - \omega^2 & -\dfrac{k}{m} \\
0 & -\dfrac{k}{M} & \dfrac{k}{M} - \omega^2
\end{vmatrix} = 0, \tag{6.36}
$$

which expands to

$$\omega^2 \left(\frac{k}{M} - \omega^2\right)\left(\omega^2 - \frac{2k}{m} - \frac{k}{M}\right) = 0.$$

The eigenvalues are

$$\omega^2 = 0, \quad \frac{k}{M}, \quad \frac{k}{M} + \frac{2k}{m}.$$

For $\omega^2 = 0$, substitution back into Eq. (6.35) yields

$$x_1 - x_2 = 0, \qquad -x_1 + 2x_2 - x_3 = 0, \qquad -x_2 + x_3 = 0,$$

which corresponds to $x_1 = x_2 = x_3$. This describes pure translation with no relative motion
of the masses and no vibration.
For $\omega^2 = k/M$, Eq. (6.35) yields

$$x_1 = -x_3, \qquad x_2 = 0.$$

The two outer masses are moving in opposite directions. The central mass is stationary. In
CO$_2$ this is called the symmetric stretching mode.
Finally, for $\omega^2 = k/M + 2k/m$, the eigenvector components are

$$x_1 = x_3, \qquad x_2 = -\frac{2M}{m}\, x_1.$$

In this antisymmetric stretching mode, the two outer masses are moving, together, in a
direction opposite to that of the central mass, so one CO bond stretches while the other
contracts the same amount. In both of these stretching modes, the net momentum of the
motion is zero.
Any displacement of the three masses along the x-axis can be described as a linear
combination of these three types of motion: translation plus two forms of vibration.

The matrix A of Eq. (6.35) is not normal; the reader can check that $AA^\dagger \neq A^\dagger A$. As a
result, the eigenvectors we have found are not orthogonal, as is obvious by examination of
the unnormalized eigenvectors:

$$
\omega^2 = 0,\ x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \qquad
\omega^2 = \frac{k}{M},\ x = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}; \qquad
\omega^2 = \frac{k}{M} + \frac{2k}{m},\ x = \begin{pmatrix} 1 \\ -2M/m \\ 1 \end{pmatrix}.
$$

Using the same $\lambda$ values (here $\lambda = \omega^2$), we can solve the simultaneous equations

$$(A^\dagger - \lambda^*\mathbf{1})\, y = 0.$$

The resulting eigenvectors are

$$
\omega^2 = 0,\ y = \begin{pmatrix} 1 \\ m/M \\ 1 \end{pmatrix}; \qquad
\omega^2 = \frac{k}{M},\ y = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}; \qquad
\omega^2 = \frac{k}{M} + \frac{2k}{m},\ y = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}.
$$

These vectors are neither orthogonal nor the same as the eigenvectors of A.
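
The statements of this example can be confirmed numerically. In the Python sketch below the
values of k, M, and m are hypothetical (any positive values with M ≠ m would serve), and the
helper names are ours. The code checks that the three vectors x satisfy Eq. (6.35), that the
translation and antisymmetric-stretch eigenvectors are not orthogonal, and that A fails to
commute with its adjoint (which, A being real, is its transpose).

```python
k, M, m = 1.0, 16.0, 12.0   # hypothetical force constant and masses

A = [[k / M, -k / M, 0.0],
     [-k / m, 2 * k / m, -k / m],
     [0.0, -k / M, k / M]]

# (omega^2, unnormalized eigenvector) for the three modes of Eq. (6.35)
modes = [(0.0, [1.0, 1.0, 1.0]),
         (k / M, [1.0, 0.0, -1.0]),
         (k / M + 2 * k / m, [1.0, -2 * M / m, 1.0])]

def matvec(a, v):
    return [sum(a[r][c] * v[c] for c in range(3)) for r in range(3)]

def matmul(a, b):
    return [[sum(a[r][j] * b[j][c] for j in range(3)) for c in range(3)]
            for r in range(3)]

def transpose(a):
    return [[a[c][r] for c in range(3)] for r in range(3)]

# Largest component of A x - omega^2 x for each mode
residuals = [max(abs(y - w2 * x) for y, x in zip(matvec(A, v), v))
             for w2, v in modes]

# Scalar product of the first and third eigenvectors: 2 - 2M/m
dot13 = sum(x * y for x, y in zip(modes[0][1], modes[2][1]))

At = transpose(A)
AAt, AtA = matmul(A, At), matmul(At, A)
is_normal = all(abs(AAt[r][c] - AtA[r][c]) < 1e-12
                for r in range(3) for c in range(3))

print(residuals, dot13, is_normal)
```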
Defective Matrices 
If a matrix is not normal, it may not even have a full complement of eigenvectors. Such
matrices are termed defective. By the fundamental theorem of algebra, a matrix of dimension
N will have N eigenvalues (when their multiplicity is taken into account). It can also
be shown that any matrix will have at least one eigenvector corresponding to each of its
distinct eigenvalues. But it is not always true that an eigenvalue of multiplicity k > 1
will have k eigenvectors. We give as a simple example a matrix with the doubly degenerate
eigenvalue $\lambda = 1$:

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad \text{has only the single eigenvector} \quad \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
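
One way to count the independent eigenvectors belonging to an eigenvalue $\lambda$ of an N × N matrix
A is to compute N minus the rank of $A - \lambda\mathbf{1}$, since that difference is the dimension of the
solution space of $(A - \lambda\mathbf{1})c = 0$. The Python sketch below does this with exact rational
arithmetic; the function names are ours, and the Gaussian elimination shown is a plain
illustration rather than a numerically robust routine.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def eigenvector_count(matrix, lam):
    """Number of independent eigenvectors of `matrix` for the eigenvalue lam."""
    n = len(matrix)
    shifted = [[matrix[r][c] - (lam if r == c else 0) for c in range(n)]
               for r in range(n)]
    return n - rank(shifted)

defective = [[1, 1], [0, 1]]   # the example above: eigenvalue 1 of multiplicity 2
identity = [[1, 0], [0, 1]]    # also eigenvalue 1 of multiplicity 2, but not defective

print(eigenvector_count(defective, 1))
print(eigenvector_count(identity, 1))
```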
Exercises 
6.5.1 Find the eigenvalues and corresponding eigenvectors for

$$\begin{pmatrix} 2 & 4 \\ 1 & 2 \end{pmatrix}.$$

Note that the eigenvectors are not orthogonal.

ANS. $\lambda_1 = 0$, $c_1 = (2, -1)$;
$\lambda_2 = 4$, $c_2 = (2, 1)$.


6.5.2 If A is a 2 × 2 matrix, show that its eigenvalues $\lambda$ satisfy the secular equation

$$\lambda^2 - \lambda\, \operatorname{trace}(A) + \det(A) = 0.$$


6.5.3 Assuming a unitary matrix U to satisfy an eigenvalue equation $Uc = \lambda c$, show that the
eigenvalues of the unitary matrix have unit magnitude. This same result holds for real
orthogonal matrices.


6.5.4 Since an orthogonal matrix describing a rotation in real 3-D space is a special case
of a unitary matrix, such an orthogonal matrix can be diagonalized by a unitary
transformation.

(a) Show that the sum of the three eigenvalues is $1 + 2\cos\varphi$, where $\varphi$ is the net angle
of rotation about a single fixed axis.

(b) Given that one eigenvalue is 1, show that the other two eigenvalues must be $e^{i\varphi}$
and $e^{-i\varphi}$.

Our orthogonal rotation matrix (real elements) has complex eigenvalues.


6.5.5 A is an nth-order Hermitian matrix with orthonormal eigenvectors $|x_i\rangle$ and real eigenvalues
$\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots \le \lambda_n$. Show that for a unit magnitude vector $|y\rangle$,

$$\lambda_1 \le \langle y|A|y\rangle \le \lambda_n.$$


6.5.6 A particular matrix is both Hermitian and unitary. Show that its eigenvalues are all ±1.
Note. The Pauli and Dirac matrices are specific examples.


6.5.7 For his relativistic electron theory Dirac required a set of four anticommuting matrices.
Assume that these matrices are to be Hermitian and unitary. If these are n × n matrices,
show that n must be even. With 2 × 2 matrices inadequate (why?), this demonstrates
that the smallest possible matrices forming a set of four anticommuting, Hermitian,
unitary matrices are 4 × 4.


6.5.8 A is a normal matrix with eigenvalues $\lambda_n$ and orthonormal eigenvectors $|x_n\rangle$. Show that
A may be written as

$$A = \sum_n \lambda_n |x_n\rangle \langle x_n|.$$

Hint. Show that both this eigenvector form of A and the original A give the same result
acting on an arbitrary vector $|y\rangle$.


6.5.9 A has eigenvalues 1 and −1 and corresponding eigenvectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Construct A.

ANS. $A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.


6.5.10 A non-Hermitian matrix A has eigenvalues $\lambda_i$ and corresponding eigenvectors $|u_i\rangle$. The
adjoint matrix $A^\dagger$ has the same set of eigenvalues but different corresponding eigenvectors,
$|v_i\rangle$. Show that the eigenvectors form a biorthogonal set in the sense that

$$\langle v_i|u_j\rangle = 0 \quad \text{for} \quad \lambda_i^* \neq \lambda_j.$$


6.5.11 You are given a pair of equations:

$$A|f_n\rangle = \lambda_n |g_n\rangle,$$

$$\tilde{A}|g_n\rangle = \lambda_n |f_n\rangle \quad \text{with A real.}$$

(a) Prove that $|f_n\rangle$ is an eigenvector of $(\tilde{A}A)$ with eigenvalue $\lambda_n^2$.
(b) Prove that $|g_n\rangle$ is an eigenvector of $(A\tilde{A})$ with eigenvalue $\lambda_n^2$.
(c) State how you know that

(1) The $|f_n\rangle$ form an orthogonal set.
(2) The $|g_n\rangle$ form an orthogonal set.
(3) $\lambda_n^2$ is real.


6.5.12 Prove that A of the preceding exercise may be written as

$$A = \sum_n \lambda_n |g_n\rangle \langle f_n|,$$

with the $|g_n\rangle$ and $\langle f_n|$ normalized to unity.

Hint. Expand an arbitrary vector as a linear combination of $|f_n\rangle$.


6.5.13


6.5.14


6.5.15


ae) 


(a) Construct the transpose A and the symmetric forms AA and AA. 


Given 


(b) From AA\gn) = ee lg), find A, and |g,,). Normalize the |g,). 

(c) From AA\f,) = Ae lg,), find A, [same as (b)] and |f,,). Normalize the |f,,). 
(d) Verify that Alf) = An|gn) and Alg,) = An|fn). 

(e) Verify that A= >>, An| Gn) (fn. 


Given the eigenvalues A; = 1, Az = —1 and the corresponding eigenvectors 


1 
In) =(9) l= (j) tar= (9). ana 


|g2) = = (S). 


(a) construct A; 
(b) verify that Alf,) = An|gn); 
(c) verify that Alg,) = An|f,). 


1/1 -1 
ANS. a= ) 


Two matrices U and H are related by 
U= eiaH 


’ 


with a real. 





6.5.16 


6.5.17 


6.5.18 


6.5.19 


6.5.20 
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(a) IfH is Hermitian, show that U is unitary. 

(b) IfU is unitary, show that H is Hermitian. (H is independent of a.) 
(c) Iftrace H=0, show that detU = +1. 

(d) IfdetU=-+1, show that trace H=0. 


Hint. H may be diagonalized by a similarity transformation. Then U is also diagonal. 
The corresponding eigenvalues are given by uj; = exp(iahj;). 


Ann xn matrix A has n eigenvalues A;. If B= A. show that B has the same eigen- 
vectors as A with the corresponding eigenvalues B; given by Bj; = exp(Aj). 


A matrix P is a projection operator satisfying the condition 
Pp? =P. 
Show that the corresponding eigenvalues (7), and p satisfy the relation 
(07) = (0a)” = pr. 
This means that the eigenvalues of P are 0 and 1. 
In the matrix eigenvector-eigenvalue equation 
A|x;) = AilXi), 


A is ann xn Hermitian matrix. For simplicity assume that its n real eigenvalues are 
distinct, 4; being the largest. If |x) is an approximation to |x), 


n 
Ix) = |x1) + dix), 
i=2 
show that 


(x|A[x) 
(x|x) 





<A1 


and that the error in A is of the order |5;|*. Take |5;| << 1. 
Hint. The n vectors |x;) form a complete orthogonal set spanning the n-dimensional 
(complex) space. 


Two equal masses are connected to each other and to walls by springs as shown in 
Fig. 6.5. The masses are constrained to stay on a horizontal line. 


(a) Set up the Newtonian acceleration equation for each mass. 
(b) Solve the secular equation for the eigenvectors. 
(c) Determine the eigenvectors and thus the normal modes of motion. 


Given a normal matrix A with eigenvalues 4;, show that At has eigenvalues 4*, its 


real part (A + A‘')/2 has eigenvalues Jte(A;), and its imaginary part (A — A‘) /2i has 
eigenvalues 3m (Aj). 
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FiGurRE 6.5 Triple oscillator. 


Consider a rotation given by Euler angles a = 17/4, B = 1/2, y =52/4. 


(a) Using the formula of Eq. (3.37), construct the matrix U representing this rotation. 


(b) Find the eigenvalues and eigenvectors of U, and from them describe this rotation 
by specifying a single rotation axis and an angle of rotation about that axis. 


Note. This technique provides a representation of rotations alternative to the Euler 
angles. 
CHAPTER 7


ORDINARY DIFFERENTIAL 
EQUATIONS 


Much of theoretical physics is originally formulated in terms of differential equations in the 
three-dimensional physical space (and sometimes also time). These variables (e.g., x, y, z,
t) are usually referred to as independent variables, while the function or functions being
differentiated are referred to as dependent variable(s). A differential equation involving 
more than one independent variable is called a partial differential equation, often abbre- 
viated PDE. The simpler situation considered in the present chapter is that of an equation 
in a single independent variable, known as an ordinary differential equation, abbreviated 
ODE. As we shall see in a later chapter, some of the most frequently used methods for solv- 
ing PDEs involve their expression in terms of the solutions to ODEs, so it is appropriate to 
begin our study of differential equations with ODEs. 


7.1 INTRODUCTION 
To start, we note that the taking of a derivative is a linear operation, meaning that 


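\[ \frac{d}{dx}\bigl(a\varphi(x) + b\psi(x)\bigr) = a\frac{d\varphi}{dx} + b\frac{d\psi}{dx}, \]
and the derivative operation can be viewed as defining a linear operator: ℒ = d/dx. Higher
derivatives are also linear operators, as for example
\[ \frac{d^2}{dx^2}\bigl(a\varphi(x) + b\psi(x)\bigr) = a\frac{d^2\varphi}{dx^2} + b\frac{d^2\psi}{dx^2}. \]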
Note that the linearity under discussion is that of the operator. For example, if we define
\[ \mathcal{L} = p(x)\frac{d}{dx} + q(x), \]
it is identified as linear because
\[
\mathcal{L}\bigl(a\varphi(x) + b\psi(x)\bigr)
= a\left(p(x)\frac{d\varphi}{dx} + q(x)\varphi\right) + b\left(p(x)\frac{d\psi}{dx} + q(x)\psi\right)
= a\mathcal{L}\varphi + b\mathcal{L}\psi.
\]
We see that the linearity of ℒ imposes no requirement that either p(x) or q(x) be a linear
function of x. Linear differential operators therefore include those of the form
\[ \mathcal{L} = \sum_{\nu=0}^{n} p_\nu(x)\left(\frac{d^\nu}{dx^\nu}\right), \]
where the functions p_ν(x) are arbitrary.

An ODE is termed homogeneous if the dependent variable (here ψ) occurs to the same
power in all its terms, and inhomogeneous otherwise; it is termed linear if it can be written
in the form
\[ \mathcal{L}\psi(x) = F(x), \tag{7.1} \]
where ℒ is a linear differential operator and F(x) is an algebraic function of x (i.e., not
a differential operator). An important class of ODEs are those that are both linear and
homogeneous, and thereby of the form ℒψ = 0.

The solutions to ODEs are in general not unique, and if there are multiple solutions it 
is useful to identify those that are linearly independent (linear dependence is discussed in 
Section 2.1). Homogeneous linear ODEs have the general property that any multiple of a 
solution is also a solution, and that if there are multiple linearly independent solutions, any 
linear combination of those solutions will also solve the ODE. This statement is equivalent 
to noting that if ℒ is linear, then, for all a and b,
\[ \mathcal{L}\varphi = 0 \ \text{ and } \ \mathcal{L}\psi = 0 \quad\Longrightarrow\quad \mathcal{L}(a\varphi + b\psi) = 0. \]

The Schrödinger equation of quantum mechanics is a homogeneous linear ODE (or if in
more than one dimension, a homogeneous linear PDE), and the property that any linear 
combination of its solutions is also a solution is the conceptual basis for the well-known 
superposition principle in electrodynamics, wave optics and quantum theory. 

Notationally, it is often convenient to use the symbols x and y to refer, respectively, 
to independent and dependent variables, and a typical linear ODE then takes the form 
ℒy = F(x). It is also customary to use primes to indicate derivatives: y′ = dy/dx. In
terms of this notation, the superposition property of solutions y₁ and y₂ of a homogeneous
linear ODE tells us that the ODE also has as solutions c₁y₁, c₂y₂, and c₁y₁ + c₂y₂, with
the c_i arbitrary constants.

Some physically important problems (particularly in fluid mechanics and in chaos the- 
ory) give rise to nonlinear differential equations. A well-studied example is the Bernoulli 
equation 


\[ y' = p(x)y + q(x)y^n, \qquad n \ne 0, 1, \]


which cannot be written in terms of a linear operator applied to y. 
Further terms used to classify ODEs include their order (highest derivative appearing
therein), and degree (power to which the highest derivative appears after the ODE is
rationalized if that is necessary). For many applications, the concept of linearity is more 
relevant than that of degree. 


7.2 FIRST-ORDER EQUATIONS 


Physics involves some first-order differential equations. For completeness it seems desirable
to touch upon them briefly. We consider the general form
\[ \frac{dy}{dx} = f(x, y) = -\frac{P(x, y)}{Q(x, y)}. \tag{7.2} \]
While there is no systematic way to solve the most general first-order ODE, there are
a number of techniques that are often useful. After reviewing some of these techniques,
we proceed to a more detailed treatment of linear first-order ODEs, for which systematic
procedures are available.


Separable Equations 


Frequently Eq. (7.2) will have the special form
\[ \frac{dy}{dx} = -\frac{P(x)}{Q(y)}. \tag{7.3} \]
Then it may be rewritten as
\[ P(x)\,dx + Q(y)\,dy = 0. \]
Integrating from (x₀, y₀) to (x, y) yields
\[ \int_{x_0}^{x} P(x)\,dx + \int_{y_0}^{y} Q(y)\,dy = 0. \]
Since the lower limits, x₀ and y₀, contribute constants, we may ignore them and simply add
a constant of integration. Note that this separation of variables technique does not require
that the differential equation be linear.


Example 7.2.1 PARACHUTIST 


We want to find the velocity of a falling parachutist as a function of time and are particularly
interested in the constant limiting velocity, v₀, that comes about by air drag, taken
to be quadratic, −bv², and opposing the force of the gravitational attraction, mg, of the
Earth on the parachutist. We choose a coordinate system in which the positive direction
is downward so that the gravitational force is positive. For simplicity we assume that the
parachute opens immediately, that is, at time t = 0, where v(0) = 0, our initial condition.
Newton’s law applied to the falling parachutist gives
\[ m\dot{v} = mg - bv^2, \tag{7.4} \]
where m includes the mass of the parachute.
The terminal velocity, v₀, can be found from the equation of motion as t → ∞; when
there is no acceleration, v̇ = 0, and
\[ bv_0^2 = mg, \quad\text{or}\quad v_0 = \sqrt{\frac{mg}{b}}. \]
It simplifies further work to rewrite Eq. (7.4) as
\[ \frac{m}{b}\dot{v} = v_0^2 - v^2. \]
This equation is separable, and we write it in the form
\[ \frac{dv}{v_0^2 - v^2} = \frac{b}{m}\,dt. \tag{7.5} \]
Using partial fractions to write
\[ \frac{1}{v_0^2 - v^2} = \frac{1}{2v_0}\left(\frac{1}{v + v_0} - \frac{1}{v - v_0}\right), \]
it is straightforward to integrate both sides of Eq. (7.5) (the left-hand side from v = 0 to v,
the right-hand side from t = 0 to t), yielding
\[ \frac{1}{2v_0}\ln\frac{v_0 + v}{v_0 - v} = \frac{b}{m}\,t. \]
Solving for the velocity, we have
\[ v = \frac{e^{2t/T} - 1}{e^{2t/T} + 1}\,v_0 = v_0\,\frac{\sinh(t/T)}{\cosh(t/T)} = v_0\tanh\frac{t}{T}, \]
where T = √(m/gb) is the time constant governing the asymptotic approach of the velocity
to its limiting value, v₀.

Inserting numerical values, g = 9.8 m/s², and taking b = 700 kg/m, m = 70 kg, gives
v₀ = √(9.8/10) ≈ 1 m/s ≈ 3.6 km/h ≈ 2.2 mi/h, the walking speed of a pedestrian at
landing, and T = √(m/bg) = 1/√(10 · 9.8) ≈ 0.1 s. Thus, the constant speed v₀ is reached
within a second. Finally, because it is always important to check the solution, we verify
that our solution satisfies the original differential equation:
\[
\dot{v} = \frac{\cosh(t/T)}{\cosh(t/T)}\,\frac{v_0}{T} - \frac{\sinh^2(t/T)}{\cosh^2(t/T)}\,\frac{v_0}{T}
= \frac{v_0}{T} - \frac{v^2}{Tv_0} = g - \frac{b}{m}v^2.
\]
A more realistic case, where the parachutist is in free fall with an initial speed v(0) > 0
before the parachute opens, is addressed in Exercise 7.2.16. ■
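The closed form v = v₀ tanh(t/T) can also be checked numerically. The sketch below is an illustration only, not part of the original example: the function names are hypothetical, and the classical fourth-order Runge-Kutta scheme is simply a convenient choice of integrator. It integrates Eq. (7.4) from v(0) = 0 with the numerical values quoted in the example and compares the result with the analytic solution.

```python
import math

def terminal_velocity(m, g, b):
    """v0 = sqrt(m g / b), obtained by setting dv/dt = 0 in Eq. (7.4)."""
    return math.sqrt(m * g / b)

def time_constant(m, g, b):
    """T = sqrt(m / (g b))."""
    return math.sqrt(m / (g * b))

def v_closed_form(t, m, g, b):
    """Analytic solution v(t) = v0 tanh(t / T) for v(0) = 0."""
    return terminal_velocity(m, g, b) * math.tanh(t / time_constant(m, g, b))

def v_numeric(t_end, m, g, b, steps=2000):
    """Integrate dv/dt = g - (b/m) v^2 from v(0) = 0 with classical RK4."""
    def accel(v):
        return g - (b / m) * v * v
    h = t_end / steps
    v = 0.0
    for _ in range(steps):
        k1 = accel(v)
        k2 = accel(v + 0.5 * h * k1)
        k3 = accel(v + 0.5 * h * k2)
        k4 = accel(v + h * k3)
        v += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return v

m, g, b = 70.0, 9.8, 700.0   # values used in the example
for t in (0.05, 0.1, 0.5):
    print(t, v_closed_form(t, m, g, b), v_numeric(t, m, g, b))
```

The two columns should agree to many digits, and by t = 0.5 s (about five time constants) the velocity is already very close to v₀.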





Exact Differentials 


Again we rewrite Eq. (7.2) as 
P(x, y)dx + Q(x, y)dy =0. (7.6) 


This equation is said to be exact if we can match the left-hand side of it to a differential
dφ, and thereby reach
\[ d\varphi = \frac{\partial\varphi}{\partial x}\,dx + \frac{\partial\varphi}{\partial y}\,dy = 0. \tag{7.7} \]
Exactness therefore implies that there exists a function φ(x, y) such that
\[ \frac{\partial\varphi}{\partial x} = P(x, y) \quad\text{and}\quad \frac{\partial\varphi}{\partial y} = Q(x, y), \tag{7.8} \]
because then our ODE corresponds to an instance of Eq. (7.7), and its solution will be
φ(x, y) = constant.

Before seeking to find a function φ satisfying Eq. (7.8), it is useful to determine whether
such a function exists. Taking the two formulas from Eq. (7.8), differentiating the first with
respect to y and the second with respect to x, we find
\[
\frac{\partial^2\varphi}{\partial y\,\partial x} = \frac{\partial P(x, y)}{\partial y}
\quad\text{and}\quad
\frac{\partial^2\varphi}{\partial x\,\partial y} = \frac{\partial Q(x, y)}{\partial x},
\]
and these are consistent if and only if
\[ \frac{\partial P(x, y)}{\partial y} = \frac{\partial Q(x, y)}{\partial x}. \tag{7.9} \]
We therefore conclude that Eq. (7.6) is exact only if Eq. (7.9) is satisfied. Once exactness
has been verified, we can integrate Eqs. (7.8) to obtain φ and therewith a solution to the
ODE.
The solution takes the form
\[ \varphi(x, y) = \int_{x_0}^{x} P(x, y)\,dx + \int_{y_0}^{y} Q(x_0, y)\,dy = \text{constant}. \tag{7.10} \]


Proof of Eq. (7.10) is left to Exercise 7.2.7. 
We note that separability and exactness are distinct attributes. A separable ODE, once
written in the separated form P(x)dx + Q(y)dy = 0, is automatically exact, but not all
exact ODEs are separable.


Example 7.2.2 A NONSEPARABLE EXACT ODE 
Consider the ODE
\[ y' + \left(1 + \frac{y}{x}\right) = 0. \]
Multiplying by x dx, this ODE becomes
\[ (x + y)\,dx + x\,dy = 0, \]
which is of the form
\[ P(x, y)\,dx + Q(x, y)\,dy = 0, \]
with P(x, y) = x + y and Q(x, y) = x. The equation is not separable. To check if it is
exact, we compute
\[
\frac{\partial P}{\partial y} = \frac{\partial (x + y)}{\partial y} = 1, \qquad
\frac{\partial Q}{\partial x} = \frac{\partial x}{\partial x} = 1.
\]
These partial derivatives are equal; the equation is exact, and can be written in the form
\[ d\varphi = P\,dx + Q\,dy = 0. \]
The solution to the ODE will be φ = C, with φ computed according to Eq. (7.10):
\[
\varphi = \int_{x_0}^{x} (x + y)\,dx + \int_{y_0}^{y} x_0\,dy
= \left(\frac{x^2}{2} + xy - \frac{x_0^2}{2} - x_0 y\right) + (x_0 y - x_0 y_0)
= \frac{x^2}{2} + xy + \text{constant terms}.
\]
Thus, the solution is
\[ \frac{x^2}{2} + xy = C, \]
which if desired can be solved to give y as a function of x. We can also check to make sure
that our solution actually solves the ODE. ■
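The check suggested at the end of Example 7.2.2 can be sketched numerically. The code below is a minimal illustration with hypothetical helper names, using central finite differences in place of analytic derivatives. It tests the exactness condition, Eq. (7.9), for P = x + y and Q = x, confirms that φ = x²/2 + xy reproduces P and Q as in Eq. (7.8), and verifies that φ stays constant along the explicit solution y = (C − x²/2)/x.

```python
def P(x, y):
    return x + y

def Q(x, y):
    return x

def phi(x, y):
    """Candidate potential from Eq. (7.10), constant terms dropped."""
    return 0.5 * x * x + x * y

def partial(f, x, y, wrt, h=1e-6):
    """Central-difference partial derivative of f with respect to 'x' or 'y'."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

def y_solution(x, C):
    """Solve x^2/2 + x y = C for y (valid for x != 0)."""
    return (C - 0.5 * x * x) / x

x, y = 1.3, 0.7
print(partial(P, x, y, "y"), partial(Q, x, y, "x"))   # both close to 1: exact
print(partial(phi, x, y, "x") - P(x, y))              # close to 0
print(partial(phi, x, y, "y") - Q(x, y))              # close to 0
print([phi(xx, y_solution(xx, 2.0)) for xx in (0.5, 1.0, 2.0)])  # all close to C = 2
```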


It may well turn out that Eq. (7.6) is not exact and that Eq. (7.9) is not satisfied. However,
there always exists at least one and perhaps an infinity of integrating factors α(x, y) such
that
\[ \alpha(x, y)P(x, y)\,dx + \alpha(x, y)Q(x, y)\,dy = 0 \]


is exact. Unfortunately, an integrating factor is not always obvious or easy to find. A sys- 
tematic way to develop an integrating factor is known only when a first-order ODE is 
linear; this will be discussed in the subsection on linear first-order ODEs. 


Equations Homogeneous in x and y 


An ODE is said to be homogeneous (of order n) in x and y if the combined powers of 
x and y add to n in all the terms of P(x, y) and Q(x, y) when the ODE is written as in 
Eq. (7.6). Note that this use of the term “homogeneous” has a different meaning than when 
it was used to describe a linear ODE as given in Eq. (7.1) with the term F(x) equal to zero, 
because it now applies to the combined power of x and y. 

A first-order ODE, which is homogeneous of order n in the present sense (and not necessarily
linear), can be made separable by the substitution y = xv, with dy = x dv + v dx.
This substitution causes the x dependence of all the terms of the equation containing dv to
be x^{n+1}, with all the terms containing dx having x-dependence x^n. The variables x and v
can then be separated.
Example 7.2.3 AN ODE HOMOGENEOUS IN x AND y

Consider the ODE
\[ (2x + y)\,dx + x\,dy = 0, \]
which is homogeneous in x and y. Making the substitution y = xv, with dy = x dv + v dx,
the ODE becomes
\[ (2v + 2)\,dx + x\,dv = 0, \]
which is separable, with solution ln x + ½ ln(v + 1) = C, which is equivalent to x²(v + 1) =
C. Forming y = xv, the solution can be rearranged into
\[ y = \frac{C}{x} - x. \]
■


Isobaric Equations 


A generalization of the preceding subsection is to modify the definition of homogeneity by 
assigning different weights to x and y (note that corresponding weights must then also be 
assigned to dx and dy). If assigning unit weight to each instance of x or dx and a weight 
m to each instance of y or dy makes the ODE homogeneous as defined here, then the
substitution y = x^m v will make the equation separable. We illustrate with an example.


Example 7.2.4 AN ISOBARIC ODE

Here is an isobaric ODE:
\[ (x^2 - y)\,dx + x\,dy = 0. \]
Assigning x weight 1, and y weight m, the term x²dx has weight 3; the other two terms
have weight 1 + m. Setting 3 = 1 + m, we find that all terms can be assigned equal weight
if we take m = 2. This means that we should make the substitution y = x²v, with
dy = x² dv + 2xv dx. Doing so, and dividing through by x², we get
\[ (1 + v)\,dx + x\,dv = 0, \]
which separates into
\[ \frac{dx}{x} + \frac{dv}{v + 1} = 0 \quad\longrightarrow\quad \ln x + \ln(v + 1) = \ln C, \quad\text{or}\quad x(v + 1) = C. \]
From this, we get v = C/x − 1. Since y = x²v, the ODE has solution y = Cx − x². ■
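Solutions obtained by substitution are easy to check by inserting them back into P dx + Q dy = 0. The sketch below is illustrative only, with a hypothetical `residual` helper that approximates y′ by a central difference. It evaluates P + Q y′ for y = C/x − x in the homogeneous ODE (2x + y)dx + x dy = 0 and for y = Cx − x² in the isobaric ODE (x² − y)dx + x dy = 0.

```python
def residual(P, Q, y, x, h=1e-6):
    """Evaluate P(x, y) + Q(x, y) * dy/dx, which vanishes for a true solution."""
    dydx = (y(x + h) - y(x - h)) / (2.0 * h)   # central-difference derivative
    return P(x, y(x)) + Q(x, y(x)) * dydx

C = 3.0   # arbitrary constant of integration, chosen for illustration

# Homogeneous ODE: (2x + y) dx + x dy = 0, with solution y = C/x - x
r_homogeneous = residual(lambda x, y: 2.0 * x + y,
                         lambda x, y: x,
                         lambda x: C / x - x,
                         1.7)

# Isobaric ODE: (x^2 - y) dx + x dy = 0, with solution y = C x - x^2
r_isobaric = residual(lambda x, y: x * x - y,
                      lambda x, y: x,
                      lambda x: C * x - x * x,
                      1.7)

print(r_homogeneous, r_isobaric)   # both should be zero up to rounding error
```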
Linear First-Order ODEs

While nonlinear first-order ODEs can often (but not always) be solved using the strategies
already presented, the situation is different for the linear first-order ODE because procedures
exist for solving the most general equation of this type, which we write in the form
\[ \frac{dy}{dx} + p(x)y = q(x). \tag{7.11} \]
If our linear first-order ODE is exact, its solution is straightforward. If it is not exact, we
make it exact by introducing an integrating factor α(x), so that the ODE becomes
\[ \alpha(x)\frac{dy}{dx} + \alpha(x)p(x)y = \alpha(x)q(x). \tag{7.12} \]
The reason for multiplication by α(x) is to cause the left-hand side of Eq. (7.12) to become
a perfect differential, so we require that α(x) be such that
\[ \frac{d}{dx}\bigl[\alpha(x)y\bigr] = \alpha(x)\frac{dy}{dx} + \alpha(x)p(x)y. \tag{7.13} \]
Expanding the left-hand side of Eq. (7.13), that equation becomes
\[ \alpha(x)\frac{dy}{dx} + \frac{d\alpha}{dx}\,y = \alpha(x)\frac{dy}{dx} + \alpha(x)p(x)y, \]
so α must satisfy
\[ \frac{d\alpha}{dx} = \alpha(x)p(x). \tag{7.14} \]
This is a separable equation and therefore soluble. Separating the variables and integrating,
we obtain
\[ \int^{\alpha} \frac{d\alpha}{\alpha} = \int^{x} p(x)\,dx. \]
We need not consider the lower limits of these integrals because they combine to yield a
constant that does not affect the performance of the integrating factor and can be set to
zero. Completing the evaluation, we reach
\[ \alpha(x) = \exp\left[\int^{x} p(x)\,dx\right]. \tag{7.15} \]
With α now known we proceed to integrate Eq. (7.12), which, because of Eq. (7.13),
assumes the form
\[ \frac{d}{dx}\bigl[\alpha(x)y(x)\bigr] = \alpha(x)q(x), \]
which can be integrated (and divided through by α) to yield
\[ y(x) = \frac{1}{\alpha(x)}\left[\int^{x} \alpha(x)q(x)\,dx + C\right]. \tag{7.16} \]
The two terms of Eq. (7.16) have an interesting interpretation. The term y₁ = C/α(x)
is the general solution of the homogeneous equation obtained by replacing q(x) with zero.
To see this, write the homogeneous equation as
\[ \frac{dy}{y} = -p(x)\,dx, \]
which integrates to
\[ \ln y = -\int^{x} p(x)\,dx + C = -\ln\alpha + C. \]
Taking the exponential of both sides and renaming e^C as C, we get just y = C/α(x). The
other term of Eq. (7.16),
\[ y_2 = \frac{1}{\alpha(x)}\int^{x} \alpha(x)q(x)\,dx, \tag{7.17} \]
corresponds to the right-hand side (source) term q(x), and is a solution of the original
inhomogeneous equation (as is obvious because C can be set to zero). We thus have the
general solution to the inhomogeneous equation presented as a particular solution (or,
in ODE parlance, a particular integral) plus the general solution to the corresponding
homogeneous equation.
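The integrating-factor formulas, Eqs. (7.15) and (7.16), translate directly into a numerical procedure. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `solve_linear_first_order` is hypothetical, both integrals are approximated with the trapezoidal rule, and the lower limits are fixed at x₀ so that α(x₀) = 1 and the constant C equals the initial value y(x₀).

```python
import math

def solve_linear_first_order(p, q, x0, y0, x_end, n=20000):
    """Return y(x_end) for y' + p(x) y = q(x), y(x0) = y0, via Eq. (7.16).

    alpha(x) = exp(integral of p from x0 to x); with this choice alpha(x0) = 1,
    so the constant C in Eq. (7.16) is just y0.
    """
    h = (x_end - x0) / n
    x = x0
    int_p = 0.0      # running trapezoidal value of the integral of p
    alpha = 1.0      # integrating factor at the current x
    int_aq = 0.0     # running trapezoidal value of the integral of alpha * q
    for _ in range(n):
        x_next = x + h
        int_p_next = int_p + 0.5 * h * (p(x) + p(x_next))
        alpha_next = math.exp(int_p_next)
        int_aq += 0.5 * h * (alpha * q(x) + alpha_next * q(x_next))
        x, int_p, alpha = x_next, int_p_next, alpha_next
    return (int_aq + y0) / alpha

# Illustration: y' + y = x with y(0) = 0 has the exact solution y = x - 1 + exp(-x).
approx = solve_linear_first_order(lambda x: 1.0, lambda x: x, 0.0, 0.0, 2.0)
exact = 2.0 - 1.0 + math.exp(-2.0)
print(approx, exact)
```

In the test problem the particular integral is x − 1 and the homogeneous part is e^{−x}, so the printed values illustrate the decomposition described above.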

The above observations illustrate the following theorem: 


The solution of an inhomogeneous first-order linear ODE is unique except for an arbi- 
trary multiple of the solution of the corresponding homogeneous ODE. 


To show this, suppose y₁ and y₂ both solve the inhomogeneous ODE, Eq. (7.11). Then,
subtracting the equation for y₂ from that for y₁, we have
\[ y_1' - y_2' + p(x)(y_1 - y_2) = 0. \]
This shows that y₁ − y₂ is (at some scale) a solution of the homogeneous ODE. Remember
that any solution of the homogeneous ODE remains a solution when multiplied by an
arbitrary constant.

We also have the theorem:

A first-order linear homogeneous ODE has only one linearly independent solution.

Two functions y₁(x) and y₂(x) are linearly dependent if there exist two constants a and
b, both nonzero, that cause ay₁ + by₂ to vanish for all x. In the present situation, this is
equivalent to the statement that y₁ and y₂ are linearly dependent if they are proportional to
each other.
To prove the theorem, assume that the homogeneous ODE has the linearly independent
solutions y₁ and y₂. Then, from the homogeneous ODE, we have
\[ \frac{y_1'}{y_1} = -p(x) = \frac{y_2'}{y_2}. \]
Integrating the first and last members of this equation, we obtain
\[ \ln y_1 = \ln y_2 + C, \quad\text{equivalent to}\quad y_1 = Cy_2, \]
contradicting our assumption that y₁ and y₂ are linearly independent.
Example 7.2.5 RL CIRCUIT

For a resistance-inductance circuit Kirchhoff’s law leads to
\[ L\frac{dI(t)}{dt} + RI(t) = V(t), \]
where I(t) is the current, L and R are, respectively, constant values of the inductance and
the resistance, and V(t) is the time-dependent input voltage. Dividing by L puts this
equation in the form of Eq. (7.11), with p = R/L and q = V(t)/L.
From Eq. (7.15), our integrating factor α(t) is
\[ \alpha(t) = \exp\int^{t} \frac{R}{L}\,dt = e^{Rt/L}. \]
Then, by Eq. (7.16),
\[ I(t) = e^{-Rt/L}\left[\int^{t} e^{Rt/L}\,\frac{V(t)}{L}\,dt + C\right], \]
with the constant C to be determined by an initial condition.
For the special case V(t) = V₀, a constant,
\[ I(t) = e^{-Rt/L}\left[\frac{V_0}{L}\,\frac{L}{R}\,e^{Rt/L} + C\right] = \frac{V_0}{R} + Ce^{-Rt/L}. \]
If the initial condition is I(0) = 0, then C = −V₀/R and
\[ I(t) = \frac{V_0}{R}\left[1 - e^{-Rt/L}\right]. \]
■
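As a quick check of the constant-voltage result, the sketch below evaluates I(t) = (V₀/R)[1 − e^{−Rt/L}] and inserts it back into L dI/dt + RI = V₀. The component values and function names are hypothetical, chosen only for illustration, and the derivative is approximated by a central difference.

```python
import math

def rl_current(t, V0, R, L):
    """Current for constant input voltage V0 and initial condition I(0) = 0."""
    return (V0 / R) * (1.0 - math.exp(-R * t / L))

def rl_residual(t, V0, R, L, h=1e-6):
    """L dI/dt + R I - V0, which should vanish for the correct I(t)."""
    dIdt = (rl_current(t + h, V0, R, L) - rl_current(t - h, V0, R, L)) / (2.0 * h)
    return L * dIdt + R * rl_current(t, V0, R, L) - V0

V0, R, L = 12.0, 4.0, 2.0    # hypothetical values: volts, ohms, henries
tau = L / R                  # time constant of the exponential approach

print(rl_current(0.0, V0, R, L))       # initial current
print(rl_current(tau, V0, R, L))       # (V0/R)(1 - 1/e) after one time constant
print(rl_residual(0.3, V0, R, L))      # zero up to rounding error
```

The current rises from zero toward the limiting value V₀/R with time constant L/R, the electrical analog of the approach to terminal velocity in Example 7.2.1.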


We close this section by pointing out that the inhomogeneous linear first-order ODE can
also be solved by a method called variation of the constant, or alternatively variation of
parameters, as follows. First, we solve the homogeneous ODE y′ + py = 0 by separation
of variables as before, giving
\[
\frac{dy}{y} = -p\,dx, \qquad \ln y = -\int^{x} p(X)\,dX + \ln C, \qquad
y(x) = C\exp\left[-\int^{x} p(X)\,dX\right].
\]
Next we allow the integration constant to become x-dependent, that is, C → C(x). This is
the reason the method is called “variation of the constant.” To prepare for substitution into
the inhomogeneous ODE, we calculate y′:
\[
y' = \exp\left[-\int^{x} p(X)\,dX\right]\bigl[-pC(x) + C'(x)\bigr]
= -py(x) + C'(x)\exp\left[-\int^{x} p(X)\,dX\right].
\]
Making the substitution for y′ into the inhomogeneous ODE y′ + py = q, some cancellation
occurs, and we are left with
\[ C'(x)\exp\left[-\int^{x} p(X)\,dX\right] = q, \]
which is a separable ODE for C(x) that integrates to yield
\[
C(x) = \int^{x} \exp\left[\int^{X} p(Y)\,dY\right] q(X)\,dX
\quad\text{and}\quad
y = C(x)\exp\left[-\int^{x} p(X)\,dX\right].
\]
This particular solution of the inhomogeneous ODE is in agreement with that called y₂ in
Eq. (7.17).
Exercises

7.2.1 From Kirchhoff’s law the current I in an RC (resistance-capacitance) circuit (Fig. 7.1)
obeys the equation
\[ R\frac{dI}{dt} + \frac{1}{C}I = 0. \]
(a) Find I(t).
(b) For a capacitance of 10,000 μF charged to 100 V and discharging through a resistance
of 1 MΩ, find the current I for t = 0 and for t = 100 seconds.
Note. The initial voltage is I₀R or Q/C, where Q = ∫₀^∞ I(t)dt.

7.2.2 The Laplace transform of Bessel’s equation (n = 0) leads to
\[ (s^2 + 1)f'(s) + sf(s) = 0. \]
Solve for f(s).

7.2.3 The decay of a population by catastrophic two-body collisions is described by
\[ \frac{dN}{dt} = -kN^2. \]
This is a first-order, nonlinear differential equation. Derive the solution
\[ N(t) = N_0\left(1 + \frac{t}{\tau_0}\right)^{-1}, \]
where τ₀ = (kN₀)⁻¹. This implies an infinite population at t = −τ₀.

FIGURE 7.1 RC circuit.

7.2.4 The rate of a particular chemical reaction A + B → C is proportional to the concentrations
of the reactants A and B:
\[ \frac{dC(t)}{dt} = \alpha\,[A(0) - C(t)]\,[B(0) - C(t)]. \]
(a) Find C(t) for A(0) ≠ B(0).
(b) Find C(t) for A(0) = B(0).
The initial condition is that C(0) = 0.

7.2.5 A boat, coasting through the water, experiences a resisting force proportional to vⁿ, v
being the boat’s instantaneous velocity. Newton’s second law leads to
\[ m\frac{dv}{dt} = -kv^n. \]
With v(t = 0) = v₀, x(t = 0) = 0, integrate to find v as a function of time and v as a
function of distance.

7.2.6 In the first-order differential equation dy/dx = f(x, y), the function f(x, y) is a function
of the ratio y/x:
\[ \frac{dy}{dx} = g(y/x). \]
Show that the substitution of u = y/x leads to a separable equation in u and x.

7.2.7 The differential equation
\[ P(x, y)\,dx + Q(x, y)\,dy = 0 \]
is exact. Show that its solution is of the form
\[ \varphi(x, y) = \int_{x_0}^{x} P(x, y)\,dx + \int_{y_0}^{y} Q(x_0, y)\,dy = \text{constant}. \]

7.2.8 The differential equation
\[ P(x, y)\,dx + Q(x, y)\,dy = 0 \]
is exact. If
\[ \varphi(x, y) = \int_{x_0}^{x} P(x, y)\,dx + \int_{y_0}^{y} Q(x_0, y)\,dy, \]
show that
\[ \frac{\partial\varphi}{\partial x} = P(x, y), \qquad \frac{\partial\varphi}{\partial y} = Q(x, y). \]
Hence, φ(x, y) = constant is a solution of the original differential equation.

7.2.9 Prove that Eq. (7.12) is exact in the sense of Eq. (7.9), provided that α(x) satisfies
Eq. (7.14).

7.2.10 A certain differential equation has the form
\[ f(x)\,dx + g(x)h(y)\,dy = 0, \]
with none of the functions f(x), g(x), h(y) identically zero. Show that a necessary and
sufficient condition for this equation to be exact is that g(x) = constant.

7.2.11 Show that
\[ y(x) = \exp\left[-\int^{x} p(t)\,dt\right]\left\{\int^{x} \exp\left[\int^{s} p(t)\,dt\right] q(s)\,ds + C\right\} \]
is a solution of
\[ \frac{dy}{dx} + p(x)y(x) = q(x) \]
by differentiating the expression for y(x) and substituting into the differential equation.

7.2.12 The motion of a body falling in a resisting medium may be described by
\[ m\frac{dv}{dt} = mg - bv \]
when the retarding force is proportional to the velocity, v. Find the velocity. Evaluate
the constant of integration by demanding that v(0) = 0.

7.2.13 Radioactive nuclei decay according to the law
\[ \frac{dN}{dt} = -\lambda N, \]
N being the concentration of a given nuclide and λ, the particular decay constant. In
a radioactive series of two different nuclides, with concentrations N₁(t) and N₂(t), we
have
\[ \frac{dN_1}{dt} = -\lambda_1 N_1, \qquad \frac{dN_2}{dt} = \lambda_1 N_1 - \lambda_2 N_2. \]
Find N₂(t) for the conditions N₁(0) = N₀ and N₂(0) = 0.

7.2.14 The rate of evaporation from a particular spherical drop of liquid (constant density) is
proportional to its surface area. Assuming this to be the sole mechanism of mass loss,
find the radius of the drop as a function of time.

7.2.15 In the linear homogeneous differential equation
\[ \frac{dv}{dt} = -av \]
the variables are separable. When the variables are separated, the equation is exact.
Solve this differential equation subject to v(0) = v₀ by the following three methods:
(a) Separating variables and integrating.
(b) Treating the separated variable equation as exact.
(c) Using the result for a linear homogeneous differential equation.
ANS. v(t) = v₀e^{−at}.

7.2.16 (a) Solve Example 7.2.1, assuming that the parachute opens when the parachutist’s
velocity has reached v_i = 60 mi/h (regard this time as t = 0). Find v(t).
(b) For a skydiver in free fall use the friction coefficient b = 0.25 kg/m and mass
m = 70 kg. What is the limiting velocity in this case?

7.2.17 Solve the ODE
\[ (xy^2 - y)\,dx + x\,dy = 0. \]

7.2.18 Solve the ODE
\[ (x^2 - y^2 e^{y/x})\,dx + (x^2 + xy)\,e^{y/x}\,dy = 0. \]
Hint. Note that the quantity y/x in the exponents is of combined degree zero and does
not affect the determination of homogeneity.


7.3 ODES WITH CONSTANT COEFFICIENTS 


Before addressing second-order ODEs, the main topic of this chapter, we discuss a special- 
ized, but frequently occurring class of ODEs that are not constrained to be of specific order, 
namely those that are linear and whose homogeneous terms have constant coefficients. The 
generic equation of this type is
\[ \frac{d^n y}{dx^n} + a_{n-1}\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1\frac{dy}{dx} + a_0 y = F(x). \tag{7.18} \]
The homogeneous equation corresponding to Eq. (7.18) has solutions of the form y = e^{mx},
where m is a solution of the algebraic equation
\[ m^n + a_{n-1}m^{n-1} + \cdots + a_1 m + a_0 = 0, \]
as may be verified by substitution of the assumed form of the solution.

In the case that the m equation has a multiple root, the above prescription will not yield
the full set of n linearly independent solutions for the original nth-order ODE. If one then
considers the limiting process in which two roots approach each other, it is possible to
conclude that if e^{mx} is a solution, then so is de^{mx}/dm = xe^{mx}. A triple root would have
solutions e^{mx}, xe^{mx}, x²e^{mx}, etc.
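For a second-order equation y″ + a₁y′ + a₀y = 0 the prescription can be sketched in a few lines. The helper names below are hypothetical and the two test equations are chosen only for illustration. The code solves the algebraic equation m² + a₁m + a₀ = 0 and then checks by substitution (with derivatives supplied analytically) a real-form solution for a complex root pair and the extra solution xe^{mx} for a double root.

```python
import cmath
import math

def char_roots_second_order(a1, a0):
    """Roots of the algebraic equation m^2 + a1 m + a0 = 0."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a0)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

def ode_residual(y, dy, d2y, a1, a0, x):
    """y'' + a1 y' + a0 y evaluated at x, given y and its first two derivatives."""
    return d2y(x) + a1 * dy(x) + a0 * y(x)

# Complex roots: y'' + 2y' + 2y = 0 has m = -1 +/- i, so exp(-x) cos x is a real solution.
roots = char_roots_second_order(2.0, 2.0)
r_complex = ode_residual(lambda x: math.exp(-x) * math.cos(x),
                         lambda x: -math.exp(-x) * (math.cos(x) + math.sin(x)),
                         lambda x: 2.0 * math.exp(-x) * math.sin(x),
                         2.0, 2.0, 0.8)

# Double root: y'' - 2y' + y = 0 has m = 1 twice; x exp(x) is the second solution.
double = char_roots_second_order(-2.0, 1.0)
r_double = ode_residual(lambda x: x * math.exp(x),
                        lambda x: (1.0 + x) * math.exp(x),
                        lambda x: (2.0 + x) * math.exp(x),
                        -2.0, 1.0, 0.8)

print(roots, r_complex)
print(double, r_double)
```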


Example 7.3.1 Hooke’s LAW SPRING 


A mass M attached to a Hooke’s Law spring (of spring constant k) is in oscillatory motion.
Letting y be the displacement of the mass from its equilibrium position, Newton’s law of
motion takes the form
\[ M\frac{d^2 y}{dt^2} = -ky, \]
which is an ODE of the form y″ + a₀y = 0, with a₀ = +k/M. The general solution to this
ODE is of the form C₁e^{m₁t} + C₂e^{m₂t}, where m₁ and m₂ are the solutions of the algebraic
equation m² + a₀ = 0.

The values of m₁ and m₂ are ±iω, where ω = √(k/M), so the ODE has solution
\[ y(t) = C_1 e^{i\omega t} + C_2 e^{-i\omega t}. \]
Since the ODE is homogeneous, we may alternatively describe its general solution using
arbitrary linear combinations of the above two terms. This permits us to combine them to
obtain forms that are real and therefore appropriate to the current problem. Noting that
\[
\frac{e^{i\omega t} + e^{-i\omega t}}{2} = \cos\omega t
\quad\text{and}\quad
\frac{e^{i\omega t} - e^{-i\omega t}}{2i} = \sin\omega t,
\]
a convenient alternate form is
\[ y(t) = C_1\cos\omega t + C_2\sin\omega t. \]
The solution to a specific oscillation problem will now involve fitting the coefficients
C₁ and C₂ to the initial conditions, as for example y(0) and y′(0). ■


Exercises

Find the general solutions to the following ODEs. Write the solutions in forms that are
entirely real (i.e., that contain no complex quantities).

7.3.1 y‴ − 2y″ − y′ + 2y = 0.
7.3.2 y‴ − 2y″ + y′ − 2y = 0.
7.3.3 y‴ − 3y′ + 2y = 0.
7.3.4 y″ + 2y′ + 2y = 0.


7.4 SECOND-ORDER LINEAR ODES 


We now turn to the main topic of this chapter, second-order linear ODEs. These are of 
particular importance because they arise in the most frequently used methods for solving 
PDEs in quantum mechanics, electromagnetic theory, and other areas in physics. Unlike 
the first-order linear ODE, we do not have a universally applicable closed-form solution, 
and in general it is found advisable to use methods that produce solutions in the form of 
power series. As a precursor to the general discussion of series-solution methods, we begin 
by examining the notion of singularity as applied to ODEs. 


Singular Points 


The concept of singularity of an ODE is important to us for two reasons: (1) it is useful 
for classifying ODEs and identifying those that can be transformed into common forms 
(discussed later in this subsection), and (2) it bears on the feasibility of finding series 
solutions to the ODE. This feasibility is the topic of Fuchs’ theorem (to be discussed
shortly). 
When a linear homogeneous second-order ODE is written in the form
\[ y'' + P(x)y' + Q(x)y = 0, \tag{7.19} \]
points x₀ for which P(x) and Q(x) are finite are termed ordinary points of the ODE.
However, if either P(x) or Q(x) diverges as x → x₀, the point x₀ is called a singular point.
Singular points are further classified as regular or irregular (the latter also sometimes
called essential singularities):

• A singular point x₀ is regular if either P(x) or Q(x) diverges there, but (x − x₀)P(x)
and (x − x₀)²Q(x) remain finite.

• A singular point x₀ is irregular if P(x) diverges faster than 1/(x − x₀) so that
(x − x₀)P(x) goes to infinity as x → x₀, or if Q(x) diverges faster than 1/(x − x₀)² so that
(x − x₀)²Q(x) goes to infinity as x → x₀.


These definitions hold for all finite values of $x_0$. To analyze the behavior at $x \to \infty$, we
set $x = 1/z$, substitute into the differential equation, and examine the behavior in the limit
$z \to 0$. The ODE, originally in the dependent variable $y(x)$, will now be written in terms
of $w(z)$, defined as $w(z) = y(z^{-1})$. Converting the derivatives,

$$y' = \frac{dy(x)}{dx} = \frac{dy(z^{-1})}{dz}\,\frac{dz}{dx} = \frac{dw(z)}{dz}\left(-\frac{1}{x^2}\right) = -z^2 w', \tag{7.20}$$

$$y'' = \frac{dy'}{dz}\,\frac{dz}{dx} = (-z^2)\,\frac{d}{dz}\left[-z^2 w'\right] = z^4 w'' + 2z^3 w'. \tag{7.21}$$

Using Eqs. (7.20) and (7.21), we transform Eq. (7.19) into

$$z^4 w'' + \left[2z^3 - z^2 P(z^{-1})\right] w' + Q(z^{-1})\,w = 0. \tag{7.22}$$

Dividing through by $z^4$ to place the ODE in standard form, we see that the possibility of a
singularity at $z = 0$ depends on the behavior of

$$\frac{2z - P(z^{-1})}{z^2} \quad \text{and} \quad \frac{Q(z^{-1})}{z^4}.$$

If these two expressions remain finite at $z = 0$, the point $x = \infty$ is an ordinary point. If
they diverge no more rapidly than $1/z$ and $1/z^2$, respectively, $x = \infty$ is a regular singular
point; otherwise it is an irregular singular point (an essential singularity).


Example 7.4.1 BESSEL’S EQUATION 


Bessel’s equation is

$$x^2 y'' + x y' + (x^2 - n^2)\,y = 0.$$

Comparing it with Eq. (7.19), we have

$$P(x) = \frac{1}{x}, \qquad Q(x) = 1 - \frac{n^2}{x^2},$$

which shows that the point $x = 0$ is a regular singularity. By inspection we see that there
are no other singularities in the finite range. As $x \to \infty$ ($z \to 0$), from Eq. (7.22) we have
the coefficients

$$\frac{2z - z}{z^2} \quad \text{and} \quad \frac{1 - n^2 z^2}{z^4}.$$

Since the latter expression diverges as $1/z^4$, the point $x = \infty$ is an irregular, or essential,
singularity.
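The classification used in this example can be checked numerically. The following is a minimal sketch, not a proof: it estimates how fast $P$ and $Q$ grow near a point by comparing two nearby evaluations, and it assumes the coefficients behave like simple powers of $(x - x_0)$ there. The function names (`pole_order`, `classify`, `classify_at_infinity`) and the step sizes are illustrative choices, not part of the text.

```python
import math

def pole_order(f, x0, h1=1e-3, h2=1e-6):
    """Estimate k in |f(x)| ~ C/|x - x0|**k as x -> x0; returns 0 if f stays finite."""
    f1, f2 = abs(f(x0 + h1)), abs(f(x0 + h2))
    if f1 == 0.0 or f2 == 0.0:
        return 0
    k = math.log(f2 / f1) / math.log(h1 / h2)
    return max(0, round(k))

def classify(P, Q, x0):
    """Classify x0 for y'' + P y' + Q y = 0 from the growth of P and Q near x0."""
    p, q = pole_order(P, x0), pole_order(Q, x0)
    if p == 0 and q == 0:
        return "ordinary"
    if p <= 1 and q <= 2:
        return "regular"
    return "irregular"

def classify_at_infinity(P, Q):
    """Apply the same test at z = 0 to the coefficients that follow from Eq. (7.22)."""
    Pz = lambda z: (2 * z - P(1 / z)) / z**2
    Qz = lambda z: Q(1 / z) / z**4
    return classify(Pz, Qz, 0.0)

# Bessel's equation with n = 1 (Example 7.4.1)
n = 1.0
P = lambda x: 1 / x
Q = lambda x: 1 - n**2 / x**2

print(classify(P, Q, 0.0))         # regular
print(classify(P, Q, 1.0))         # ordinary
print(classify_at_infinity(P, Q))  # irregular
```

Because the test is purely numerical, it can be misled by coefficients that oscillate or cancel near the point; it is meant only as a quick check on a hand classification.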


Table 7.1 lists the singular points of a number of ODEs of importance in physics. It 
will be seen that the first three equations in Table 7.1, the hypergeometric, Legendre, and 
Chebyshev, all have three regular singular points. The hypergeometric equation, with regular
singularities at 0, 1, and $\infty$, is taken as the standard, the canonical form. The solutions
of the other two may then be expressed in terms of its solutions, the hypergeometric func- 
tions. This is done in Chapter 18. 

In a similar manner, the confluent hypergeometric equation is taken as the canonical 
form of a linear second-order differential equation with one regular and one irregular sin- 
gular point. 


Table 7.1 Singularities of Some Important ODEs.

Equation                                        Regular singularity    Irregular singularity
                                                x =                    x =

1. Hypergeometric                               0, 1, $\infty$         ···
   $x(x - 1)y'' + [(1 + a + b)x - c]y' + aby = 0$

2. Legendreᵃ                                    -1, 1, $\infty$        ···
   $(1 - x^2)y'' - 2xy' + l(l + 1)y = 0$

3. Chebyshev                                    -1, 1, $\infty$        ···
   $(1 - x^2)y'' - xy' + n^2 y = 0$

4. Confluent hypergeometric                     0                      $\infty$
   $xy'' + (c - x)y' - ay = 0$

5. Bessel                                       0                      $\infty$
   $x^2 y'' + xy' + (x^2 - n^2)y = 0$

6. Laguerreᵃ                                    0                      $\infty$
   $xy'' + (1 - x)y' + ay = 0$

7. Simple harmonic oscillator                   ···                    $\infty$
   $y'' + \omega^2 y = 0$

8. Hermite                                      ···                    $\infty$
   $y'' - 2xy' + 2\alpha y = 0$

ᵃThe associated equations have the same singular points.





Exercises 
7.4.1 Show that Legendre’s equation has regular singularities at $x = -1$, $1$, and $\infty$.
7.4.2 Show that Laguerre’s equation, like the Bessel equation, has a regular singularity at
$x = 0$ and an irregular singularity at $x = \infty$.
7.4.3 Show that Chebyshev’s equation, like the Legendre equation, has regular singularities
at $x = -1$, $1$, and $\infty$.
7.4.4 Show that Hermite’s equation has no singularity other than an irregular singularity at
$x = \infty$.
7.4.5 Show that the substitution

$$x \to \frac{1 - x}{2}, \qquad a = -l, \qquad b = l + 1, \qquad c = 1$$

converts the hypergeometric equation into Legendre’s equation.





7.5 SERIES SOLUTIONS—FROBENIUS’ METHOD 


In this section we develop a method of obtaining solution(s) of the linear, second-order, 
homogeneous ODE. For the moment, we develop the mechanics of the method. After 
studying examples, we return to discuss the conditions under which we can expect these 
series solutions to exist. 

Consider a linear, second-order, homogeneous ODE, in the form

$$\frac{d^2 y}{dx^2} + P(x)\,\frac{dy}{dx} + Q(x)\,y = 0. \tag{7.23}$$

In this section we develop (at least) one solution of Eq. (7.23) by expansion about the point
$x = 0$. In the next section we develop the second, independent solution and prove that
no third, independent solution exists. Therefore the most general solution of Eq. (7.23)
may be written in terms of the two independent solutions as

$$y(x) = c_1 y_1(x) + c_2 y_2(x). \tag{7.24}$$

Our physical problem may lead to a nonhomogeneous, linear, second-order ODE,

$$\frac{d^2 y}{dx^2} + P(x)\,\frac{dy}{dx} + Q(x)\,y = F(x). \tag{7.25}$$

The function on the right, $F(x)$, typically represents a source (such as electrostatic charge)
or a driving force (as in a driven oscillator). Methods for solving this inhomogeneous
ODE are also discussed later in this chapter and, using Laplace transform techniques, in
Chapter 20. Assuming a single particular integral (i.e., specific solution), $y_p$, of the
inhomogeneous ODE to be available, we may add to it any solution of the corresponding
homogeneous equation, Eq. (7.23), and write the most general solution of Eq. (7.25) as

$$y(x) = c_1 y_1(x) + c_2 y_2(x) + y_p(x). \tag{7.26}$$

In many problems, the constants $c_1$ and $c_2$ will be fixed by boundary conditions.
For the present, we assume that $F(x) = 0$, and that therefore our differential equation is
homogeneous. We shall attempt to develop a solution of our linear, second-order, homoge- 
neous differential equation, Eq. (7.23), by substituting into it a power series with undeter- 
mined coefficients. Also available as a parameter is the power of the lowest nonvanishing 
term of the series. To illustrate, we apply the method to two important differential equa- 
tions. 


First Example—Linear Oscillator 


Consider the linear (classical) oscillator equation

$$\frac{d^2 y}{dx^2} + \omega^2 y = 0, \tag{7.27}$$

which we have already solved by another method in Example 7.3.1. The solutions we
found there were $y = \sin\omega x$ and $\cos\omega x$.
We try

$$y(x) = x^s\left(a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots\right) = \sum_{j=0}^{\infty} a_j x^{s+j}, \qquad a_0 \neq 0, \tag{7.28}$$

with the exponent $s$ and all the coefficients $a_j$ still undetermined. Note that $s$ need not be
an integer. By differentiating twice, we obtain

$$\frac{dy}{dx} = \sum_{j=0}^{\infty} a_j (s + j)\, x^{s+j-1},$$

$$\frac{d^2 y}{dx^2} = \sum_{j=0}^{\infty} a_j (s + j)(s + j - 1)\, x^{s+j-2}.$$

By substituting into Eq. (7.27), we have

$$\sum_{j=0}^{\infty} a_j (s + j)(s + j - 1)\, x^{s+j-2} + \omega^2 \sum_{j=0}^{\infty} a_j x^{s+j} = 0. \tag{7.29}$$

From our analysis of the uniqueness of power series (Chapter 1), we know that the coefficient
of each power of $x$ on the left-hand side of Eq. (7.29) must vanish individually, $x^s$
being an overall factor.

The lowest power of $x$ appearing in Eq. (7.29) is $x^{s-2}$, occurring only for $j = 0$ in the
first summation. The requirement that this coefficient vanish yields

$$a_0\, s(s - 1) = 0.$$

Recall that we chose $a_0$ as the coefficient of the lowest nonvanishing term of the series in
Eq. (7.28), so that, by definition, $a_0 \neq 0$. Therefore we have

$$s(s - 1) = 0. \tag{7.30}$$

This equation, coming from the coefficient of the lowest power of $x$, is called the indicial
equation. The indicial equation and its roots are of critical importance to our analysis.
Clearly, in this example it informs us that either $s = 0$ or $s = 1$, so that our series solution
must start either with an $x^0$ or an $x^1$ term.

Looking further at Eq. (7.29), we see that the next lowest power of $x$, namely $x^{s-1}$, also
occurs uniquely (for $j = 1$ in the first summation). Setting the coefficient of $x^{s-1}$ to zero,
we have

$$a_1 (s + 1)s = 0.$$

This shows that if $s = 1$, we must have $a_1 = 0$. However, if $s = 0$, this equation imposes
no requirement on the coefficient set.

Before considering further the two possibilities for $s$, we return to Eq. (7.29) and demand
that the remaining net coefficients vanish. The contributions to the coefficient of $x^{s+j}$,
($j \geq 0$), come from the term containing $a_{j+2}$ in the first summation and from that with $a_j$
in the second. Because we have already dealt with $j = 0$ and $j = 1$ in the first summation,
when we have used all $j \geq 0$, we will have used all the terms of both series. For each value
of $j$, the vanishing of the net coefficient of $x^{s+j}$ results in

$$a_{j+2}(s + j + 2)(s + j + 1) + \omega^2 a_j = 0,$$

equivalent to

$$a_{j+2} = -a_j\,\frac{\omega^2}{(s + j + 2)(s + j + 1)}. \tag{7.31}$$

This is a two-term recurrence relation.¹ In the present problem, given $a_j$, Eq. (7.31)
permits us to compute $a_{j+2}$ and then $a_{j+4}$, $a_{j+6}$, and so on, continuing as far as desired.
Thus, if we start with $a_0$, we can make the even coefficients $a_2, a_4, \ldots$, but we obtain no
information about the odd coefficients $a_1, a_3, a_5, \ldots$. But because $a_1$ is arbitrary if $s = 0$
and necessarily zero if $s = 1$, let us set it equal to zero, and then, by Eq. (7.31),

$$a_3 = a_5 = a_7 = \cdots = 0;$$

the result is that all the odd-numbered coefficients vanish.

¹In some problems, the recurrence relation may involve more than two terms; its exact form will depend on the functions $P(x)$
and $Q(x)$ of the ODE.

Returning now to Eq. (7.30), our indicial equation, we first try the solution $s = 0$. The
recurrence relation, Eq. (7.31), becomes

$$a_{j+2} = -a_j\,\frac{\omega^2}{(j + 2)(j + 1)}, \tag{7.32}$$

which leads to





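$$a_2 = -a_0\,\frac{\omega^2}{1 \cdot 2} = -\frac{\omega^2}{2!}\,a_0,$$

$$a_4 = -a_2\,\frac{\omega^2}{3 \cdot 4} = +\frac{\omega^4}{4!}\,a_0,$$

$$a_6 = -a_4\,\frac{\omega^2}{5 \cdot 6} = -\frac{\omega^6}{6!}\,a_0, \quad \text{and so on.}$$

By inspection (and mathematical induction, see Section 1.4),

$$a_{2n} = (-1)^n\,\frac{\omega^{2n}}{(2n)!}\,a_0, \tag{7.33}$$

and our solution is

$$y(x)_{s=0} = a_0\left[1 - \frac{(\omega x)^2}{2!} + \frac{(\omega x)^4}{4!} - \frac{(\omega x)^6}{6!} + \cdots\right] = a_0 \cos\omega x. \tag{7.34}$$

If we choose the indicial equation root $s = 1$ from Eq. (7.30), the recurrence relation of
Eq. (7.31) becomes

$$a_{j+2} = -a_j\,\frac{\omega^2}{(j + 3)(j + 2)}. \tag{7.35}$$

Evaluating this successively for $j = 0, 2, 4, \ldots$, we obtain

$$a_2 = -a_0\,\frac{\omega^2}{2 \cdot 3} = -\frac{\omega^2}{3!}\,a_0,$$

$$a_4 = -a_2\,\frac{\omega^2}{4 \cdot 5} = +\frac{\omega^4}{5!}\,a_0,$$

$$a_6 = -a_4\,\frac{\omega^2}{6 \cdot 7} = -\frac{\omega^6}{7!}\,a_0, \quad \text{and so on.}$$

Again, by inspection and mathematical induction,

$$a_{2n} = (-1)^n\,\frac{\omega^{2n}}{(2n + 1)!}\,a_0. \tag{7.36}$$

For this choice, $s = 1$, we obtain

$$y(x)_{s=1} = a_0 x\left[1 - \frac{(\omega x)^2}{3!} + \frac{(\omega x)^4}{5!} - \frac{(\omega x)^6}{7!} + \cdots\right]$$

$$= \frac{a_0}{\omega}\left[(\omega x) - \frac{(\omega x)^3}{3!} + \frac{(\omega x)^5}{5!} - \frac{(\omega x)^7}{7!} + \cdots\right]$$

$$= \frac{a_0}{\omega}\,\sin\omega x. \tag{7.37}$$


FIGURE 7.2 Schematics of series solution.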


For future reference we note that the ODE solution from the indicial equation root s = 0 
consisted only of even powers of x, while the solution from the root s = 1 contained only 
odd powers. 
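The two series can be generated directly from the recurrence relation. The sketch below sums the series term by term using Eq. (7.31) with $a_1 = 0$ and compares the results with $\cos\omega x$ and $(\sin\omega x)/\omega$; the function name `oscillator_series`, the number of terms, and the sample values of $\omega$ and $x$ are arbitrary illustrative choices.

```python
import math

def oscillator_series(x, omega, s, a0=1.0, terms=30):
    """Sum a_j x**(s+j) over even j, with a_{j+2} from Eq. (7.31) and a_1 = 0."""
    a, total = a0, 0.0
    for j in range(0, 2 * terms, 2):
        total += a * x ** (s + j)
        # Two-term recurrence, Eq. (7.31)
        a = -a * omega**2 / ((s + j + 2) * (s + j + 1))
    return total

omega, x = 2.0, 0.7
print(oscillator_series(x, omega, s=0), math.cos(omega * x))          # s = 0: cosine
print(oscillator_series(x, omega, s=1), math.sin(omega * x) / omega)  # s = 1: sine / omega
```

For moderate $|\omega x|$ a few dozen terms are ample; for large arguments the alternating terms grow before they decay, and round-off in the partial sums becomes the limiting factor.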


To summarize this approach, we may write Eq. (7.29) schematically as shown in
Fig. 7.2. From the uniqueness of power series (Section 1.2), the total coefficient of
each power of $x$ must vanish by itself. The requirement that the first coefficient
vanish (I) leads to the indicial equation, Eq. (7.30). The second coefficient is handled
by setting $a_1 = 0$ (II). The vanishing of the coefficients of $x^s$ (and higher powers,
taken one at a time) is ensured by imposing the recurrence relation, Eq. (7.31),
(III), (IV).


This expansion in power series, known as Frobenius’ method, has given us two series 
solutions of the linear oscillator equation. However, there are two points about such series 
solutions that must be strongly emphasized: 


1. The series solution should always be substituted back into the differential equation, to 
see if it works, as a precaution against algebraic and logical errors. If it works, it is a 
solution. 

2. The acceptability of a series solution depends on its convergence (including asymp- 
totic convergence). It is quite possible for Frobenius’ method to give a series solution 
that satisfies the original differential equation when substituted in the equation but 
that does not converge over the region of interest. Legendre’s differential equation 
(examined in Section 8.3) illustrates this situation. 


Expansion about xo 


Equation (7.28) is an expansion about the origin, $x_0 = 0$. It is perfectly possible to replace
Eq. (7.28) with

$$y(x) = \sum_{j=0}^{\infty} a_j (x - x_0)^{s+j}, \qquad a_0 \neq 0. \tag{7.38}$$

Indeed, for the Legendre, Chebyshev, and hypergeometric equations, the choice $x_0 = 1$
has some advantages. The point $x_0$ should not be chosen at an essential singularity, or
Frobenius’ method will probably fail. The resultant series ($x_0$ an ordinary point or regular
singular point) will be valid where it converges. You can expect a divergence of some sort
when $|x - x_0| = |z_1 - x_0|$, where $z_1$ is the ODE’s closest singularity to $x_0$ (in the complex
plane).


Symmetry of Solutions


Let us note that for the classical oscillator problem we obtained one solution of even
symmetry, $y_1(x) = y_1(-x)$, and one of odd symmetry, $y_2(x) = -y_2(-x)$. This is not just an
accident but a direct consequence of the form of the ODE. Writing a general homogeneous
ODE as

$$\mathcal{L}(x)\,y(x) = 0, \tag{7.39}$$

in which $\mathcal{L}(x)$ is the differential operator, we see that for the linear oscillator equation,
Eq. (7.27), $\mathcal{L}(x)$ is even under parity; that is,

$$\mathcal{L}(x) = \mathcal{L}(-x).$$

Whenever the differential operator has a specific parity or symmetry, either even or odd,
we may interchange $+x$ and $-x$, and Eq. (7.39) becomes

$$\pm\mathcal{L}(x)\,y(-x) = 0.$$

Clearly, if $y(x)$ is a solution of the differential equation, $y(-x)$ is also a solution. Then,
either $y(x)$ and $y(-x)$ are linearly dependent (i.e., proportional), meaning that $y$ is either
even or odd, or they are linearly independent solutions that can be combined into a pair of
solutions, one even, and one odd, by forming

$$y_{\text{even}} = y(x) + y(-x), \qquad y_{\text{odd}} = y(x) - y(-x).$$


For the classical oscillator example, we obtained two solutions; our method for finding 
them caused one to be even, the other odd. 

If we refer back to Section 7.4 we can see that Legendre, Chebyshev, Bessel, simple har- 
monic oscillator, and Hermite equations are all based on differential operators with even 
parity; that is, their P(x) in Eq. (7.19) is odd and Q(x) even. Solutions of all of them 
may be presented as series of even powers of x or separate series of odd powers of x. 
The Laguerre differential operator has neither even nor odd symmetry; hence its solutions 
cannot be expected to exhibit even or odd parity. Our emphasis on parity stems primarily 
from the importance of parity in quantum mechanics. We find that in many problems wave 
functions are either even or odd, meaning that they have a definite parity. Most interac- 
tions (beta decay is the big exception) are also even or odd, and the result is that parity is 
conserved. 


A Second Example—Bessel’s Equation 


This attack on the linear oscillator was perhaps a bit too easy. By substituting the power 
series, Eq. (7.28), into the differential equation, Eq. (7.27), we obtained two independent 
solutions with no trouble at all. 

To get some idea of other things that can happen, we try to solve Bessel’s equation, 


$$x^2 y'' + x y' + (x^2 - n^2)\,y = 0. \tag{7.40}$$

Again, assuming a solution of the form

$$y(x) = \sum_{j=0}^{\infty} a_j x^{s+j},$$

we differentiate and substitute into Eq. (7.40). The result is

$$\sum_{j=0}^{\infty} a_j (s + j)(s + j - 1)\,x^{s+j} + \sum_{j=0}^{\infty} a_j (s + j)\,x^{s+j} + \sum_{j=0}^{\infty} a_j x^{s+j+2} - \sum_{j=0}^{\infty} a_j n^2 x^{s+j} = 0. \tag{7.41}$$

By setting $j = 0$, we get the coefficient of $x^s$, the lowest power of $x$ appearing on the
left-hand side,

$$a_0\left[s(s - 1) + s - n^2\right] = 0, \tag{7.42}$$

and again $a_0 \neq 0$ by definition. Equation (7.42) therefore yields the indicial equation

$$s^2 - n^2 = 0, \tag{7.43}$$

with solutions $s = \pm n$.
We need also to examine the coefficient of $x^{s+1}$. Here we obtain

$$a_1\left[(s + 1)s + s + 1 - n^2\right] = 0,$$

or

$$a_1 (s + 1 - n)(s + 1 + n) = 0. \tag{7.44}$$

For $s = \pm n$, neither $s + 1 - n$ nor $s + 1 + n$ vanishes and we must require $a_1 = 0$.

Proceeding to the coefficient of $x^{s+j}$ for $s = n$, we see that it is the term containing $a_j$
in the first, second, and fourth terms of Eq. (7.41), but is that containing $a_{j-2}$ in the third
term. By requiring the overall coefficient of $x^{s+j}$ to vanish, we obtain

$$a_j\left[(n + j)(n + j - 1) + (n + j) - n^2\right] + a_{j-2} = 0.$$

When $j$ is replaced by $j + 2$, this can be rewritten for $j \geq 0$ as

$$a_{j+2} = -a_j\,\frac{1}{(j + 2)(2n + j + 2)}, \tag{7.45}$$

which is the desired recurrence relation. Repeated application of this recurrence relation
leads to

$$a_2 = -a_0\,\frac{1}{2(2n + 2)} = -\frac{a_0\, n!}{2^2\, 1!\,(n + 1)!},$$

$$a_4 = -a_2\,\frac{1}{4(2n + 4)} = \frac{a_0\, n!}{2^4\, 2!\,(n + 2)!},$$

$$a_6 = -a_4\,\frac{1}{6(2n + 6)} = -\frac{a_0\, n!}{2^6\, 3!\,(n + 3)!}, \quad \text{and so on,}$$

and in general,

$$a_{2p} = (-1)^p\,\frac{a_0\, n!}{2^{2p}\, p!\,(n + p)!}. \tag{7.46}$$


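Inserting these coefficients in our assumed series solution, we have

$$y(x) = a_0 x^n\left[1 - \frac{n!\,x^2}{2^2\, 1!\,(n + 1)!} + \frac{n!\,x^4}{2^4\, 2!\,(n + 2)!} - \cdots\right]. \tag{7.47}$$

In summation form,

$$y(x) = a_0 \sum_{j=0}^{\infty} (-1)^j\,\frac{n!\,x^{n+2j}}{2^{2j}\, j!\,(n + j)!}
= a_0\, 2^n n! \sum_{j=0}^{\infty} (-1)^j\,\frac{1}{j!\,(n + j)!}\left(\frac{x}{2}\right)^{n+2j}. \tag{7.48}$$

In Chapter 14 the final summation (with $a_0 = 1/2^n n!$) is identified as the Bessel function
$J_n(x)$:

$$J_n(x) = \sum_{j=0}^{\infty} \frac{(-1)^j}{j!\,(n + j)!}\left(\frac{x}{2}\right)^{n+2j}. \tag{7.49}$$

Note that this solution, $J_n(x)$, has either even or odd symmetry,² as might be expected
from the form of Bessel’s equation.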
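As a check on Eqs. (7.45) to (7.49), the series can be summed numerically and substituted back into Bessel's equation. The sketch below is restricted to integer $n \geq 0$ (so that $n!$ can be taken from `math.factorial`); the names `bessel_series` and `bessel_residual`, the truncation at 40 terms, and the finite-difference step are illustrative assumptions.

```python
import math

def bessel_series(n, x, terms=40):
    """J_n(x), integer n >= 0, from recurrence (7.45) with a_0 = 1/(2**n n!)."""
    a = 1.0 / (2**n * math.factorial(n))
    total = 0.0
    for j in range(0, 2 * terms, 2):
        total += a * x ** (n + j)
        # a_{j+2} = -a_j / ((j + 2)(2n + j + 2)), Eq. (7.45)
        a = -a / ((j + 2) * (2 * n + j + 2))
    return total

def bessel_residual(n, x, h=1e-4):
    """Left-hand side of Eq. (7.40), with derivatives from central differences."""
    y0 = bessel_series(n, x)
    yp = bessel_series(n, x + h)
    ym = bessel_series(n, x - h)
    d1 = (yp - ym) / (2 * h)
    d2 = (yp - 2 * y0 + ym) / h**2
    return x**2 * d2 + x * d1 + (x**2 - n**2) * y0

print(bessel_series(0, 1.0))    # J_0(1), approximately 0.7651976866
print(bessel_series(1, 1.0))    # J_1(1), approximately 0.4400505857
print(bessel_residual(2, 1.5))  # close to zero, limited by the finite differences
```

The residual is not exactly zero because the derivatives are approximated; it only confirms that the series is consistent with Eq. (7.40) to the accuracy of the differencing.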

When $s = -n$ and $n$ is not an integer, we may generate a second distinct series, to be
labeled $J_{-n}(x)$. However, when $-n$ is a negative integer, trouble develops. The recurrence
relation for the coefficients $a_j$ is still given by Eq. (7.45), but with $2n$ replaced by $-2n$.
Then, when $j + 2 = 2n$ or $j = 2(n - 1)$, the coefficient $a_{j+2}$ blows up and Frobenius’
method does not produce a solution consistent with our assumption that the series starts
with $x^{-n}$.
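The breakdown for integer $n$ can be seen by running the recurrence with exact arithmetic. In this sketch the recurrence is Eq. (7.45) with $2n$ replaced by $-2n$ and $a_0 = 1$; the function name `coefficients_s_minus_n` and the choice to raise an exception at the vanishing denominator are illustrative.

```python
from fractions import Fraction

def coefficients_s_minus_n(n, count):
    """Even coefficients a_0, a_2, ... for s = -n: Eq. (7.45) with 2n -> -2n, a_0 = 1."""
    n = Fraction(n)
    coeffs = [Fraction(1)]
    for j in range(0, 2 * (count - 1), 2):
        denom = (j + 2) * (j + 2 - 2 * n)
        if denom == 0:
            # This is the case j + 2 = 2n described in the text.
            raise ZeroDivisionError(f"a_{j + 2} is undefined because j + 2 = 2n")
        coeffs.append(-coeffs[-1] / denom)
    return coeffs

print(coefficients_s_minus_n(Fraction(1, 3), 3))  # nonintegral n: all coefficients finite

try:
    coefficients_s_minus_n(2, 3)                  # integral n = 2 fails at j = 2
except ZeroDivisionError as err:
    print(err)
```

For $n = 2$ the first step gives $a_2 = 1/4$ without difficulty; the failure appears only at $j = 2(n - 1) = 2$, exactly as stated above.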

By substituting in an infinite series, we have obtained two solutions for the linear oscil- 
lator equation and one for Bessel’s equation (two if n is not an integer). To the questions 
“Can we always do this? Will this method always work?” the answer is “No, we cannot 
always do this. This method of series solution will not always work.” 


²$J_n(x)$ is an even function if $n$ is an even integer, and an odd function if $n$ is an odd integer. For nonintegral $n$, $J_n$ has no such
simple symmetry.


Regular and Irregular Singularities


The success of the series substitution method depends on the roots of the indicial equation
and the degree of singularity of the coefficients in the differential equation. To understand
better the effect of the equation coefficients on this naive series substitution approach,
consider four simple equations:

$$y'' - \frac{6}{x^2}\,y = 0, \tag{7.50}$$

$$y'' - \frac{6}{x^3}\,y = 0, \tag{7.51}$$

$$y'' + \frac{1}{x}\,y' - \frac{b^2}{x^2}\,y = 0, \tag{7.52}$$

$$y'' + \frac{1}{x^2}\,y' - \frac{b^2}{x^2}\,y = 0. \tag{7.53}$$

The reader may show easily that for Eq. (7.50) the indicial equation is

$$s^2 - s - 6 = 0,$$

giving $s = 3$ and $s = -2$. Since the equation is homogeneous in $x$ (counting $d^2/dx^2$ as
$x^{-2}$), there is no recurrence relation. However, we are left with two perfectly good
solutions, $x^3$ and $x^{-2}$.

Equation (7.51) differs from Eq. (7.50) by only one power of $x$, but this sends the indicial
equation to

$$-6a_0 = 0,$$

with no solution at all, for we have agreed that $a_0 \neq 0$. Our series substitution worked for
Eq. (7.50), which had only a regular singularity, but broke down at Eq. (7.51), which has
an irregular singular point at the origin.

Continuing with Eq. (7.52), we have added a term $y'/x$. The indicial equation is

$$s^2 - b^2 = 0,$$

but again, there is no recurrence relation. The solutions are $y = x^b$ and $x^{-b}$, both perfectly
acceptable one-term series.

When we change the power of $x$ in the coefficient of $y'$ from $-1$ to $-2$, in Eq. (7.53),
there is a drastic change in the solution. The indicial equation (with only the $y'$ term
contributing) becomes

$$s = 0.$$

There is a recurrence relation,

$$a_{j+1} = +a_j\,\frac{b^2 - j(j - 1)}{j + 1}.$$

Unless the parameter $b$ is selected to make the series terminate, we have

$$\lim_{j\to\infty}\left|\frac{a_{j+1}}{a_j}\right| = \lim_{j\to\infty}\frac{j(j - 1)}{j + 1} = \lim_{j\to\infty}\frac{j^2}{j} = \infty.$$

Hence our series solution diverges for all $x \neq 0$. Again, our method worked for
Eq. (7.52) with a regular singularity but failed when we had the irregular singularity of
Eq. (7.53).
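The behavior of the series for Eq. (7.53) can be illustrated by generating its coefficients exactly. The sketch below uses the recurrence $a_{j+1} = a_j\,[b^2 - j(j-1)]/(j+1)$ with $s = 0$ and $a_0 = 1$; the function name `coeffs_753` and the argument `b2` (standing for $b^2$) are illustrative.

```python
from fractions import Fraction

def coeffs_753(b2, count):
    """a_0 .. a_{count-1} for Eq. (7.53) with s = 0 and a_0 = 1; b2 stands for b**2."""
    a = [Fraction(1)]
    for j in range(count - 1):
        a.append(a[-1] * (Fraction(b2) - j * (j - 1)) / (j + 1))
    return a

# b**2 = 2 makes the series terminate: y = 1 + 2x + 2x**2
print(coeffs_753(2, 6))

# b**2 = 1/2 does not terminate; the ratio |a_{j+1}/a_j| grows roughly like j
a = coeffs_753(Fraction(1, 2), 25)
ratios = [abs(a[j + 1] / a[j]) for j in range(24)]
print(float(ratios[10]), float(ratios[20]))
```

Since the ratio of successive coefficients grows without bound, the ratio test gives a zero radius of convergence whenever the series does not terminate.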


Fuchs’ Theorem 


The answer to the basic question as to when the method of series substitution can be 
expected to work is given by Fuchs’ theorem, which asserts that we can always obtain 
at least one power-series solution, provided that we are expanding about a point which is 
an ordinary point or at worst a regular singular point. 

If we attempt an expansion about an irregular or essential singularity, our method may 
fail as it did for Eqs. (7.51) and (7.53). Fortunately, the more important equations of mathe- 
matical physics, listed in Section 7.4, have no irregular singularities in the finite plane. 
Further discussion of Fuchs’ theorem appears in Section 7.6. 

From Table 7.1, Section 7.4, infinity is seen to be a singular point for all the equations 
considered. As a further illustration of Fuchs’ theorem, Legendre’s equation (with infinity 
as a regular singularity) has a convergent series solution in negative powers of the argu- 
ment (Section 15.6). In contrast, Bessel’s equation (with an irregular singularity at infinity) 
yields asymptotic series (Sections 12.6 and 14.6). Although only asymptotic, these solu- 
tions are nevertheless extremely useful. 


Summary 


If we are expanding about an ordinary point or at worst about a regular singularity, the 
series substitution approach will yield at least one solution (Fuchs’ theorem). 

Whether we get one or two distinct solutions depends on the roots of the indicial 
equation. 


1. If the two roots of the indicial equation are equal, we can obtain only one solution by
this series substitution method. 

2. If the two roots differ by a nonintegral number, two independent solutions may be 
obtained. 

3. If the two roots differ by an integer, the larger of the two will yield a solution, while the
smaller may or may not give a solution, depending on the behavior of the coefficients. 
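The three cases above can be sorted mechanically once the indicial equation is known. The sketch below assumes the standard general form of the indicial equation at a regular singular point $x_0$, $s(s - 1) + p_0 s + q_0 = 0$, with $p_0 = \lim (x - x_0)P(x)$ and $q_0 = \lim (x - x_0)^2 Q(x)$; this general form is not derived in the text, but it reproduces Eq. (7.30) for $p_0 = q_0 = 0$ and Eq. (7.43) for $p_0 = 1$, $q_0 = -n^2$. The function names and the tolerance are illustrative.

```python
import cmath

def indicial_roots(p0, q0):
    """Roots of s(s - 1) + p0*s + q0 = 0 (assumed general indicial equation)."""
    disc = cmath.sqrt((p0 - 1) ** 2 - 4 * q0)
    return ((1 - p0) + disc) / 2, ((1 - p0) - disc) / 2

def summary_case(p0, q0, tol=1e-12):
    """Return which of the three cases in the summary applies."""
    s1, s2 = indicial_roots(p0, q0)
    diff = s1 - s2
    if abs(diff) < tol:
        return "equal"
    if abs(diff.imag) < tol and abs(diff.real - round(diff.real)) < tol:
        return "integer"
    return "noninteger"

print(summary_case(0.0, 0.0))       # linear oscillator, roots 0 and 1: integer
print(summary_case(1.0, 0.0))       # Bessel with n = 0: equal
print(summary_case(1.0, -1.0 / 9))  # Bessel with n = 1/3: noninteger
print(summary_case(1.0, -0.25))     # Bessel with n = 1/2, roots differ by 1: integer
```

The linear oscillator and the Bessel equation with $n = 1/2$ both fall in the "integer" case, and in both the smaller root does yield a solution; this is the "may or may not" of item 3.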


The usefulness of a series solution for numerical work depends on the rapidity of con- 
vergence of the series and the availability of the coefficients. Many ODEs will not yield 
nice, simple recurrence relations for the coefficients. In general, the available series will 
probably be useful for very small |x| (or |x — xo|). Computers can be used to determine 
additional series coefficients using a symbolic language, such as Mathematica³ or Maple.⁴
Often, however, for numerical work a direct numerical integration will be preferred. 





³S. Wolfram, Mathematica: A System for Doing Mathematics by Computer. Reading, MA: Addison Wesley (1991).
⁴A. Heck, Introduction to Maple. New York: Springer (1993).

Exercises


7.5.1 Uniqueness theorem. The function $y(x)$ satisfies a second-order, linear, homogeneous
differential equation. At $x = x_0$, $y(x) = y_0$ and $dy/dx = y_0'$. Show that $y(x)$ is unique,
in that no other solution of this differential equation passes through the point $(x_0, y_0)$
with a slope of $y_0'$.

Hint. Assume a second solution satisfying these conditions and compare the Taylor
series expansions.


7.5.2 A series solution of Eq. (7.23) is attempted, expanding about the point $x = x_0$. If $x_0$ is
an ordinary point, show that the indicial equation has roots $s = 0, 1$.


7.5.3 In the development of a series solution of the simple harmonic oscillator (SHO) equation,
the second series coefficient $a_1$ was neglected except to set it equal to zero. From
the coefficient of the next-to-the-lowest power of $x$, $x^{s-1}$, develop a second-indicial
type equation.

(a) (SHO equation with $s = 0$). Show that $a_1$ may be assigned any finite value
(including zero).

(b) (SHO equation with $s = 1$). Show that $a_1$ must be set equal to zero.


7.5.4 Analyze the series solutions of the following differential equations to see when $a_1$ may
be set equal to zero without irrevocably losing anything and when $a_1$ must be set equal
to zero.

(a) Legendre, (b) Chebyshev, (c) Bessel, (d) Hermite.

ANS. (a) Legendre, (b) Chebyshev, and (d) Hermite: For $s = 0$, $a_1$
may be set equal to zero; for $s = 1$, $a_1$ must be set equal to zero.
(c) Bessel: $a_1$ must be set equal to zero (except for $s = \pm n = -\tfrac{1}{2}$).


7.5.5 Obtain a series solution of the hypergeometric equation

$$x(x - 1)y'' + [(1 + a + b)x - c]\,y' + aby = 0.$$

Test your solution for convergence.


7.5.6 Obtain two series solutions of the confluent hypergeometric equation

$$xy'' + (c - x)y' - ay = 0.$$

Test your solutions for convergence.


7.5.7 A quantum mechanical analysis of the Stark effect (parabolic coordinates) leads to the
differential equation

$$\frac{d}{d\xi}\left(\xi\,\frac{du}{d\xi}\right) + \left(\frac{1}{2}E\xi + \alpha - \frac{m^2}{4\xi} - \frac{1}{4}F\xi^2\right)u = 0.$$

Here $\alpha$ is a constant, $E$ is the total energy, and $F$ is a constant such that $Fz$ is the
potential energy added to the system by the introduction of an electric field.
Using the larger root of the indicial equation, develop a power-series solution about
$\xi = 0$. Evaluate the first three coefficients in terms of $a_0$.

ANS. Indicial equation $s^2 - \dfrac{m^2}{4} = 0$,

$$u(\xi) = a_0\,\xi^{m/2}\left\{1 - \frac{\alpha}{m + 1}\,\xi + \left[\frac{\alpha^2}{2(m + 1)(m + 2)} - \frac{E}{4(m + 2)}\right]\xi^2 + \cdots\right\}.$$

Note that the perturbation $F$ does not appear until $a_3$ is included.


7.5.8 For the special case of no azimuthal dependence, the quantum mechanical analysis of
the hydrogen molecular ion leads to the equation

$$\frac{d}{d\eta}\left[(1 - \eta^2)\,\frac{du}{d\eta}\right] + \alpha u + \beta\eta^2 u = 0.$$

Develop a power-series solution for $u(\eta)$. Evaluate the first three nonvanishing
coefficients in terms of $a_0$.

ANS. Indicial equation $s(s - 1) = 0$,

$$u_{k=1} = a_0\,\eta\left\{1 + \frac{2 - \alpha}{6}\,\eta^2 + \left[\frac{(2 - \alpha)(12 - \alpha)}{120} - \frac{\beta}{20}\right]\eta^4 + \cdots\right\}.$$

7.5.9 To a good approximation, the interaction of two nucleons may be described by a
mesonic potential

$$V = \frac{A e^{-ax}}{x},$$

attractive for $A$ negative. Show that the resultant Schrödinger wave equation

$$\frac{\hbar^2}{2m}\,\frac{d^2\psi}{dx^2} + (E - V)\,\psi = 0$$

has the following series solution through the first three nonvanishing coefficients:

$$\psi = a_0\left\{x + \frac{1}{2}A'x^2 + \frac{1}{6}\left[\frac{1}{2}A'^2 - E' - aA'\right]x^3 + \cdots\right\},$$

where the prime indicates multiplication by $2m/\hbar^2$.


7.5.10 If the parameter $b^2$ in Eq. (7.53) is equal to 2, Eq. (7.53) becomes

$$y'' + \frac{1}{x^2}\,y' - \frac{2}{x^2}\,y = 0.$$

From the indicial equation and the recurrence relation, derive a solution $y = 1 + 2x + 2x^2$.
Verify that this is indeed a solution by substituting back into the differential
equation.


7.5.11 The modified Bessel function $I_0(x)$ satisfies the differential equation

$$x^2\,\frac{d^2}{dx^2}I_0(x) + x\,\frac{d}{dx}I_0(x) - x^2 I_0(x) = 0.$$

Given that the leading term in an asymptotic expansion is known to be

$$I_0(x) \sim \frac{e^x}{\sqrt{2\pi x}},$$

assume a series of the form

$$I_0(x) \sim \frac{e^x}{\sqrt{2\pi x}}\left\{1 + b_1 x^{-1} + b_2 x^{-2} + \cdots\right\}.$$

Determine the coefficients $b_1$ and $b_2$.

ANS. $b_1 = \tfrac{1}{8}$, $b_2 = \tfrac{9}{128}$.


7.5.12 The even power-series solution of Legendre’s equation is given by Exercise 8.3.1. Take
$a_0 = 1$ and $n$ not an even integer, say $n = 0.5$. Calculate the partial sums of the series
through $x^{200}, x^{400}, x^{600}, \ldots, x^{2000}$ for $x = 0.95(0.01)1.00$. Also, write out the
individual term corresponding to each of these powers.


Note. This calculation does not constitute proof of convergence at x = 0.99 or diver- 
gence at x = 1.00, but perhaps you can see the difference in the behavior of the sequence 
of partial sums for these two values of x. 


7.5.13 (a) The odd power-series solution of Hermite’s equation is given by Exercise 8.3.3.
Take $a_0 = 1$. Evaluate this series for $\alpha = 0$, $x = 1, 2, 3$. Cut off your calculation
after the last term calculated has dropped below the maximum term by a factor of
$10^6$ or more. Set an upper bound to the error made in ignoring the remaining terms
in the infinite series.

(b) As a check on the calculation of part (a), show that the Hermite series $y_{\text{odd}}(\alpha = 0)$
corresponds to $\int_0^x \exp(x^2)\,dx$.

(c) Calculate this integral for $x = 1, 2, 3$.


7.6 OTHER SOLUTIONS 


In Section 7.5 a solution of a second-order homogeneous ODE was developed by substi- 
tuting in a power series. By Fuchs’ theorem this is possible, provided the power series is 
an expansion about an ordinary point or a nonessential singularity.⁵ There is no guarantee
that this approach will yield the two independent solutions we expect from a linear second- 
order ODE. In fact, we shall prove that such an ODE has at most two linearly independent 
solutions. Indeed, the technique gave only one solution for Bessel’s equation (n an integer). 
In this section we also develop two methods of obtaining a second independent solution: 
an integral method and a power series containing a logarithmic term. First, however, we 
consider the question of independence of a set of functions. 


⁵This is why the classification of singularities in Section 7.4 is of vital importance.

Linear Independence of Solutions


In Chapter 2 we introduced the concept of linear dependence for forms of the type
$a_1 x_1 + a_2 x_2 + \cdots$, and identified a set of such forms as linearly dependent if any one of the forms
could be written as a linear combination of others. We need now to extend the concept to
a set of functions $\varphi_\lambda$. The criterion for linear dependence of a set of functions of a variable
$x$ is the existence of a relation of the form

$$\sum_\lambda k_\lambda \varphi_\lambda(x) = 0, \tag{7.54}$$

in which not all the coefficients $k_\lambda$ are zero. The interpretation we attach to Eq. (7.54) is that
it indicates linear dependence if it is satisfied for all relevant values of $x$. Isolated points or
partial ranges of satisfaction of Eq. (7.54) do not suffice to indicate linear dependence. The
essential idea being conveyed here is that if there is linear dependence, the function space
spanned by the $\varphi_\lambda(x)$ can be spanned using fewer than all of them. On the other hand, if the
only global solution of Eq. (7.54) is $k_\lambda = 0$ for all $\lambda$, the set of functions $\varphi_\lambda(x)$ is said to
be linearly independent.

If the members of a set of functions are mutually orthogonal, then they are automatically
linearly independent. To establish this, consider the evaluation of

$$S = \Big\langle \sum_\lambda k_\lambda \varphi_\lambda \,\Big|\, \sum_\mu k_\mu \varphi_\mu \Big\rangle$$

for a set of orthonormal $\varphi_\lambda$ and with arbitrary values of the coefficients $k_\lambda$. Because of the
orthonormality, $S$ evaluates to $\sum_\lambda |k_\lambda|^2$, and will be nonzero (showing that $\sum_\lambda k_\lambda \varphi_\lambda \neq 0$)
unless all the $k_\lambda$ vanish.

We now proceed to consider the ramifications of linear dependence for solutions of
ODEs, and for that purpose it is appropriate to assume that the functions $\varphi_\lambda(x)$ are
differentiable as needed. Then, differentiating Eq. (7.54) repeatedly, with the assumption that it
is valid for all $x$, we generate a set of equations

$$\sum_\lambda k_\lambda \varphi_\lambda'(x) = 0,$$

$$\sum_\lambda k_\lambda \varphi_\lambda''(x) = 0,$$

continuing until we have generated as many equations as the number of $\lambda$ values. This
gives us a set of homogeneous linear equations in which $k_\lambda$ are the unknown quantities.
By Section 2.1 there is a solution other than all $k_\lambda = 0$ only if the determinant of the
coefficients of the $k_\lambda$ vanishes. This means that the linear dependence we have assumed by
accepting Eq. (7.54) implies that

$$\begin{vmatrix}
\varphi_1 & \varphi_2 & \cdots & \varphi_n \\
\varphi_1' & \varphi_2' & \cdots & \varphi_n' \\
\cdots & \cdots & \cdots & \cdots \\
\varphi_1^{(n-1)} & \varphi_2^{(n-1)} & \cdots & \varphi_n^{(n-1)}
\end{vmatrix} = 0. \tag{7.55}$$

This determinant is called the Wronskian, and the analysis leading to Eq. (7.55) shows
that:


1. If the Wronskian is not equal to zero, then Eq. (7.54) has no solution other than $k_\lambda = 0$.
The set of functions $\varphi_\lambda$ is therefore linearly independent.


2. If the Wronskian vanishes at isolated values of the argument, this does not prove linear
dependence. However, if the Wronskian is zero over the entire range of the variable,
the functions $\varphi_\lambda$ are linearly dependent over this range.⁶


Example 7.6.1 LINEAR INDEPENDENCE 


The solutions of the linear oscillator equation, Eq. (7.27), are $\varphi_1 = \sin\omega x$, $\varphi_2 = \cos\omega x$.
The Wronskian becomes

$$\begin{vmatrix}
\sin\omega x & \cos\omega x \\
\omega\cos\omega x & -\omega\sin\omega x
\end{vmatrix} = -\omega \neq 0.$$

These two solutions, $\varphi_1$ and $\varphi_2$, are therefore linearly independent. For just two functions
this means that one is not a multiple of the other, which is obviously true here.
Incidentally, you know that

$$\sin\omega x = \pm(1 - \cos^2\omega x)^{1/2},$$

but this is not a linear relation, of the form of Eq. (7.54).


Example 7.6.2 LINEAR DEPENDENCE 


For an illustration of linear dependence, consider the solutions of the ODE

$$\frac{d^2\varphi(x)}{dx^2} = \varphi(x).$$

This equation has solutions $\varphi_1 = e^x$ and $\varphi_2 = e^{-x}$, and we add $\varphi_3 = \cosh x$, also a solution.
The Wronskian is

$$\begin{vmatrix} e^x & e^{-x} & \cosh x \\ e^x & -e^{-x} & \sinh x \\ e^x & e^{-x} & \cosh x \end{vmatrix} = 0.$$

The determinant vanishes for all $x$ because the first and third rows are identical. Hence
$e^x$, $e^{-x}$, and $\cosh x$ are linearly dependent, and, indeed, we have a relation of the form of
Eq. (7.54):

$$e^x + e^{-x} - 2\cosh x = 0 \quad \text{with } k_\lambda \neq 0.$$
■
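The two Wronskians in Examples 7.6.1 and 7.6.2 can be checked numerically. The following is a minimal sketch, assuming the derivatives are supplied analytically; the helper names `det` and `wronskian` and the value chosen for `OMEGA` are illustrative and not part of the text.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def wronskian(derivative_rows, x):
    """derivative_rows[i][j] is the i-th derivative of the j-th function (a callable)."""
    return det([[f(x) for f in row] for row in derivative_rows])

OMEGA = 2.0  # arbitrary illustrative value

# Example 7.6.1: sin(wx), cos(wx) and their first derivatives.
oscillator = [
    [lambda x: math.sin(OMEGA * x), lambda x: math.cos(OMEGA * x)],
    [lambda x: OMEGA * math.cos(OMEGA * x), lambda x: -OMEGA * math.sin(OMEGA * x)],
]

# Example 7.6.2: e^x, e^-x, cosh x with their first and second derivatives.
exp_cosh = [
    [math.exp, lambda x: math.exp(-x), math.cosh],
    [math.exp, lambda x: -math.exp(-x), math.sinh],
    [math.exp, lambda x: math.exp(-x), math.cosh],
]

for x in (0.0, 0.7, 1.5):
    print(x, wronskian(oscillator, x), wronskian(exp_cosh, x))
```

At each sample point the first Wronskian should equal $-\omega$ and the second should vanish up to rounding error.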





⁶Compare H. Lass, Elements of Pure and Applied Mathematics, New York: McGraw-Hill (1957), p. 187, for proof of this
assertion. It is assumed that the functions have continuous derivatives and that at least one of the minors of the bottom row of
Eq. (7.55) (Laplace expansion) does not vanish in $[a, b]$, the interval under consideration.
Number of Solutions

Now we are ready to prove the theorem that a second-order homogeneous ODE has two
linearly independent solutions.

Suppose $y_1$, $y_2$, $y_3$ are three solutions of the homogeneous ODE, Eq. (7.23). Then we
form the Wronskian $W_{jk} = y_j y_k' - y_j' y_k$ of any pair $y_j$, $y_k$ of them and note also that

$$W_{jk}' = (y_j' y_k' + y_j y_k'') - (y_j'' y_k + y_j' y_k') = y_j y_k'' - y_j'' y_k. \tag{7.56}$$

Next we divide the ODE by $y$ and move $Q(x)$ to its right-hand side (where it becomes
$-Q(x)$), so, for solutions $y_j$ and $y_k$:

$$\frac{y_j''}{y_j} + P(x)\frac{y_j'}{y_j} = -Q(x) = \frac{y_k''}{y_k} + P(x)\frac{y_k'}{y_k}.$$

Taking now the first and third members of this equation, multiplying by $y_j y_k$ and rearranging,
we find that

$$(y_j y_k'' - y_j'' y_k) + P(x)(y_j y_k' - y_j' y_k) = 0,$$

which simplifies for any pair of solutions to

$$W_{jk}' = -P(x) W_{jk}. \tag{7.57}$$

Finally, we evaluate the Wronskian of all three solutions, expanding it by minors along the
second row and identifying each term as containing a $W_{ij}'$ as given by Eq. (7.56):

$$W = \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ y_1'' & y_2'' & y_3'' \end{vmatrix} = -y_1' W_{23}' + y_2' W_{13}' - y_3' W_{12}'.$$

We now use Eq. (7.57) to replace each $W_{ij}'$ by $-P(x)W_{ij}$ and then reassemble the minors
into a $3 \times 3$ determinant, which vanishes because it contains two identical rows:

$$W = P(x)\left(y_1' W_{23} - y_2' W_{13} + y_3' W_{12}\right) = -P(x)\begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ y_1' & y_2' & y_3' \end{vmatrix} = 0.$$

We therefore have $W = 0$, which is just the condition for linear dependence of the solutions
$y_j$. Thus, we have proved the following:

A linear second-order homogeneous ODE has at most two linearly independent solutions.
Generalizing, a linear homogeneous $n$th-order ODE has at most $n$ linearly independent
solutions $y_j$, and its general solution will be of the form $y(x) = \sum_{j=1}^{n} c_j y_j(x)$.





Finding a Second Solution 


Returning to our linear, second-order, homogeneous ODE of the general form

$$y'' + P(x)y' + Q(x)y = 0, \tag{7.58}$$

let $y_1$ and $y_2$ be two independent solutions. Then the Wronskian, by definition, is

$$W = y_1 y_2' - y_1' y_2. \tag{7.59}$$

By differentiating the Wronskian, we obtain, as already demonstrated in Eq. (7.57),

$$W' = -P(x)W. \tag{7.60}$$

In the special case that $P(x) = 0$, that is,

$$y'' + Q(x)y = 0, \tag{7.61}$$

the Wronskian

$$W = y_1 y_2' - y_1' y_2 = \text{constant}. \tag{7.62}$$


Since our original differential equation is homogeneous, we may multiply the solutions $y_1$
and $y_2$ by whatever constants we wish and arrange to have the Wronskian equal to unity
(or $-1$). This case, $P(x) = 0$, appears more frequently than might be expected. Recall that
$\nabla^2(\psi/r)$ in spherical polar coordinates contains no first radial derivative. Finally, every
linear second-order differential equation can be transformed into an equation of the form
of Eq. (7.61) (compare Exercise 7.6.12).

For the general case, let us now assume that we have one solution of Eq. (7.58) by a
series substitution (or by guessing). We now proceed to develop a second, independent
solution for which $W \neq 0$. Rewriting Eq. (7.60) as


$$\frac{dW}{W} = -P\,dx,$$

we integrate over the variable $x$, from $a$ to $x$, to obtain

$$\ln\frac{W(x)}{W(a)} = -\int_a^x P(x_1)\,dx_1,$$

or⁷

$$W(x) = W(a)\exp\left[-\int_a^x P(x_1)\,dx_1\right]. \tag{7.63}$$


⁷If $P(x)$ remains finite in the domain of interest, $W(x) \neq 0$ unless $W(a) = 0$. That is, the Wronskian of our two solutions is
either identically zero or never zero. However, if $P(x)$ does not remain finite in our interval, then $W(x)$ can have isolated zeros
in that domain and one must be careful to choose $a$ so that $W(a) \neq 0$.
Now we make the observation that

$$W(x) = y_1 y_2' - y_1' y_2 = y_1^2 \frac{d}{dx}\left(\frac{y_2}{y_1}\right), \tag{7.64}$$

and, by combining Eqs. (7.63) and (7.64), we have

$$\frac{d}{dx}\left(\frac{y_2}{y_1}\right) = W(a)\,\frac{\exp\left[-\int_a^x P(x_1)\,dx_1\right]}{y_1^2}. \tag{7.65}$$

Finally, by integrating Eq. (7.65) from $x_2 = b$ to $x_2 = x$ we get

$$y_2(x) = y_1(x)\,W(a)\int_b^x \frac{\exp\left[-\int_a^{x_2} P(x_1)\,dx_1\right]}{[y_1(x_2)]^2}\,dx_2. \tag{7.66}$$

Here $a$ and $b$ are arbitrary constants and a term $y_1(x)y_2(b)/y_1(b)$ has been dropped,
because it is a multiple of the previously found first solution $y_1$. Since $W(a)$, the Wronskian
evaluated at $x = a$, is a constant and our solutions for the homogeneous differential equation
always contain an arbitrary normalizing factor, we set $W(a) = 1$ and write

$$y_2(x) = y_1(x)\int^x \frac{\exp\left[-\int^{x_2} P(x_1)\,dx_1\right]}{[y_1(x_2)]^2}\,dx_2. \tag{7.67}$$

Note that the lower limits $x_1 = a$ and $x_2 = b$ have been omitted. If they are retained,
they simply make a contribution equal to a constant times the known first solution, $y_1(x)$,
and hence add nothing new. If we have the important special case $P(x) = 0$, Eq. (7.67)
reduces to

$$y_2(x) = y_1(x)\int^x \frac{dx_2}{[y_1(x_2)]^2}. \tag{7.68}$$

This means that by using either Eq. (7.67) or Eq. (7.68) we can take one known solution and
by integrating can generate a second, independent solution of Eq. (7.58). This technique is
used in Section 15.6 to generate a second solution of Legendre's differential equation.


Example 7.6.3 A SECOND SOLUTION FOR THE LINEAR OSCILLATOR EQUATION 


From $d^2y/dx^2 + y = 0$ with $P(x) = 0$ let one solution be $y_1 = \sin x$. By applying
Eq. (7.68), we obtain

$$y_2(x) = \sin x \int^x \frac{dx_2}{\sin^2 x_2} = \sin x\,(-\cot x) = -\cos x,$$

which is clearly independent (not a linear multiple) of $\sin x$. ■
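The construction of a second solution by quadrature can be tried out numerically. The sketch below evaluates Eq. (7.66) with $W(a) = 1$ and explicit lower limits; the function names, the Simpson-rule helper, and the choice $a = b = 1$ are assumptions made for illustration only. With $y_1 = \sin x$ and $P = 0$ the retained lower limit adds the multiple $\cot(1)\sin x$ of the first solution to $-\cos x$, as the text states it must.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals; signed if hi < lo."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def second_solution(y1, P, x, a=1.0, b=1.0):
    """Numerical version of Eq. (7.66) with W(a) = 1 and lower limits a, b kept."""
    def integrand(x2):
        return math.exp(-simpson(P, a, x2)) / y1(x2) ** 2
    return y1(x) * simpson(integrand, b, x, 400)

# Example 7.6.3: P = 0, y1 = sin x. With b = 1 the integral equals cot(1) - cot(x).
x = 2.0
numeric = second_solution(math.sin, lambda t: 0.0, x)
exact = -math.cos(x) + math.sin(x) / math.tan(1.0)
print(numeric, exact)

# A case with P != 0: R'' + R'/r - (m^2/r^2) R = 0 with y1 = r^m (here m = 2).
radial = second_solution(lambda t: t ** 2, lambda t: 1.0 / t, 2.0)
print(radial)  # analytically (r^2 - r^-2)/4 at r = 2
```

The second call uses the equation of Exercise 7.6.16; the result is a combination of $r^{-m}$ and the known solution $r^{m}$.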
Series Form of the Second Solution

Further insight into the nature of the second solution of our differential equation may be
obtained by the following sequence of operations.

1. Express $P(x)$ and $Q(x)$ in Eq. (7.58) as

$$P(x) = \sum_{i=-1}^{\infty} p_i x^i, \qquad Q(x) = \sum_{j=-2}^{\infty} q_j x^j. \tag{7.69}$$

The leading terms of the summations are selected to create the strongest possible
regular singularity (at the origin). These conditions just satisfy Fuchs' theorem and
thus help us gain a better understanding of that theorem.

2. Develop the first few terms of a power-series solution, as in Section 7.5.

3. Using this solution as $y_1$, obtain a second series-type solution, $y_2$, from Eq. (7.67), by
integrating it term by term.

Proceeding with Step 1, we have

$$y'' + (p_{-1}x^{-1} + p_0 + p_1 x + \cdots)y' + (q_{-2}x^{-2} + q_{-1}x^{-1} + \cdots)y = 0, \tag{7.70}$$

where $x = 0$ is at worst a regular singular point. If $p_{-1} = q_{-1} = q_{-2} = 0$, it reduces to an
ordinary point. Substituting

$$y = \sum_{\lambda=0}^{\infty} a_\lambda x^{s+\lambda}$$

(Step 2), we obtain

$$\sum_{\lambda=0}^{\infty} (s+\lambda)(s+\lambda-1)a_\lambda x^{s+\lambda-2} + \sum_{i=-1}^{\infty} p_i x^i \sum_{\lambda=0}^{\infty} (s+\lambda)a_\lambda x^{s+\lambda-1} + \sum_{j=-2}^{\infty} q_j x^j \sum_{\lambda=0}^{\infty} a_\lambda x^{s+\lambda} = 0. \tag{7.71}$$

Assuming that $p_{-1} \neq 0$, our indicial equation is

$$s(s-1) + p_{-1}s + q_{-2} = 0,$$

which sets the net coefficient of $x^{s-2}$ equal to zero. This reduces to

$$s^2 + (p_{-1} - 1)s + q_{-2} = 0. \tag{7.72}$$

We denote the two roots of this indicial equation by $s = \alpha$ and $s = \alpha - n$, where $n$ is zero
or a positive integer. (If $n$ is not an integer, we expect two independent series solutions by
the methods of Section 7.5 and we are done.) Then

$$(s - \alpha)(s - \alpha + n) = 0, \tag{7.73}$$

or

$$s^2 + (n - 2\alpha)s + \alpha(\alpha - n) = 0,$$
and equating coefficients of $s$ in Eqs. (7.72) and (7.73), we have

$$p_{-1} - 1 = n - 2\alpha. \tag{7.74}$$
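As a small numerical illustration of Eqs. (7.72) to (7.74), the sketch below solves the indicial equation for given leading coefficients and reads off the larger root $\alpha$ and the root separation $n$. The function name `indicial_roots` is hypothetical, and the sketch assumes real roots; the sample input is Bessel's equation of order $\nu$ divided by $x^2$, for which $p_{-1} = 1$ and $q_{-2} = -\nu^2$.

```python
import math

def indicial_roots(p_m1, q_m2):
    """Roots of s^2 + (p_{-1} - 1) s + q_{-2} = 0, Eq. (7.72), larger root first.

    Assumes real roots (non-negative discriminant)."""
    b = p_m1 - 1.0
    disc = b * b - 4.0 * q_m2
    if disc < 0:
        raise ValueError("complex roots: not covered by this sketch")
    root = math.sqrt(disc)
    return (-b + root) / 2.0, (-b - root) / 2.0

# Bessel's equation of order nu, divided by x^2: P = 1/x, Q = 1 - nu^2/x^2.
nu = 1.0
alpha, lower = indicial_roots(p_m1=1.0, q_m2=-nu ** 2)
n = alpha - lower  # separation of the two roots
print(alpha, lower, n)
```

For $\nu = 1$ this gives $\alpha = 1$ and $n = 2$, so that $p_{-1} - 1 = 0 = n - 2\alpha$, consistent with Eq. (7.74).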


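The known series solution corresponding to the larger root $s = \alpha$ may be written as

$$y_1 = x^{\alpha}\sum_{\lambda=0}^{\infty} a_\lambda x^{\lambda}.$$

Substituting this series solution into Eq. (7.67) (Step 3), we are faced with

$$y_2(x) = y_1(x)\int^x \left(\frac{\exp\left(-\int_a^{x_2}\sum_{i=-1}^{\infty} p_i x_1^i\,dx_1\right)}{x_2^{2\alpha}\left(\sum_{\lambda=0}^{\infty} a_\lambda x_2^{\lambda}\right)^2}\right)dx_2, \tag{7.75}$$

where the solutions $y_1$ and $y_2$ have been normalized so that the Wronskian $W(a) = 1$.
Tackling the exponential factor first, we have

$$\int_a^{x_2}\sum_{i=-1}^{\infty} p_i x_1^i\,dx_1 = p_{-1}\ln x_2 + \sum_{k=0}^{\infty}\frac{p_k}{k+1}x_2^{k+1} + f(a), \tag{7.76}$$

with $f(a)$ an integration constant that may depend on $a$. Hence,

$$\exp\left(-\int_a^{x_2}\sum_{i} p_i x_1^i\,dx_1\right) = \exp[-f(a)]\,x_2^{-p_{-1}}\exp\left(-\sum_{k=0}^{\infty}\frac{p_k}{k+1}x_2^{k+1}\right)$$

$$= \exp[-f(a)]\,x_2^{-p_{-1}}\left[1 - \sum_{k=0}^{\infty}\frac{p_k}{k+1}x_2^{k+1} + \frac{1}{2!}\left(-\sum_{k=0}^{\infty}\frac{p_k}{k+1}x_2^{k+1}\right)^2 + \cdots\right]. \tag{7.77}$$

This final series expansion of the exponential is certainly convergent if the original expansion
of the coefficient $P(x)$ was uniformly convergent.

The denominator in Eq. (7.75) may be handled by writing

$$\left[x_2^{2\alpha}\left(\sum_{\lambda=0}^{\infty} a_\lambda x_2^{\lambda}\right)^2\right]^{-1} = x_2^{-2\alpha}\left(\sum_{\lambda=0}^{\infty} a_\lambda x_2^{\lambda}\right)^{-2} = x_2^{-2\alpha}\sum_{\lambda=0}^{\infty} b_\lambda x_2^{\lambda}. \tag{7.78}$$

Neglecting constant factors, which will be picked up anyway by the requirement that
$W(a) = 1$, we obtain

$$y_2(x) = y_1(x)\int^x x_2^{-p_{-1}-2\alpha}\left(\sum_{\lambda=0}^{\infty} c_\lambda x_2^{\lambda}\right)dx_2. \tag{7.79}$$

Applying Eq. (7.74),

$$x_2^{-p_{-1}-2\alpha} = x_2^{-n-1}, \tag{7.80}$$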
and we have assumed here that $n$ is an integer. Substituting this result into Eq. (7.79), we
obtain

$$y_2(x) = y_1(x)\int^x \left(c_0 x_2^{-n-1} + c_1 x_2^{-n} + c_2 x_2^{-n+1} + \cdots + c_n x_2^{-1} + \cdots\right)dx_2. \tag{7.81}$$

The integration indicated in Eq. (7.81) leads to a coefficient of $y_1(x)$ consisting of two
parts:

1. A power series starting with $x^{-n}$.

2. A logarithm term from the integration of $x^{-1}$ (when $\lambda = n$). This term always appears
when $n$ is an integer, unless $c_n$ fortuitously happens to vanish.⁸

If we choose to combine $y_1$ and the power series starting with $x^{-n}$, our second solution
will assume the form

$$y_2(x) = y_1(x)\ln|x| + \sum_{j=-n}^{\infty} d_j x^{j+\alpha}. \tag{7.82}$$


Example 7.6.4 A SECOND SOLUTION OF BESSEL’S EQUATION 


From Bessel's equation, Eq. (7.40), divided by $x^2$ to agree with Eq. (7.58), we have

$$P(x) = x^{-1}, \qquad Q(x) = 1 \qquad \text{for the case } n = 0.$$

Hence $p_{-1} = 1$, $q_0 = 1$; all other $p_i$ and $q_j$ vanish. The Bessel indicial equation, Eq. (7.43)
with $n = 0$, is

$$s^2 = 0.$$

Hence we verify Eqs. (7.72) to (7.74) with $n$ and $\alpha$ set to zero.
Our first solution is available from Eq. (7.49). It is⁹

$$y_1(x) = J_0(x) = 1 - \frac{x^2}{4} + \frac{x^4}{64} - O(x^6). \tag{7.83}$$

Now, substituting all this into Eq. (7.67), we have the specific case corresponding to
Eq. (7.75):

$$y_2(x) = J_0(x)\int^x \frac{\exp\left[-\int^{x_2} x_1^{-1}\,dx_1\right]}{\left[1 - x_2^2/4 + x_2^4/64 - \cdots\right]^2}\,dx_2. \tag{7.84}$$

⁸For parity considerations, $\ln x$ is taken to be $\ln|x|$, even.
⁹The capital $O$ (order of) as written here means terms proportional to $x^6$ and possibly higher powers of $x$.
From the numerator of the integrand,

$$\exp\left[-\int^{x_2}\frac{dx_1}{x_1}\right] = \exp[-\ln x_2] = \frac{1}{x_2}.$$

This corresponds to the $x_2^{-p_{-1}}$ in Eq. (7.77). From the denominator of the integrand, using
a binomial expansion, we obtain

$$\left[1 - \frac{x_2^2}{4} + \frac{x_2^4}{64}\right]^{-2} = 1 + \frac{x_2^2}{2} + \frac{5x_2^4}{32} + \cdots.$$

Corresponding to Eq. (7.79), we have

$$y_2(x) = J_0(x)\int^x \frac{1}{x_2}\left[1 + \frac{x_2^2}{2} + \frac{5x_2^4}{32} + \cdots\right]dx_2 = J_0(x)\left\{\ln x + \frac{x^2}{4} + \frac{5x^4}{128} + \cdots\right\}. \tag{7.85}$$

Let us check this result. From Eq. (14.62), which gives the standard form of the second
solution, which is called a Neumann function and designated $Y_0$,

$$Y_0(x) = \frac{2}{\pi}\left[\ln x - \ln 2 + \gamma\right]J_0(x) + \frac{2}{\pi}\left\{\frac{x^2}{4} - \frac{3x^4}{128} + \cdots\right\}. \tag{7.86}$$

Two points arise: (1) Since Bessel's equation is homogeneous, we may multiply $y_2(x)$ by
any constant. To match $Y_0(x)$, we multiply our $y_2(x)$ by $2/\pi$. (2) To our second solution,
$(2/\pi)y_2(x)$, we may add any constant multiple of the first solution. Again, to match $Y_0(x)$
we add

$$\frac{2}{\pi}\left[-\ln 2 + \gamma\right]J_0(x),$$

where $\gamma$ is the Euler-Mascheroni constant, defined in Eq. (1.13).¹⁰ Our new, modified
second solution is

$$y_2(x) = \frac{2}{\pi}\left[\ln x - \ln 2 + \gamma\right]J_0(x) + \frac{2}{\pi}J_0(x)\left\{\frac{x^2}{4} + \frac{5x^4}{128} + \cdots\right\}. \tag{7.87}$$

Now the comparison with $Y_0(x)$ requires only a simple multiplication of the series for
$J_0(x)$ from Eq. (7.83) and the curly bracket of Eq. (7.87). The multiplication checks,
through terms of order $x^2$ and $x^4$, which is all we carried. Our second solution from
Eqs. (7.67) and (7.75) agrees with the standard second solution, the Neumann function
$Y_0(x)$. ■
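The series manipulations of Example 7.6.4 can be reproduced with exact rational arithmetic. This is a minimal sketch, assuming truncation at order $x^4$; the helpers `mul` and `inverse` and the variable names are illustrative. It inverts $J_0^2$, integrates the non-logarithmic terms of the integrand term by term, and multiplies the result by $J_0$ to recover the curly bracket of the Neumann function.

```python
from fractions import Fraction as F

def mul(a, b, order):
    """Product of two power series (coefficient lists), truncated at x**order."""
    out = [F(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out

def inverse(a, order):
    """Series b with a*b = 1 through x**order; requires a[0] != 0."""
    b = [F(1) / a[0]]
    for k in range(1, order + 1):
        s = sum(a[i] * b[k - i] for i in range(1, min(k, len(a) - 1) + 1))
        b.append(-s / a[0])
    return b

ORDER = 4
j0 = [F(1), F(0), F(-1, 4), F(0), F(1, 64)]      # Eq. (7.83)
inv_sq = inverse(mul(j0, j0, ORDER), ORDER)      # series of 1 / J0^2

# The integrand is inv_sq / x: the x^-1 term gives ln x, the rest integrate term by term.
non_log = [F(0)] + [inv_sq[k] / k for k in range(1, ORDER + 1)]
bracket = mul(j0, non_log, ORDER)                # J0 times the curly bracket of Eq. (7.87)

print(inv_sq)    # coefficients 1, 0, 1/2, 0, 5/32
print(non_log)   # coefficients 0, 0, 1/4, 0, 5/128
print(bracket)   # coefficients 0, 0, 1/4, 0, -3/128
```

The last line matches the non-logarithmic series $x^2/4 - 3x^4/128$ in the standard form of $Y_0$.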


The analysis that indicated the second solution of Eq. (7.58) to have the form given in 
Eq. (7.82) suggests the possibility of just substituting Eq. (7.82) into the original differen- 
tial equation and determining the coefficients d;. However, the process has some features 
different from that of Section 7.5, and is illustrated by the following example. 


¹⁰The Neumann function $Y_0$ is defined as it is in order to achieve convenient asymptotic properties; see Sections 14.3 and 14.6.
Example 7.6.5 MORE NEUMANN FUNCTIONS

We consider here second solutions to Bessel's ODE of integer orders $n > 0$, using the
expansion given in Eq. (7.82). The first solution, designated $J_n$ and presented in Eq. (7.49),
arises from the value $\alpha = n$ from the indicial equation, while the quantity called $n$ in
Eq. (7.82), the separation of the two roots of the indicial equation, has in the current context
the value $2n$. Thus, Eq. (7.82) takes the form

$$y_2(x) = J_n(x)\ln|x| + \sum_{j=-2n}^{\infty} d_j x^{j+n}, \tag{7.88}$$

where $y_2$ must, apart from scale and a possible multiple of $J_n$, be the second solution
$Y_n$ of the Bessel equation. Substituting this form into Bessel's equation, carrying out the
indicated differentiations and using the fact that $J_n(x)$ is a solution of our ODE, we get
after combining similar terms

$$x^2 y_2'' + x y_2' + (x^2 - n^2)y_2 = 2xJ_n'(x) + \sum_{j \geq -2n} j(j+2n)d_j x^{j+n} + \sum_{j \geq -2n} d_j x^{j+n+2} = 0. \tag{7.89}$$

We next insert the power-series expansion

$$2xJ_n'(x) = \sum_{j \geq 0} a_j x^{j+n}, \tag{7.90}$$

where the coefficients can be obtained by differentiation of the expansion of $J_n$, see
Eq. (7.49), and have the values (for $j \geq 0$)

$$a_{2j} = \frac{(-1)^j (n+2j)}{j!\,(n+j)!\,2^{n+2j-1}}, \qquad a_{2j+1} = 0. \tag{7.91}$$

This, and a redefinition of the index $j$ in the last term, bring Eq. (7.89) to the form

$$\sum_{j \geq 0} a_j x^{j+n} + \sum_{j \geq -2n} j(j+2n)d_j x^{j+n} + \sum_{j \geq -2n+2} d_{j-2} x^{j+n} = 0. \tag{7.92}$$

Considering first the coefficient of $x^{-n+1}$ (corresponding to $j = -2n+1$), we note that
its vanishing requires that $d_{-2n+1}$ vanish, as the only contribution comes from the middle
summation. Since all $a_j$ of odd $j$ vanish, the vanishing of $d_{-2n+1}$ implies that all other $d_j$
of odd $j$ must also vanish. We therefore only need to give further consideration to even $j$.

We next note that the coefficient $d_0$ is arbitrary, and may without loss of generality
be set to zero. This is true because we may bring $d_0$ to any value by adding to $y_2$ an
appropriate multiple of the solution $J_n$, whose expansion has an $x^n$ leading term. We have
then exhausted all freedom in specifying $y_2$; its scale is determined by our choice of its
logarithmic term.

Now, taking the coefficient of $x^n$ (terms with $j = 0$), and remembering that $d_0 = 0$, we
have

$$d_{-2} = -a_0,$$
and we may recur downward in steps of 2, using formulas based on the coefficients of
$x^{n-2}, x^{n-4}, \ldots$, corresponding to

$$d_{j-2} = -j(2n+j)\,d_j, \qquad j = -2, -4, \ldots, -2n+2.$$

To obtain $d_j$ with positive $j$, we recur upward, obtaining from the coefficient of $x^{n+j}$

$$d_j = \frac{-a_j - d_{j-2}}{j(2n+j)}, \qquad j = 2, 4, \ldots,$$

again remembering that $d_0 = 0$.

Proceeding to $n = 1$ as a specific example, we have from Eq. (7.91) $a_0 = 1$, $a_2 = -3/8$,
and $a_4 = 5/192$, so

$$d_{-2} = -1, \qquad d_2 = -\frac{a_2}{8} = \frac{3}{64}, \qquad d_4 = \frac{-a_4 - d_2}{24} = -\frac{7}{2304};$$

thus

$$y_2(x) = J_1(x)\ln|x| - \frac{1}{x} + \frac{3}{64}x^3 - \frac{7}{2304}x^5 + \cdots,$$

in agreement (except for a multiple of $J_1$ and a scale factor) with the standard form of the
Neumann function $Y_1$:

$$Y_1(x) = \frac{2}{\pi}\left[\ln\left|\frac{x}{2}\right| + \gamma - \frac{1}{2}\right]J_1(x) + \frac{2}{\pi}\left[-\frac{1}{x} + \frac{3}{64}x^3 - \frac{7}{2304}x^5 + \cdots\right]. \tag{7.93}$$
■
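The recurrences of Example 7.6.5 are easy to run exactly. The following sketch, assuming integer order $n \geq 1$ and the choice $d_0 = 0$ made in the text, computes the $a_j$ of Eq. (7.91) and then recurs downward and upward for the $d_j$; the function names `a_coeff` and `d_coeffs` are illustrative.

```python
from fractions import Fraction as F
from math import factorial

def a_coeff(n, j):
    """Coefficient a_j of x^(j+n) in 2x J_n'(x), Eq. (7.91); zero for odd j."""
    if j % 2:
        return F(0)
    k = j // 2
    return (F((-1) ** k * (n + 2 * k)) / (factorial(k) * factorial(n + k))
            * F(1, 2) ** (n + 2 * k - 1))

def d_coeffs(n, j_max):
    """Even-index d_j of Eq. (7.88) for integer order n >= 1, with d_0 = 0."""
    d = {0: F(0), -2: -a_coeff(n, 0)}
    for j in range(-2, -2 * n, -2):          # downward: j = -2, -4, ..., -2n+2
        d[j - 2] = -j * (2 * n + j) * d[j]
    for j in range(2, j_max + 1, 2):         # upward: j = 2, 4, ...
        d[j] = (-a_coeff(n, j) - d[j - 2]) / (j * (2 * n + j))
    return d

d = d_coeffs(1, 4)
print(d[-2], d[2], d[4])    # -1 3/64 -7/2304
print(d_coeffs(2, 0))       # for n = 2: d_-2 = -1/2, d_-4 = -2
```

For $n = 1$ the output reproduces the coefficients quoted in the example.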


As shown in the examples, the second solution will usually diverge at the origin because 
of the logarithmic factor and the negative powers of x in the series. For this reason y2(x) is 
often referred to as the irregular solution. The first series solution, $y_1(x)$, which usually
converges at the origin, is called the regular solution. The question of behavior at the 
origin is discussed in more detail in Chapters 14 and 15, in which we take up Bessel 
functions, modified Bessel functions, and Legendre functions. 


Summary 


The two solutions of both sections (together with the exercises) provide a complete solution
of our linear, homogeneous, second-order ODE, assuming that the point of expansion
is no worse than a regular singularity. At least one solution can always be obtained by
series substitution (Section 7.5). A second, linearly independent solution can be constructed
by the Wronskian double integral, Eq. (7.67). These are all there are: No third,
linearly independent solution exists (compare Exercise 7.6.10).

The inhomogeneous, linear, second-order ODE will have a general solution formed by
adding a particular solution of the complete inhomogeneous equation to the general solution
of the corresponding homogeneous ODE. Techniques for finding particular solutions
of linear but inhomogeneous ODEs are the topic of the next section.
Exercises

7.6.1 You know that the three unit vectors $\hat{e}_x$, $\hat{e}_y$, and $\hat{e}_z$ are mutually perpendicular
(orthogonal). Show that $\hat{e}_x$, $\hat{e}_y$, and $\hat{e}_z$ are linearly independent. Specifically, show that
no relation of the form of Eq. (7.54) exists for $\hat{e}_x$, $\hat{e}_y$, and $\hat{e}_z$.

7.6.2 The criterion for the linear independence of three vectors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ is that the
equation

$$a\mathbf{A} + b\mathbf{B} + c\mathbf{C} = 0,$$

analogous to Eq. (7.54), has no solution other than the trivial $a = b = c = 0$. Using
components $\mathbf{A} = (A_1, A_2, A_3)$, and so on, set up the determinant criterion for the existence
or nonexistence of a nontrivial solution for the coefficients $a$, $b$, and $c$. Show that
your criterion is equivalent to the scalar triple product $\mathbf{A} \cdot \mathbf{B} \times \mathbf{C} \neq 0$.

7.6.3 Using the Wronskian determinant, show that the set of functions

$$\left\{1, \frac{x^n}{n!}\ (n = 1, 2, \ldots, N)\right\}$$

is linearly independent.

7.6.4 If the Wronskian of two functions $y_1$ and $y_2$ is identically zero, show by direct integration
that

$$y_1 = c\,y_2,$$

that is, that $y_1$ and $y_2$ are linearly dependent. Assume the functions have continuous
derivatives and that at least one of the functions does not vanish in the interval under
consideration.

7.6.5 The Wronskian of two functions is found to be zero at $x_0 - \varepsilon \leq x \leq x_0 + \varepsilon$ for arbitrarily
small $\varepsilon > 0$. Show that this Wronskian vanishes for all $x$ and that the functions are
linearly dependent.

7.6.6 The three functions $\sin x$, $e^x$, and $e^{-x}$ are linearly independent. No one function can be
written as a linear combination of the other two. Show that the Wronskian of $\sin x$, $e^x$,
and $e^{-x}$ vanishes but only at isolated points.

ANS. $W = 4\sin x$,
$W = 0$ for $x = \pm n\pi$, $n = 0, 1, 2, \ldots$.

7.6.7 Consider two functions $\varphi_1 = x$ and $\varphi_2 = |x|$. Since $\varphi_1' = 1$ and $\varphi_2' = x/|x|$, $W(\varphi_1, \varphi_2) = 0$
for any interval, including $[-1, +1]$. Does the vanishing of the Wronskian over
$[-1, +1]$ prove that $\varphi_1$ and $\varphi_2$ are linearly dependent? Clearly, they are not. What is
wrong?

7.6.8 Explain that linear independence does not mean the absence of any dependence. Illustrate
your argument with $\cosh x$ and $e^x$.

7.6.9 Legendre's differential equation

$$(1 - x^2)y'' - 2xy' + n(n+1)y = 0$$

has a regular solution $P_n(x)$ and an irregular solution $Q_n(x)$. Show that the Wronskian
of $P_n$ and $Q_n$ is given by

$$P_n(x)Q_n'(x) - P_n'(x)Q_n(x) = \frac{A_n}{1 - x^2},$$

with $A_n$ independent of $x$.

7.6.10 Show, by means of the Wronskian, that a linear, second-order, homogeneous ODE of
the form

$$y''(x) + P(x)y'(x) + Q(x)y(x) = 0$$

cannot have three independent solutions.
Hint. Assume a third solution and show that the Wronskian vanishes for all $x$.

7.6.11 Show the following when the linear second-order differential equation $py'' + qy' + ry = 0$
is expressed in self-adjoint form:

(a) The Wronskian is equal to a constant divided by $p$:

$$W(x) = \frac{C}{p(x)}.$$

(b) A second solution $y_2(x)$ is obtained from a first solution $y_1(x)$ as

$$y_2(x) = C y_1(x)\int^x \frac{dt}{p(t)[y_1(t)]^2}.$$

7.6.12 Transform our linear, second-order ODE

$$y'' + P(x)y' + Q(x)y = 0$$

by the substitution

$$y = z\exp\left[-\frac{1}{2}\int^x P(t)\,dt\right]$$

and show that the resulting differential equation for $z$ is

$$z'' + q(x)z = 0,$$

where

$$q(x) = Q(x) - \frac{1}{2}P'(x) - \frac{1}{4}P^2(x).$$

Note. This substitution can be derived by the technique of Exercise 7.6.25.

7.6.13 Use the result of Exercise 7.6.12 to show that the replacement of $\varphi(r)$ by $r\varphi(r)$ may be
expected to eliminate the first derivative from the Laplacian in spherical polar coordinates.
See also Exercise 3.10.34.

7.6.14 By direct differentiation and substitution show that

$$y_2(x) = y_1(x)\int^x \frac{\exp\left[-\int^s P(t)\,dt\right]}{[y_1(s)]^2}\,ds$$

satisfies, like $y_1(x)$, the ODE

$$y_2''(x) + P(x)y_2'(x) + Q(x)y_2(x) = 0.$$

Note. The Leibniz formula for the derivative of an integral is

$$\frac{d}{d\alpha}\int_{g(\alpha)}^{h(\alpha)} f(x, \alpha)\,dx = \int_{g(\alpha)}^{h(\alpha)} \frac{\partial f(x, \alpha)}{\partial\alpha}\,dx + f[h(\alpha), \alpha]\frac{dh(\alpha)}{d\alpha} - f[g(\alpha), \alpha]\frac{dg(\alpha)}{d\alpha}.$$

7.6.15 In the equation

$$y_2(x) = y_1(x)\int^x \frac{\exp\left[-\int^s P(t)\,dt\right]}{[y_1(s)]^2}\,ds,$$

$y_1(x)$ satisfies

$$y_1'' + P(x)y_1' + Q(x)y_1 = 0.$$

The function $y_2(x)$ is a linearly independent second solution of the same equation.
Show that the inclusion of lower limits on the two integrals leads to nothing new, that
is, that it generates only an overall constant factor and a constant multiple of the known
solution $y_1(x)$.

7.6.16 Given that one solution of

$$R'' + \frac{1}{r}R' - \frac{m^2}{r^2}R = 0$$

is $R = r^m$, show that Eq. (7.67) predicts a second solution, $R = r^{-m}$.

7.6.17 Using

$$y_1(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}x^{2n+1}$$

as a solution of the linear oscillator equation, follow the analysis that proceeds through
Eq. (7.81) and show that in that equation $c_n = 0$, so that in this case the second solution
does not contain a logarithmic term.

7.6.18 Show that when $n$ is not an integer in Bessel's ODE, Eq. (7.40), the second solution of
Bessel's equation, obtained from Eq. (7.67), does not contain a logarithmic term.

7.6.19 (a) One solution of Hermite's differential equation

$$y'' - 2xy' + 2\alpha y = 0$$

for $\alpha = 0$ is $y_1(x) = 1$. Find a second solution, $y_2(x)$, using Eq. (7.67). Show that
your second solution is equivalent to $y_{\text{odd}}$ (Exercise 8.3.3).

(b) Find a second solution for $\alpha = 1$, where $y_1(x) = x$, using Eq. (7.67). Show that
your second solution is equivalent to $y_{\text{even}}$ (Exercise 8.3.3).

7.6.20 One solution of Laguerre's differential equation

$$xy'' + (1 - x)y' + ny = 0$$

for $n = 0$ is $y_1(x) = 1$. Using Eq. (7.67), develop a second, linearly independent solution.
Exhibit the logarithmic term explicitly.

7.6.21 For Laguerre's equation with $n = 0$,

$$y_2(x) = \int^x \frac{e^s}{s}\,ds.$$

(a) Write $y_2(x)$ as a logarithm plus a power series.

(b) Verify that the integral form of $y_2(x)$, previously given, is a solution of Laguerre's
equation ($n = 0$) by direct differentiation of the integral and substitution into the
differential equation.

(c) Verify that the series form of $y_2(x)$, part (a), is a solution by differentiating the
series and substituting back into Laguerre's equation.

7.6.22 One solution of the Chebyshev equation

$$(1 - x^2)y'' - xy' + n^2 y = 0$$

for $n = 0$ is $y_1 = 1$.

(a) Using Eq. (7.67), develop a second, linearly independent solution.

(b) Find a second solution by direct integration of the Chebyshev equation.

Hint. Let $v = y'$ and integrate. Compare your result with the second solution given in
Section 18.4.

ANS. (a) $y_2 = \sin^{-1} x$.
(b) The second solution, $V_n(x)$, is not defined for $n = 0$.

7.6.23 One solution of the Chebyshev equation

$$(1 - x^2)y'' - xy' + n^2 y = 0$$

for $n = 1$ is $y_1(x) = x$. Set up the Wronskian double integral solution and derive a
second solution, $y_2(x)$.

ANS. $y_2 = -(1 - x^2)^{1/2}$.
7.6.24 The radial Schrödinger wave equation for a spherically symmetric potential can be written
in the form

$$\left[-\frac{\hbar^2}{2m}\frac{d^2}{dr^2} + l(l+1)\frac{\hbar^2}{2mr^2} + V(r)\right]y(r) = Ey(r).$$

The potential energy $V(r)$ may be expanded about the origin as

$$V(r) = \frac{b_{-1}}{r} + b_0 + b_1 r + \cdots.$$

(a) Show that there is one (regular) solution $y_1(r)$ starting with $r^{l+1}$.

(b) From Eq. (7.69) show that the irregular solution $y_2(r)$ diverges at the origin as $r^{-l}$.

7.6.25 Show that if a second solution, $y_2$, is assumed to be related to the first solution, $y_1$,
according to $y_2(x) = y_1(x)f(x)$, substitution back into the original equation

$$y_2'' + P(x)y_2' + Q(x)y_2 = 0$$

leads to

$$f(x) = \int^x \frac{\exp\left[-\int^s P(t)\,dt\right]}{[y_1(s)]^2}\,ds,$$

in agreement with Eq. (7.67).

7.6.26 (a) Show that

$$y'' + \frac{1 - \alpha^2}{4x^2}y = 0$$

has two solutions:

$$y_1(x) = a_0 x^{(1+\alpha)/2},$$
$$y_2(x) = a_0 x^{(1-\alpha)/2}.$$

(b) For $\alpha = 0$ the two linearly independent solutions of part (a) reduce to the single
solution $y_{10} = a_0 x^{1/2}$. Using Eq. (7.68) derive a second solution,

$$y_{20}(x) = a_0 x^{1/2}\ln x.$$

Verify that $y_{20}$ is indeed a solution.

(c) Show that the second solution from part (b) may be obtained as a limiting case
from the two solutions of part (a):

$$y_{20}(x) = \lim_{\alpha \to 0}\left(\frac{y_1 - y_2}{\alpha}\right).$$
7.7 INHOMOGENEOUS LINEAR ODES

We frame the discussion in terms of second-order ODEs, although the methods can be
extended to equations of higher order. We thus consider ODEs of the general form

$$y'' + P(x)y' + Q(x)y = F(x), \tag{7.94}$$

and proceed under the assumption that the corresponding homogeneous equation, with
$F(x) = 0$, has been solved, thereby obtaining two independent solutions designated $y_1(x)$
and $y_2(x)$.

Variation of Parameters

The method of variation of parameters (variation of the constant) starts by writing a particular
solution of the inhomogeneous ODE, Eq. (7.94), in the form

$$y(x) = u_1(x)y_1(x) + u_2(x)y_2(x). \tag{7.95}$$

We have specifically written $u_1(x)$ and $u_2(x)$ to emphasize that these are functions of the
independent variable, and not constant coefficients. This, of course, means that Eq. (7.95)
does not constitute a restriction to the functional form of $y(x)$. For clarity and compactness,
we will usually write these functions just as $u_1$ and $u_2$.

In preparation for inserting $y(x)$, from Eq. (7.95), into the inhomogeneous ODE, we
compute its derivative:

$$y' = u_1 y_1' + u_2 y_2' + (y_1 u_1' + y_2 u_2'),$$

and take advantage of the redundancy in the form assumed for $y$ by choosing $u_1$ and $u_2$ in
such a way that

$$y_1 u_1' + y_2 u_2' = 0, \tag{7.96}$$

where Eq. (7.96) is assumed to be an identity (i.e., to apply for all $x$). We will shortly show
that requiring Eq. (7.96) does not lead to an inconsistency.

After applying Eq. (7.96), $y'$, and its derivative $y''$, are found to be

$$y' = u_1 y_1' + u_2 y_2',$$
$$y'' = u_1 y_1'' + u_2 y_2'' + u_1' y_1' + u_2' y_2',$$

and substitution into Eq. (7.94) yields

$$(u_1 y_1'' + u_2 y_2'' + u_1' y_1' + u_2' y_2') + P(x)(u_1 y_1' + u_2 y_2') + Q(x)(u_1 y_1 + u_2 y_2) = F(x),$$

which, because $y_1$ and $y_2$ are solutions of the homogeneous equation, reduces to

$$u_1' y_1' + u_2' y_2' = F(x). \tag{7.97}$$

Equations (7.96) and (7.97) are, for each value of $x$, a set of two simultaneous algebraic
equations in the variables $u_1'$ and $u_2'$; to emphasize this point we repeat them here:

$$\begin{aligned} y_1 u_1' + y_2 u_2' &= 0, \\ y_1' u_1' + y_2' u_2' &= F(x). \end{aligned} \tag{7.98}$$
The determinant of the coefficients of these equations is

$$\begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix},$$

which we recognize as the Wronskian of the linearly independent solutions to the homogeneous
equation. That means this determinant is nonzero, so there will, for each $x$, be
a unique solution to Eqs. (7.98), i.e., unique functions $u_1'$ and $u_2'$. We conclude that the
restriction implied by Eq. (7.96) is permissible.

Once $u_1'$ and $u_2'$ have been identified, each can be integrated, respectively yielding $u_1$
and $u_2$, and, via Eq. (7.95), a particular solution of our inhomogeneous ODE.


Example 7.7.1 AN INHOMOGENEOUS ODE

Consider the ODE

$$(1 - x)y'' + xy' - y = (1 - x)^2. \tag{7.99}$$

The corresponding homogeneous ODE has solutions $y_1 = x$ and $y_2 = e^x$. Thus, $y_1' = 1$,
$y_2' = e^x$, and the simultaneous equations for $u_1'$ and $u_2'$ are

$$\begin{aligned} x u_1' + e^x u_2' &= 0, \\ u_1' + e^x u_2' &= F(x). \end{aligned} \tag{7.100}$$

Here $F(x)$ is the inhomogeneous term when the ODE has been written in the standard
form, Eq. (7.94). This means that we must divide Eq. (7.99) through by $1 - x$ (the coefficient
of $y''$), after which we see that $F(x) = 1 - x$.

With the above choice of $F(x)$, we solve Eqs. (7.100), obtaining

$$u_1' = 1, \qquad u_2' = -xe^{-x},$$

which integrate to

$$u_1 = x, \qquad u_2 = (x + 1)e^{-x}.$$

Now forming a particular solution to the inhomogeneous ODE, we have

$$y_p(x) = u_1 y_1 + u_2 y_2 = x(x) + \left((x + 1)e^{-x}\right)e^x = x^2 + x + 1.$$

Because $x$ is a solution to the homogeneous equation, we may remove it from the above
expression, leaving the more compact formula $y_p = x^2 + 1$.
The general solution to our ODE therefore takes the final form

$$y(x) = C_1 x + C_2 e^x + x^2 + 1.$$
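The algebra of Example 7.7.1 can be verified pointwise. The sketch below, with hypothetical helper names `u_primes` and `residual`, solves the pair of equations (7.98) by Cramer's rule at a sample point and then checks that the stated general solution leaves zero residual in Eq. (7.99) for arbitrary test values of the constants.

```python
import math

def u_primes(y1, dy1, y2, dy2, F, x):
    """Solve Eqs. (7.98) for (u1', u2') at the point x by Cramer's rule."""
    W = y1(x) * dy2(x) - dy1(x) * y2(x)      # Wronskian; must be nonzero at x
    return -y2(x) * F(x) / W, y1(x) * F(x) / W

def residual(C1, C2, x):
    """(1-x)y'' + x y' - y - (1-x)^2 for y = C1 x + C2 e^x + x^2 + 1."""
    y = C1 * x + C2 * math.exp(x) + x * x + 1
    dy = C1 + C2 * math.exp(x) + 2 * x
    d2y = C2 * math.exp(x) + 2
    return (1 - x) * d2y + x * dy - y - (1 - x) ** 2

x = 0.3
du1, du2 = u_primes(lambda t: t, lambda t: 1.0, math.exp, math.exp,
                    lambda t: 1 - t, x)
print(du1, du2, -x * math.exp(-x))   # du1 should be 1, du2 should equal -x e^-x
print(residual(2.0, -1.5, x))        # should vanish up to rounding
```

Note that the Wronskian here is $(x - 1)e^x$, so the sample point must avoid $x = 1$, where the standard form of the ODE is singular.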
Exercises

7.7.1 If our linear, second-order ODE is inhomogeneous, that is, of the form of Eq. (7.94),
the most general solution is

$$y(x) = y_1(x) + y_2(x) + y_p(x),$$

where $y_1$ and $y_2$ are independent solutions of the homogeneous equation.
Show that

$$y_p(x) = y_2(x)\int^x \frac{y_1(s)F(s)\,ds}{W\{y_1(s), y_2(s)\}} - y_1(x)\int^x \frac{y_2(s)F(s)\,ds}{W\{y_1(s), y_2(s)\}},$$

with $W\{y_1(s), y_2(s)\}$ the Wronskian of $y_1(s)$ and $y_2(s)$.

Find the general solutions to the following inhomogeneous ODEs:

7.7.2 $y'' + y = 1$.

7.7.3 $y'' + 4y = e^x$.

7.7.4 $y'' - 3y' + 2y = \sin x$.

7.7.5 $xy'' - (1 + x)y' + y = x^2$.

7.8 NONLINEAR DIFFERENTIAL EQUATIONS 


The main outlines of large parts of physical theory have been developed using mathe- 
matics in which the objects of concern possessed some sort of linearity property. As a 
result, linear algebra (matrix theory) and solution methods for linear differential equations 
were appropriate mathematical tools, and the development of these mathematical topics 
has progressed in the directions illustrated by most of this book. However, there is some 
physics that requires the use of nonlinear differential equations (NDEs). The hydrodynam- 
ics of viscous, compressible media is described by the Navier-Stokes equations, which are 
nonlinear. The nonlinearity evidences itself in phenomena such as turbulent flow, which 
cannot be described using linear equations. Nonlinear equations are also at the heart of 
the description of behavior known as chaotic, in which the evolution of a system is so 
sensitive to its initial conditions that it effectively becomes unpredictable. 

The mathematics of nonlinear ODEs is both more difficult and less developed than that 
of linear ODEs, and accordingly we provide here only an extremely brief survey. Much of 
the recent progress in this area has been in the development of computational methods for 
nonlinear problems; that is also outside the scope of this text. 

In this final section of the present chapter we discuss briefly some specific NDEs, the 
classical Bernoulli and Riccati equations. 
Bernoulli and Riccati Equations

Bernoulli equations are nonlinear, having the form

$$y'(x) = p(x)y(x) + q(x)[y(x)]^n, \tag{7.101}$$

where $p$ and $q$ are real functions and $n \neq 0, 1$ to exclude first-order linear ODEs. However,
if we substitute

$$u(x) = [y(x)]^{1-n},$$

then Eq. (7.101) becomes a first-order linear ODE,

$$u' = (1 - n)y^{-n}y' = (1 - n)\left[p(x)u(x) + q(x)\right], \tag{7.102}$$

which we can solve (using an integrating factor) as described in Section 7.2.
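The Bernoulli substitution can be tested on a simple case. The sketch below assumes constant coefficients $p \neq 0$ and $q$ and a positive initial value, for which the linear equation for $u$ has the closed-form solution $u(x) = (u_0 + q/p)e^{(1-n)px} - q/p$; it compares the resulting $y = u^{1/(1-n)}$ with a direct Runge-Kutta integration of the nonlinear ODE. The function names and the logistic test case are illustrative choices, not taken from the text.

```python
import math

def bernoulli_closed_form(p, q, n, y0, x):
    """y(x) for y' = p*y + q*y**n with constant p != 0, q and y(0) = y0 > 0.

    Uses u = y**(1-n), which obeys the linear ODE u' = (1-n)*(p*u + q)."""
    u0 = y0 ** (1 - n)
    u = (u0 + q / p) * math.exp((1 - n) * p * x) - q / p
    return u ** (1.0 / (1 - n))

def rk4(f, y0, x_end, steps=1000):
    """Classical fourth-order Runge-Kutta for y' = f(x, y), starting at x = 0."""
    h, x, y = x_end / steps, 0.0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

p, q, n, y0 = 1.0, -1.0, 2, 0.5        # logistic equation y' = y - y^2
closed = bernoulli_closed_form(p, q, n, y0, 3.0)
numeric = rk4(lambda x, y: p * y + q * y ** n, y0, 3.0)
print(closed, numeric)
```

For this test case the closed form reduces to $y = 1/(1 + e^{-x})$, and the two printed values should agree closely.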
Riccati equations are quadratic in $y(x)$:

$$y' = p(x)y^2 + q(x)y + r(x), \tag{7.103}$$

where we require $p \neq 0$ to exclude linear ODEs and $r \neq 0$ to exclude Bernoulli equations.
There is no known general method for solving Riccati equations. However, when a special
solution $y_0(x)$ of Eq. (7.103) is known by a guess or inspection, then one can write the
general solution in the form $y = y_0 + u$, with $u$ satisfying the Bernoulli equation

$$u' = pu^2 + (2py_0 + q)u, \tag{7.104}$$

because substitution of $y = y_0 + u$ into Eq. (7.103) removes $r(x)$ from the resulting
equation.

There are no general methods for obtaining exact solutions of most nonlinear ODEs. 
This fact makes it more important to develop methods for finding the qualitative behavior 
of solutions. In Section 7.5 of this chapter we mentioned that power-series solutions of 
ODEs exist except (possibly) at essential singularities of the ODE. The coefficients in 
the power-series expansions provide us with the asymptotic behavior of the solutions. By 
making expansions of solutions to NDEs and retaining only the linear terms, it will often 
be possible to understand the qualitative behavior of the solutions in the neighborhood of 
the expansion point. 


Fixed and Movable Singularities, Special Solutions 


A first step in analyzing the solutions of NDEs is to identify their singularity structures. 
Solutions of NDEs may have singular points that are independent of the initial or bound- 
ary conditions; these are called fixed singularities. But in addition they may have spon- 
taneous, or movable, singularities that vary with the initial or boundary conditions. This 
feature complicates the asymptotic analysis of NDEs.

Example 7.8.1 MOVEABLE SINGULARITY

Compare the linear ODE

\[ y' + \frac{y}{x-1} = 0 \]

(which has an obvious regular singularity at $x = 1$), with the NDE $y' = y^2$. Both have the
same solution with initial condition $y(0) = 1$, namely $y(x) = 1/(1-x)$. But for $y(0) = 2$,
the linear ODE has solution $y = 2/(1-x)$, still singular at $x = 1$, while the NDE now has
solution $y(x) = 2/(1-2x)$. The singularity in the solution of the NDE has moved to $x = 1/2$.
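
The contrast can be checked numerically. The following minimal sketch assumes the closed-form solutions $y = y_0/(1-x)$ for the linear ODE and $y = y_0/(1 - y_0 x)$ for the NDE (both follow by separation of variables), verifies by finite differences that they satisfy their equations, and reports where each becomes singular. All function names are illustrative.

```python
def nde_solution(x, y0):
    """Solution of y' = y**2 with y(0) = y0; singular at x = 1/y0."""
    return y0 / (1.0 - y0 * x)

def linear_solution(x, y0):
    """Solution of y' + y/(x - 1) = 0 with y(0) = y0; singular at x = 1 for every y0."""
    return y0 / (1.0 - x)

def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def residuals(y0, x=0.1):
    """Residuals of the two ODEs at the point x (both should be close to zero)."""
    res_nde = derivative(lambda t: nde_solution(t, y0), x) - nde_solution(x, y0) ** 2
    res_lin = derivative(lambda t: linear_solution(t, y0), x) + linear_solution(x, y0) / (x - 1.0)
    return res_nde, res_lin

def nde_pole(y0):
    """Location of the movable singularity of the NDE solution."""
    return 1.0 / y0

for y0 in (1.0, 2.0, 4.0):
    print(y0, "NDE pole at x =", nde_pole(y0), "; linear-ODE pole at x = 1;", residuals(y0))
```

The pole of the NDE solution moves with the initial value, while that of the linear ODE stays at the singular point of its coefficient.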


For a linear second-order ODE we have a complete description of its solutions and their 
asymptotic behavior when two linearly independent solutions are known. But for NDEs 
there may still be special solutions whose asymptotic behavior is not obtainable from two 
independent solutions. This is another characteristic property of NDEs, which we illustrate 
again by an example. 


Example 7.8.2 SPECIAL SOLUTION 


The NDE $y'' = yy'/x$ has two linearly independent solutions that define the two-parameter
family of curves

\[ y(x) = 2c_1\tan(c_1\ln x + c_2) - 1, \tag{7.105} \]

where the $c_i$ are integration constants. However, this NDE also has the special solution
$y = c_3 =$ constant, which cannot be obtained from Eq. (7.105) by any choice of the parameters
$c_1$, $c_2$.

The "general solution" in Eq. (7.105) can be obtained by making the substitution $x = e^t$,
and then defining $Y(t) = y(e^t)$ so that $x(dy/dx) = dY/dt$, thereby obtaining the ODE
$Y'' = Y'(Y + 1)$. This ODE can be integrated once to give $Y' = \tfrac{1}{2}Y^2 + Y + c$ with
$c = 2(c_1^2 + 1/4)$ an integration constant. The equation for $Y'$ is separable and can be integrated
again to yield Eq. (7.105).
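
Both claims of this example can be spot-checked with finite differences. The sketch below is only a numerical sanity check at one sample point, with arbitrarily chosen constants $c_1 = 0.7$, $c_2 = 0.2$, $c_3 = 3$ (our assumptions); `residual` evaluates $y'' - yy'/x$.

```python
import math

def general_solution(x, c1, c2):
    """Two-parameter family of Eq. (7.105)."""
    return 2 * c1 * math.tan(c1 * math.log(x) + c2) - 1

def residual(y, x, h=1e-4):
    """Finite-difference estimate of y'' - y*y'/x at the point x."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return d2 - y(x) * d1 / x

r_general = residual(lambda x: general_solution(x, 0.7, 0.2), 1.5)
r_special = residual(lambda x: 3.0, 1.5)   # special solution y = c3 = 3
print(r_general, r_special)
```

Both residuals should vanish to within discretization error, even though no choice of $c_1$, $c_2$ in Eq. (7.105) reproduces the constant solution $y = 3$.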
Exercises 
7.8.1 Consider the Riccati equation $y' = y^2 - y - 2$. A particular solution to this equation is
$y = 2$. Find a more general solution.
7.8.2 A particular solution to $y' = y^2/x^3 - y/x + 2x$ is $y = x^2$. Find a more general solution.
7.8.3 Solve the Bernoulli equation y’ + xy = xy?. 
7.8.4 ODEs of the form $y = xy' + f(y')$ are known as Clairaut equations. The first step in
solving an equation of this type is to differentiate it, yielding

\[ y' = y' + xy'' + f'(y')\,y'', \quad\text{or}\quad y''\,\big(x + f'(y')\big) = 0. \]

Solutions may therefore be obtained both from $y'' = 0$ and from $f'(y') = -x$. The
so-called general solution comes from $y'' = 0$.

For $f(y') = (y')^2$,


(a) Obtain the general solution (note that it contains a single constant). 
(b) Obtain the so-called singular solution from f’(y’) = —x. By substituting back into 
the original ODE show that this singular solution contains no adjustable constants. 


Note. The singular solution is the envelope of the general solutions.

CHAPTER 8


STURM-LIOUVILLE THEORY 


8.1 INTRODUCTION 


Chapter 7 examined methods for solving ordinary differential equations (ODEs), with 
emphasis on techniques that can generate the solutions. In the present chapter we shift 
the focus to the general properties that solutions must have to be appropriate for specific 
physics problems, and to discuss the solutions using the notions of vector spaces and eigen- 
value problems that were developed in Chapters 5 and 6. 

A typical physics problem controlled by an ODE has two important properties: (1) Its 
solution must satisfy boundary conditions, and (2) It contains a parameter whose value 
must be set in a way that satisfies the boundary conditions. From a vector-space perspec- 
tive, the boundary conditions (plus continuity and differentiability requirements) define the 
Hilbert space of our problem, while the parameter normally occurs in a way that permits 
the ODE to be written as an eigenvalue equation within that Hilbert space. 

These ideas can be made clearer by examining a specific example. The standing waves 
of a vibrating string clamped at its ends are governed by the ODE 


\[ \frac{d^2\psi}{dx^2} + k^2\psi = 0, \tag{8.1} \]

where $\psi(x)$ is the amplitude of the transverse displacement at the point $x$ along the string,
and $k$ is a parameter. This ODE has solutions for any value of $k$, but the solutions of
relevance to the string problem must have $\psi(x) = 0$ for the values of $x$ at the ends of the
string.

The boundary conditions of this problem can be interpreted as defining a Hilbert space
whose members are differentiable functions with zeros at the boundary values of $x$; the
ODE itself can be written as the eigenvalue equation

\[ \mathcal{L}\psi = k^2\psi, \qquad \mathcal{L} = -\frac{d^2}{dx^2}. \tag{8.2} \]



Mathematical Methods for Physicists. 
© 2013 Elsevier Inc. All rights reserved. 







For practical reasons the eigenvalue is given the name $k^2$. It is required to find functions
$\psi(x)$ that solve Eq. (8.2) subject to the boundary conditions, i.e., to find members $\psi(x)$ of
our Hilbert space that solve the eigenvalue equation.

We could now follow the procedures developed in Chapter 5, namely (1) choose a basis
for our Hilbert space (a set of functions with zeros at the boundary values of $x$), (2) define
a scalar product for our space, (3) expand $\mathcal{L}$ and $\psi$ in terms of our basis, and (4) solve the
resulting matrix equation. However, that procedure makes no use of any specific features
of the current ODE, and in particular ignores the fact that it is easily solved.

Instead, we continue with the example defined by Eq. (8.1), using our ability to solve 
the ODE involved. 


Example 8.1.1 STANDING WAVES, VIBRATING STRING 


We consider a string clamped at $x = 0$ and $x = l$ and undergoing transverse vibrations.
As already indicated, its standing wave amplitudes $\psi(x)$ are solutions of the differential
equation

\[ \frac{d^2\psi(x)}{dx^2} + k^2\psi(x) = 0, \tag{8.3} \]

where $k$ is not initially known and $\psi(x)$ is subject to the boundary conditions that the ends
of the string be fixed in position: $\psi(0) = \psi(l) = 0$. This is the eigenvalue problem defined
in Eq. (8.2).

The general solution to this differential equation is $\psi(x) = A\sin kx + B\cos kx$, and in
the absence of the boundary conditions solutions would exist for all values of $k$, $A$, and
$B$. However, the boundary condition at $x = 0$ requires us to set $B = 0$, leaving
$\psi(x) = A\sin kx$. We have yet to satisfy the boundary condition at $x = l$. The fact that $A$ is as yet
unspecified is not helpful for this purpose, as $A = 0$ leaves us with only the trivial solution
$\psi = 0$. We must, instead, require $\sin kl = 0$, which is accomplished by setting $kl = n\pi$,
where $n$ is a nonzero integer, leading to

\[ \psi_n(x) = A\sin\left(\frac{n\pi x}{l}\right), \qquad k^2 = \frac{n^2\pi^2}{l^2}, \qquad n = 1, 2, \ldots. \tag{8.4} \]

Because Eq. (8.3) is homogeneous, it will have solutions of arbitrary scale, so $A$ can have
any value. Since our purpose is usually to identify linearly independent solutions, we disregard
changes in the sign or magnitude of $A$. In the vibrating string problem, these quantities
control the amplitude and phase of the standing waves. Since changing the sign of $n$ simply
changes the sign of $\psi$, $+n$ and $-n$ in Eq. (8.4) are regarded here as equivalent, so we
restricted $n$ to positive values. The first few $\psi_n$ are shown in Fig. 8.1. Note that the number
of nodes increases with $n$: $\psi_n$ has $n + 1$ nodes (including the two nodes at the ends of the
string).

The fact that our problem has solutions only for discrete values of $k$ is typical of eigenvalue
problems, and in this problem the discreteness in $k$ can be traced directly to the
presence of the boundary conditions. Figure 8.2 shows what happens when $k$ is varied
in either direction from the acceptable value $\pi/l$, with the boundary condition at $x = 0$
maintained for all $k$. It is obvious that the eigenvalues (here $k^2$) lie at separated points, and
that the boundary condition at $x = l$ cannot be satisfied for $k < \pi/l$. Moreover, the first
acceptable $k$ value larger than $\pi/l$ is clearly larger than $1.9\pi/l$ (it is actually $2\pi/l$).

FIGURE 8.1 Standing wave patterns of a vibrating string.

FIGURE 8.2 Solutions to Eq. (8.3) on the range $0 \le x \le l$ for: (a) $k = 0.9\pi/l$,
(b) $k = \pi/l$, (c) $k = 1.2\pi/l$, (d) $k = 1.5\pi/l$, (e) $k = 1.9\pi/l$.
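
The procedure of varying $k$ while holding the condition at $x = 0$ fixed is, in effect, a shooting method. The sketch below is a minimal illustration under our own assumptions: unit string length $l = 1$, starting slope $\psi'(0) = 1$ (the scale is arbitrary), RK4 integration, and bisection on $k$. The names `psi_at_end` and `find_eigen_k` are hypothetical helpers, not anything defined in the text.

```python
import math

def psi_at_end(k, length=1.0, steps=2000):
    """Integrate psi'' = -k**2 psi from x = 0 with psi(0) = 0, psi'(0) = 1 (RK4)
    and return psi(length)."""
    h = length / steps
    psi, dpsi = 0.0, 1.0
    for _ in range(steps):
        k1p, k1d = dpsi, -k * k * psi
        k2p, k2d = dpsi + h * k1d / 2, -k * k * (psi + h * k1p / 2)
        k3p, k3d = dpsi + h * k2d / 2, -k * k * (psi + h * k2p / 2)
        k4p, k4d = dpsi + h * k3d, -k * k * (psi + h * k3p)
        psi += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
        dpsi += h * (k1d + 2 * k2d + 2 * k3d + k4d) / 6
    return psi

def find_eigen_k(k_lo, k_hi, tol=1e-10):
    """Bisect on k until psi(length) = 0; assumes a sign change on [k_lo, k_hi]."""
    f_lo = psi_at_end(k_lo)
    while k_hi - k_lo > tol:
        k_mid = 0.5 * (k_lo + k_hi)
        f_mid = psi_at_end(k_mid)
        if f_lo * f_mid <= 0:
            k_hi = k_mid
        else:
            k_lo, f_lo = k_mid, f_mid
    return 0.5 * (k_lo + k_hi)

k1 = find_eigen_k(0.9 * math.pi, 1.2 * math.pi)   # brackets the first eigenvalue
k2 = find_eigen_k(1.9 * math.pi, 2.2 * math.pi)   # brackets the second
print(k1 / math.pi, k2 / math.pi)
```

With $l = 1$ the two roots should lie close to $k = \pi$ and $k = 2\pi$, the first two values allowed by Eq. (8.4).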

As already noted, the solution to this eigenvalue problem is undetermined as to scale
because the underlying equation (together with its boundary conditions) is homogeneous.
However, if we introduce a scalar product of definition

\[ \langle f|g\rangle = \int_0^l f^*(x)\,g(x)\,dx, \tag{8.5} \]

we can define solutions that are normalized; requiring $\langle\psi_n|\psi_n\rangle = 1$, we have, with arbitrary
sign,

\[ \psi_n(x) = \sqrt{\frac{2}{l}}\,\sin\left(\frac{n\pi x}{l}\right). \tag{8.6} \]
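
A quick numerical check of Eqs. (8.5) and (8.6) is sketched below, assuming $l = 1$ and using composite Simpson quadrature for the scalar product of real functions. The function names are illustrative; the matrix `overlaps` should be close to the unit matrix.

```python
import math

def psi(n, x, length=1.0):
    """Normalized eigenfunction of Eq. (8.6)."""
    return math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)

def scalar_product(f, g, length=1.0, panels=2000):
    """Composite Simpson approximation to Eq. (8.5) for real f and g."""
    h = length / panels
    total = f(0.0) * g(0.0) + f(length) * g(length)
    for i in range(1, panels):
        x = i * h
        total += (4 if i % 2 else 2) * f(x) * g(x)
    return total * h / 3

overlaps = [[scalar_product(lambda x: psi(m, x), lambda x: psi(n, x))
             for n in (1, 2, 3)] for m in (1, 2, 3)]
for row in overlaps:
    print(["%.6f" % value for value in row])
```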


Although we did not solve Eq. (8.2) by an expansion technique, the solutions (the eigenfunctions)
will still have properties that depend on whether the operator $\mathcal{L}$ is Hermitian. As
we saw in Chapter 5, the Hermitian property depends both on $\mathcal{L}$ and the definition of the
scalar product, and a topic for discussion in the present chapter is the identification of conditions
making an operator Hermitian. This issue is important because Hermiticity implies
real eigenvalues as well as orthogonality and completeness of the eigenfunctions.

Summarizing, the matters of interest here, and the subject matter of the current chapter,
include: 


1. The conditions under which an ODE can be written as an eigenvalue equation with a 
self-adjoint (Hermitian) operator, 

2. Methods for the solution of ODEs subject to boundary conditions, and 

3. The properties of the solutions to ODE eigenvalue equations. 


8.2 HERMITIAN OPERATORS 


Characterization of the general features of eigenproblems arising from second-order dif- 
ferential equations is known as Sturm-Liouville theory. It therefore deals with eigenvalue 
problems of the form 


\[ \mathcal{L}\psi(x) = \lambda\psi(x), \tag{8.7} \]

where $\mathcal{L}$ is a linear second-order differential operator, of the general form

\[ \mathcal{L}(x) = p_0(x)\frac{d^2}{dx^2} + p_1(x)\frac{d}{dx} + p_2(x). \tag{8.8} \]

The key matter at issue here is to identify the conditions under which $\mathcal{L}$ is a Hermitian
operator.

Self-Adjoint ODEs

$\mathcal{L}$ is known in differential equation theory as self-adjoint if

\[ p_0'(x) = p_1(x). \tag{8.9} \]

This feature enables $\mathcal{L}(x)$ to be written

\[ \mathcal{L}(x) = \frac{d}{dx}\left[p_0(x)\frac{d}{dx}\right] + p_2(x), \tag{8.10} \]

and the operation of $\mathcal{L}$ on a function $u(x)$ then takes the form

\[ \mathcal{L}u = (p_0u')' + p_2u. \tag{8.11} \]

Inserting Eq. (8.11) into an integral of the form $\int_a^b v^*(x)\mathcal{L}u(x)\,dx$, we proceed by applying
an integration by parts to the $p_0$ term (assuming that $p_0$ is real):

\[ \int_a^b v^*(x)\mathcal{L}u(x)\,dx = \int_a^b \left[v^*(p_0u')' + v^*p_2u\right]dx
   = \Big[v^*p_0u'\Big]_a^b + \int_a^b \left[-(v^*)'p_0u' + v^*p_2u\right]dx. \]

Another integration by parts leads to

\[ \int_a^b v^*(x)\mathcal{L}u(x)\,dx = \Big[v^*p_0u' - (v^*)'p_0u\Big]_a^b + \int_a^b \left[\big(p_0(v^*)'\big)'u + v^*p_2u\right]dx \]
\[ \qquad\qquad = \Big[v^*p_0u' - (v^*)'p_0u\Big]_a^b + \int_a^b (\mathcal{L}v)^*u\,dx. \tag{8.12} \]



Equation (8.12) shows that, if the boundary terms $[\cdots]_a^b$ vanish and the scalar product
is an unweighted integral from $a$ to $b$, then the operator $\mathcal{L}$ is self-adjoint, as that term
was defined for operators. In passing, we observe that the notion of self-adjointness in
differential equation theory is weaker than the corresponding concept for operators in our
Hilbert spaces, due to the lack of a requirement on the boundary terms. We again stress
that the Hilbert-space definition of self-adjoint depends not only on the form of $\mathcal{L}$ but also
on the definition of the scalar product and the boundary conditions.

Looking further at the boundary terms, we see that they are surely zero if $u$ and $v$ both
vanish at the endpoints $x = a$ and $x = b$ (a case of what are termed Dirichlet boundary
conditions). The boundary terms are also zero if both $u'$ and $v'$ vanish at $a$ and $b$ (Neumann
boundary conditions). Even if neither Dirichlet nor Neumann boundary conditions
apply, it may happen (particularly in a periodic system, such as a crystal lattice) that the
boundary terms vanish because $v^*p_0u'\,\big|_a = v^*p_0u'\,\big|_b$ for all $u$ and $v$.

Specializing Eq. (8.12) to the case that $u$ and $v$ are eigenfunctions of $\mathcal{L}$ with respective
real eigenvalues $\lambda_u$ and $\lambda_v$, that equation reduces to

\[ (\lambda_u - \lambda_v)\int_a^b v^*u\,dx = \Big[\,p_0\big(v^*u' - (v^*)'u\big)\Big]_a^b. \tag{8.13} \]

It is thus apparent that if the boundary terms vanish and $\lambda_u \neq \lambda_v$, then $u$ and $v$ must
be orthogonal on the interval $(a, b)$. This is a specific illustration of the orthogonality
requirement for eigenfunctions of a Hermitian operator in a Hilbert space.


Making an ODE Self-Adjoint 


Some of the differential equations that are important in physics involve operators $\mathcal{L}$ that are
self-adjoint in the differential-equation sense, meaning that they satisfy Eq. (8.9); others
are not. However, if an operator does not satisfy Eq. (8.9), it is known how to multiply it
by a quantity that converts it into self-adjoint form. Letting such a quantity be designated
$w(x)$, the Sturm-Liouville eigenvalue problem of Eq. (8.7) becomes

\[ w(x)\mathcal{L}(x)\psi(x) = w(x)\lambda\psi(x), \tag{8.14} \]

an equation that has the same eigenvalues $\lambda$ and eigenfunctions $\psi(x)$ as the original problem
in Eq. (8.7). If now $w(x)$ is chosen to be

\[ w(x) = p_0^{-1}\exp\left(\int\frac{p_1(x)}{p_0(x)}\,dx\right), \tag{8.15} \]

where $p_0$ and $p_1$ are the quantities in $\mathcal{L}$ as given in Eq. (8.8), we can by direct evaluation
find that

\[ w(x)\mathcal{L}(x) = \bar{p}_0\frac{d^2}{dx^2} + \bar{p}_1\frac{d}{dx} + w(x)\,p_2(x), \tag{8.16} \]

where

\[ \bar{p}_0 = \exp\left(\int\frac{p_1(x)}{p_0(x)}\,dx\right), \qquad
   \bar{p}_1 = \frac{p_1}{p_0}\exp\left(\int\frac{p_1(x)}{p_0(x)}\,dx\right). \tag{8.17} \]

It is then straightforward to show that $\bar{p}_0' = \bar{p}_1$, so $w\mathcal{L}$ satisfies the self-adjoint condition.
If we now apply the process represented by Eq. (8.12) to $w\mathcal{L}$, we get

\[ \int_a^b v^*(x)\,w(x)\,\mathcal{L}u(x)\,dx = \Big[v^*\bar{p}_0u' - (v^*)'\bar{p}_0u\Big]_a^b + \int_a^b w(x)\,(\mathcal{L}v)^*u\,dx. \tag{8.18} \]

If the boundary terms vanish, Eq. (8.18) is equivalent to $\langle v|\mathcal{L}|u\rangle = \langle\mathcal{L}v|u\rangle$ when the scalar
product is defined to be

\[ \langle v|u\rangle = \int_a^b v^*(x)\,u(x)\,w(x)\,dx. \tag{8.19} \]

Again considering the case that $u$ and $v$ are eigenfunctions of $\mathcal{L}$, with respective eigenvalues
$\lambda_u$ and $\lambda_v$, Eq. (8.18) reduces to

\[ (\lambda_u - \lambda_v)\int_a^b v^*u\,w\,dx = \Big[\,w\,p_0\big(v^*u' - (v^*)'u\big)\Big]_a^b, \tag{8.20} \]

where $p_0$ is the coefficient of $y''$ in the original ODE. We thus see that if the right-hand
side of Eq. (8.20) vanishes, then $u$ and $v$ are orthogonal on $(a, b)$ with weight factor $w$
when $\lambda_u \neq \lambda_v$. In other words, our choice of scalar product definition and boundary conditions
have made $\mathcal{L}$ a self-adjoint operator in our Hilbert space, thereby producing an
eigenfunction orthogonality condition.

Summarizing, we have the useful and important result: 


If a second-order differential operator $\mathcal{L}$ has coefficients $p_0(x)$ and $p_1(x)$ that satisfy
the self-adjoint condition, Eq. (8.9), then it is Hermitian, given (a) a scalar product
of uniform weight and (b) boundary conditions that remove the endpoint terms of
Eq. (8.12).

If Eq. (8.9) is not satisfied, then $\mathcal{L}$ is Hermitian if (a) the scalar product is defined
to include the weight factor given in Eq. (8.15), and (b) boundary conditions cause
removal of the endpoint terms in Eq. (8.18).


Note that once the problem has been defined such that $\mathcal{L}$ is Hermitian, then the general
properties proved for Hermitian problems apply: the eigenvalues are real; the eigenfunctions
are (or if degenerate can be made) orthogonal, using the relevant scalar product
definition.

Example 8.2.1 LAGUERRE FUNCTIONS

Consider the eigenvalue problem $\mathcal{L}\psi = \lambda\psi$, with

\[ \mathcal{L} = x\frac{d^2}{dx^2} + (1 - x)\frac{d}{dx}, \tag{8.21} \]

subject to (a) $\psi$ nonsingular on $0 \le x < \infty$, and (b) $\lim_{x\to\infty}\psi(x) = 0$. Condition (a) is
simply a requirement that we use the solution of the differential equation that is regular at
$x = 0$; and condition (b) is a typical Dirichlet boundary condition.

The operator $\mathcal{L}$ is not self-adjoint, with $p_0 = x$ and $p_1 = 1 - x$. But we can form

\[ w(x) = \frac{1}{x}\exp\left(\int\frac{1 - x}{x}\,dx\right) = \frac{1}{x}\,e^{\ln x - x} = e^{-x}. \tag{8.22} \]

The boundary terms, for arbitrary eigenfunctions $u$ and $v$, are of the form

\[ \Big[\,x\,e^{-x}\big(v^*u' - (v^*)'u\big)\Big]_0^\infty; \]

their contributions at $x = \infty$ vanish because $u$ and $v$ go to zero; the common factor $x$
causes the $x = 0$ contribution to vanish also. We therefore have a self-adjoint problem,
with $u$ and $v$ of different eigenvalues orthogonal under the definition

\[ \langle v|u\rangle = \int_0^\infty v^*(x)\,u(x)\,e^{-x}\,dx. \]

The eigenvalue equation of this example is that whose solutions are the Laguerre
polynomials; what we have shown here is that they are orthogonal on $(0, \infty)$ with
weight $e^{-x}$.
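
The weight factor of Eq. (8.15) can also be evaluated numerically when the integral is inconvenient. The sketch below is one possible approach, not a prescribed method: it replaces the indefinite integral by a definite one from an arbitrary reference point `x_ref` (our assumption; changing it only rescales $w$ by a constant) and uses Simpson's rule. It then checks, for the Laguerre operator of Eq. (8.21), that $w(x)$ is proportional to $e^{-x}$ and that $\bar{p}_0' = \bar{p}_1$ with $\bar{p}_0 = wp_0$, $\bar{p}_1 = wp_1$.

```python
import math

def weight(p0, p1, x, x_ref=1.0, panels=2000):
    """Eq. (8.15) with the integral of p1/p0 taken from x_ref to x (Simpson's rule)."""
    h = (x - x_ref) / panels
    ratio = lambda t: p1(t) / p0(t)
    total = ratio(x_ref) + ratio(x)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * ratio(x_ref + i * h)
    return math.exp(total * h / 3) / p0(x)

# Laguerre operator of Eq. (8.21): p0 = x, p1 = 1 - x
p0 = lambda x: x
p1 = lambda x: 1.0 - x

# w(x) / exp(-x) should be the same constant at every x
ratios = [weight(p0, p1, x) / math.exp(-x) for x in (0.5, 1.0, 2.0, 4.0)]

def self_adjoint_defect(x, h=1e-5):
    """d/dx (w p0) - w p1, which should vanish if w*L is self-adjoint."""
    pbar0 = lambda t: weight(p0, p1, t) * p0(t)
    return (pbar0(x + h) - pbar0(x - h)) / (2 * h) - weight(p0, p1, x) * p1(x)

print(ratios, self_adjoint_defect(2.0))
```

With `x_ref = 1` the constant ratio is $e$, since the definite integral gives $w(x) = e^{1-x}$; any constant multiple of $e^{-x}$ serves equally well as a weight.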
Exercises 

8.2.1 Show that Laguerre's ODE, Table 7.1, may be put into self-adjoint form by multiplying
by $e^{-x}$ and that $w(x) = e^{-x}$ is the weighting function.

8.2.2 Show that the Hermite ODE, Table 7.1, may be put into self-adjoint form by multiplying
by $e^{-x^2}$ and that this gives $w(x) = e^{-x^2}$ as the appropriate weighting function.

8.2.3 Show that the Chebyshev ODE, Table 7.1, may be put into self-adjoint form by multiplying
by $(1 - x^2)^{-1/2}$ and that this gives $w(x) = (1 - x^2)^{-1/2}$ as the appropriate
weighting function.

8.2.4 The Legendre, Chebyshev, Hermite, and Laguerre equations, given in Table 7.1, have
solutions that are polynomials. Show that ranges of integration that guarantee that the
Hermitian operator boundary conditions will be satisfied are

(a) Legendre $[-1, 1]$, (b) Chebyshev $[-1, 1]$,
(c) Hermite $(-\infty, \infty)$, (d) Laguerre $[0, \infty)$.

8.2.5 The functions $u_1(x)$ and $u_2(x)$ are eigenfunctions of the same Hermitian operator but
for distinct eigenvalues $\lambda_1$ and $\lambda_2$. Prove that $u_1(x)$ and $u_2(x)$ are linearly independent.

8.2.6 Given that

\[ P_1(x) = x \quad\text{and}\quad Q_0(x) = \frac{1}{2}\ln\left(\frac{1 + x}{1 - x}\right) \]

are solutions of Legendre's differential equation (Table 7.1) corresponding to different
eigenvalues:

(a) Evaluate their orthogonality integral

\[ \int_{-1}^{1}\frac{x}{2}\ln\left(\frac{1 + x}{1 - x}\right)dx. \]

(b) Explain why these two functions are not orthogonal, that is, why the proof of
orthogonality does not apply.

8.2.7 $T_0(x) = 1$ and $V_1(x) = (1 - x^2)^{1/2}$ are solutions of the Chebyshev differential equation
corresponding to different eigenvalues. Explain, in terms of the boundary conditions,
why these two functions are not orthogonal on the range $(-1, 1)$ with the weighting
function found in Exercise 8.2.3.

8.2.8 A set of functions $u_n(x)$ satisfies the Sturm-Liouville equation

\[ \frac{d}{dx}\left[p(x)\frac{d}{dx}u_n(x)\right] + \lambda_n w(x)\,u_n(x) = 0. \]

The functions $u_m(x)$ and $u_n(x)$ satisfy boundary conditions that lead to orthogonality.
The corresponding eigenvalues $\lambda_m$ and $\lambda_n$ are distinct. Prove that for appropriate boundary
conditions, $u_m'(x)$ and $u_n'(x)$ are orthogonal with $p(x)$ as a weighting function.

8.2.9 Linear operator $A$ has $n$ distinct eigenvalues and $n$ corresponding eigenfunctions:
$A\psi_i = \lambda_i\psi_i$. Show that the $n$ eigenfunctions are linearly independent. Do not assume
$A$ to be Hermitian.
Hint. Assume linear dependence, i.e., that $\psi_n = \sum_{i=1}^{n-1} a_i\psi_i$. Use this relation and the
operator-eigenfunction equation first in one order and then in the reverse order. Show
that a contradiction results.

8.2.10 The ultraspherical polynomials $C_n^{(\alpha)}(x)$ are solutions of the differential equation

\[ \left\{(1 - x^2)\frac{d^2}{dx^2} - (2\alpha + 1)\,x\frac{d}{dx} + n(n + 2\alpha)\right\}C_n^{(\alpha)}(x) = 0. \]

(a) Transform this differential equation into self-adjoint form.
(b) Find an interval of integration and weighting factor that make $C_n^{(\alpha)}(x)$ of the same
$\alpha$ but different $n$ orthogonal.

Note. Assume that your solutions are polynomials.


8.3 ODE EIGENVALUE PROBLEMS


Now that we have identified the conditions that make a second-order ODE eigenvalue 
problem Hermitian, let’s examine several such problems to gain further understanding of 
the processes involved and to illustrate techniques for finding solutions. 


Example 8.3.1 LEGENDRE EQUATION 


The Legendre equation,

\[ \mathcal{L}y(x) = -(1 - x^2)\,y''(x) + 2x\,y'(x) = \lambda y(x), \tag{8.23} \]

defines an eigenvalue problem that arises when $\nabla^2$ is written in spherical polar coordinates,
with $x$ identified as $\cos\theta$, where $\theta$ is the polar angle of the coordinate system. The range
of $x$ in this context is $-1 \le x \le 1$, and in typical circumstances one needs solutions to
Eq. (8.23) that are nonsingular on the entire range of $x$. It turns out that this is a nontrivial
requirement, mainly because $x = \pm 1$ are singular points of the Legendre ODE. If we regard
nonsingularity of $y$ at $x = \pm 1$ as a set of boundary conditions, we shall find that this
requirement is sufficient to define eigenfunctions of the Legendre operator.

This eigenvalue problem, namely Eq. (8.23) plus nonsingularity at $x = \pm 1$, is conveniently
handled by the method of Frobenius. We assume solutions of the form

\[ y = \sum_{j=0}^{\infty} a_j x^{s+j}, \tag{8.24} \]

with indicial equation $s(s - 1) = 0$, whose solutions are $s = 0$ and $s = 1$. For $s = 0$, we
obtain the following recurrence relation for the coefficients $a_j$:

\[ a_{j+2} = \frac{j(j + 1) - \lambda}{(j + 1)(j + 2)}\,a_j. \tag{8.25} \]

We may set $a_1 = 0$, thereby causing all $a_j$ of odd $j$ to vanish, so (for $s = 0$) our series will
contain only even powers of $x$. The boundary condition comes into play because Eq. (8.24)
diverges at $x = \pm 1$ for all $\lambda$ except those that actually cause the series to terminate after a
finite number of terms.

To see how the divergence arises, note that for large $j$ and $|x| = 1$ the ratio of successive
terms of the series approaches

\[ \frac{a_{j+2}\,x^{j+2}}{a_j\,x^j} \to \frac{j(j + 1)}{(j + 1)(j + 2)} \to 1, \]

so the ratio test is indeterminate. However, application of the Gauss test shows that this
series diverges, as was discussed in more detail in Example 1.1.7.

The series in Eq. (8.24) can be made to terminate after $a_l$ for some even $l$ by choosing
$\lambda = l(l + 1)$, a value that makes $a_{l+2} = 0$. Then $a_{l+4}, a_{l+6}, \ldots$ will also vanish, and our
solution will be a polynomial, which is clearly nonsingular for all $|x| \le 1$. Summarizing,
we have, for even $l$, solutions that are polynomials of degree $l$ as eigenfunctions, and the
corresponding eigenvalues are $l(l + 1)$.

For $s = 1$ we must set $a_1 = 0$ and the recurrence relation is

\[ a_{j+2} = \frac{(j + 1)(j + 2) - \lambda}{(j + 2)(j + 3)}\,a_j, \tag{8.26} \]

which also leads to divergence at $|x| = 1$. However, the divergence can now be avoided
by setting $\lambda = (l + 1)(l + 2)$ for some even value of $l$, thereby causing $a_{l+2}, a_{l+4}, \ldots$
to vanish. The result will be a polynomial of degree $l + s$, i.e., of an odd degree $l + 1$.
These solutions can be described equivalently as, for odd $l$, polynomials of degree $l$ with
eigenvalues $\lambda = l(l + 1)$, so the overall set of eigenfunctions consists of polynomials of all
integer degrees $l$, with respective eigenvalues $l(l + 1)$. When given the conventional scaling,
these polynomials are called Legendre polynomials. Verification of these properties
of solutions to the Legendre equation is left to Exercise 8.3.1.

Before leaving the Legendre equation, note that its ODE is self-adjoint, and that the
coefficient of $d^2/dx^2$ in the Legendre operator is $p_0 = -(1 - x^2)$, which vanishes at
$x = \pm 1$. Comparing with Eq. (8.12), we see that this value of $p_0$ causes the vanishing of the
boundary terms when we take the adjoint of $\mathcal{L}$, so the Legendre operator on the range
$-1 \le x \le 1$ is Hermitian, and therefore has orthogonal eigenfunctions. In other words, the
Legendre polynomials are orthogonal with unit weight on $(-1, 1)$.
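
The terminating series and their orthogonality can be verified with exact rational arithmetic. The sketch below rewrites Eqs. (8.25) and (8.26) as a single recurrence on the power $m$ of $x$ (for $s = 1$, $m = j + 1$), which is our own bookkeeping choice; the polynomials it produces are unnormalized, so they differ from the conventionally scaled Legendre polynomials by constant factors. The names `legendre_series` and `overlap` are illustrative.

```python
from fractions import Fraction

def legendre_series(l):
    """Coefficients c[m] of x**m for the terminating series with lambda = l(l+1),
    using c[m+2] = [m(m+1) - lambda] c[m] / [(m+1)(m+2)]."""
    lam = l * (l + 1)
    c = [Fraction(0)] * (l + 1)
    c[l % 2] = Fraction(1)              # lowest power: x**0 (even l) or x**1 (odd l)
    for m in range(l % 2, l - 1, 2):
        c[m + 2] = Fraction(m * (m + 1) - lam, (m + 1) * (m + 2)) * c[m]
    return c

def overlap(c1, c2):
    """Exact integral over (-1, 1), unit weight, of the product of two polynomials."""
    total = Fraction(0)
    for i, a in enumerate(c1):
        for j, b in enumerate(c2):
            if (i + j) % 2 == 0:        # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

print(legendre_series(2), legendre_series(3))
print(overlap(legendre_series(2), legendre_series(4)))
```

For $l = 2$ and $l = 3$ the series are $1 - 3x^2$ and $x - \tfrac{5}{3}x^3$, proportional to $P_2$ and $P_3$, and the overlap of the $l = 2$ and $l = 4$ polynomials is exactly zero.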





Let’s examine one more ODE that leads to an interesting eigenvalue problem. 


Example 8.3.2 HERMITE EQUATION 


Consider the Hermite differential equation,

\[ \mathcal{L}y = -y'' + 2x\,y' = \lambda y, \tag{8.27} \]

which we wish to regard as an eigenvalue problem on the range $-\infty < x < \infty$. To make
$\mathcal{L}$ Hermitian, we define a scalar product with a weight factor as given by Eq. (8.15),

\[ \langle f|g\rangle = \int_{-\infty}^{\infty} f^*(x)\,g(x)\,e^{-x^2}\,dx, \tag{8.28} \]

and demand (as a boundary condition) that our eigenfunctions $y_n$ have finite norms using
this scalar product, meaning that $\langle y_n|y_n\rangle < \infty$.

Again we obtain a solution by the method of Frobenius, as a series of the form given
in Eq. (8.24). Again the indicial equation is $s(s - 1) = 0$, and for $s = 0$ we can develop a
series of even powers of $x$ with coefficients satisfying the recurrence relation

\[ a_{j+2} = \frac{2j - \lambda}{(j + 1)(j + 2)}\,a_j. \tag{8.29} \]

This series converges for all $x$, but (assuming it does not terminate) it behaves asymptotically
for large $|x|$ as $e^{x^2}$ and therefore does not describe a function of finite norm, even with
the $e^{-x^2}$ weight factor in the scalar product. Thus, even though the series solution always
converges, our boundary conditions require that we arrange to terminate the series, thereby
producing polynomial solutions. From Eq. (8.29) we see that the condition for obtaining
an even polynomial of degree $j$ is that $\lambda = 2j$. Odd polynomial solutions can be obtained
using the indicial equation solution $s = 1$. Details of both the solutions and the asymptotic
properties are the subject of Exercise 8.3.3.

Since we have established that this is a Hermitian eigenvalue problem with the scalar
product as defined in Eq. (8.28), its solutions (when scaled conventionally they are called
Hermite polynomials) are orthogonal using that scalar product.
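
The same kind of exact check works for the Hermite case. The sketch below applies the recurrence of Eq. (8.29) to the powers of $x$ with $\lambda = 2n$ and evaluates the weighted scalar product of Eq. (8.28) from the Gaussian moments $\int_{-\infty}^{\infty} x^{2k}e^{-x^2}dx = \sqrt{\pi}\,(2k-1)!!/2^k$. The common factor $\sqrt{\pi}$ is dropped so that everything stays rational; that normalization, like the function names, is our own choice.

```python
from fractions import Fraction

def hermite_series(n):
    """Coefficients c[m] of x**m for the terminating series with lambda = 2n,
    using c[m+2] = (2m - lambda) c[m] / [(m+1)(m+2)]."""
    lam = 2 * n
    c = [Fraction(0)] * (n + 1)
    c[n % 2] = Fraction(1)
    for m in range(n % 2, n - 1, 2):
        c[m + 2] = Fraction(2 * m - lam, (m + 1) * (m + 2)) * c[m]
    return c

def gaussian_moment(k):
    """Integral of x**k * exp(-x**2) over the real line, divided by sqrt(pi)."""
    if k % 2:
        return Fraction(0)
    result = Fraction(1)
    for odd in range(1, k, 2):          # builds (k-1)!! / 2**(k/2)
        result *= Fraction(odd, 2)
    return result

def overlap(c1, c2):
    """Scalar product of Eq. (8.28), in units of sqrt(pi), for real polynomials."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(c1) for j, b in enumerate(c2))

print(hermite_series(2), hermite_series(4))
print(overlap(hermite_series(2), hermite_series(4)))
```

The $n = 2$ series is $1 - 2x^2$, proportional to the conventional $H_2(x) = 4x^2 - 2$, and polynomials of different $n$ have zero overlap under the $e^{-x^2}$ weight.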


Some ODE eigenvalue problems can be attacked by dividing the space in which they 
reside into regions that are most naturally treated in different ways. The following example 
illustrates this situation, with a potential that is assumed nonzero only within a finite region. 


Example 8.3.3 DEUTERON GROUND STATE 


The deuteron is a bound state of a neutron and a proton. Due to the short range of the
nuclear force, the deuteron properties do not depend much on the detailed shape of the
interaction potential. Thus, this system may be modeled by a spherically symmetric square
well potential with the value $V = V_0 < 0$ when the nucleons are within a distance $a$ of each
other, but with $V = 0$ when the internucleon distance is greater than $a$. The Schrödinger
equation for the relative motion of the two nucleons assumes the form

\[ -\frac{\hbar^2}{2\mu}\nabla^2\psi + V\psi = E\psi, \]

where $\mu$ is the reduced mass of the system (approximately half the mass of either particle).
This eigenvalue equation must be solved subject to the boundary conditions that $\psi$ be finite
at $r = 0$ and approach zero at $r = \infty$ sufficiently rapidly to be a member of an $L^2$ Hilbert
space. The eigenfunctions $\psi$ must also be continuous and differentiable for all $r$, including
$r = a$.

It can be shown that if there is to be a bound state, $E$ will have to have a negative value
in the range $V_0 < E < 0$, and the lowest state (the ground state) will be described by a
wave function $\psi$ that is spherically symmetric (thereby having no angular momentum).
Thus, taking $\psi = \psi(r)$ and using a result from Exercise 3.10.34 to write

\[ \nabla^2\psi = \frac{1}{r}\frac{d^2u}{dr^2}, \quad\text{with}\quad u(r) = r\,\psi(r), \]

the Schrödinger equation reduces to an ODE that assumes the form, for $r < a$,

\[ \frac{d^2u_1}{dr^2} + k_1^2u_1 = 0, \quad\text{with}\quad k_1^2 = \frac{2\mu}{\hbar^2}(E - V_0) > 0, \]

while, for $r > a$,

\[ \frac{d^2u_2}{dr^2} - k_2^2u_2 = 0, \quad\text{with}\quad k_2^2 = -\frac{2\mu E}{\hbar^2} > 0. \]

The solutions for these two ranges of $r$ must connect smoothly, meaning that both $u$
and $du/dr$ must be continuous across $r = a$, and therefore must satisfy the matching
conditions $u_1(a) = u_2(a)$, $u_1'(a) = u_2'(a)$. In addition, the requirement that $\psi$ be finite
at $r = 0$ dictates that $u_1(0) = 0$, and the boundary condition at $r = \infty$ requires that
$\lim_{r\to\infty}u_2(r) = 0$.

For $r < a$, our Schrödinger equation has the general solution

\[ u_1(r) = A\sin k_1r + C\cos k_1r, \]

and the boundary condition at $r = 0$ is only met if we set $C = 0$. The Schrödinger equation
for $r > a$ has the general solution

\[ u_2(r) = C'\exp(k_2r) + B\exp(-k_2r), \tag{8.30} \]

and the boundary condition at $r = \infty$ requires us to set $C' = 0$. The matching conditions
at $r = a$ then take the form

\[ A\sin k_1a = B\exp(-k_2a) \quad\text{and}\quad Ak_1\cos k_1a = -k_2B\exp(-k_2a). \]

Using the second of these equations to eliminate $B\exp(-k_2a)$ from the first, we reach

\[ A\sin k_1a = -A\,\frac{k_1}{k_2}\cos k_1a, \tag{8.31} \]

showing that the overall scale of the solution (i.e., $A$) is arbitrary, which is of course a
consequence of the fact that the Schrödinger equation is homogeneous.

Rearranging Eq. (8.31), and inserting values for $k_1$ and $k_2$, our matching conditions
become

\[ \tan k_1a = -\frac{k_1}{k_2}, \quad\text{or}\quad
   \tan\left[\frac{2\mu a^2}{\hbar^2}(E - V_0)\right]^{1/2} = -\left[\frac{E - V_0}{-E}\right]^{1/2}. \tag{8.32} \]

This is an admittedly unpleasant implicit equation for $E$; if it has solutions with $E$ in the
range $V_0 < E < 0$, our model predicts deuteron bound state(s).

One way to search for solutions to Eq. (8.32) is to plot its left- and right-hand sides
as a function of $E$, identifying the $E$ values, if any, for which they are equal. Taking
$V_0 = -4.046 \times 10^{-12}$ J, $a = 2.5$ fermi,¹ $\mu = 0.835 \times 10^{-27}$ kg, and $\hbar = 1.05 \times 10^{-34}$ J·s
(joule-seconds), the two sides of Eq. (8.32) are plotted in Fig. 8.3 for the range of $E$ in
which a bound state is possible. The $E$ values have been plotted in MeV (mega electron
volts), the energy unit most frequently used in nuclear physics (1 MeV $\approx 1.6 \times 10^{-13}$ J).
The curves cross at only one point, indicating that the model predicts just one bound state.
Its energy is at approximately $E = -2.2$ MeV.

It is instructive to see what happens if we take $E$ values that may or may not solve
Eq. (8.32), using $u(r) = A\sin k_1r$ for $r < a$ (thereby satisfying the $r = 0$ boundary condition)
but for $r > a$ using the general form of $u(r)$ as given in Eq. (8.30), with the coefficient
values $B$ and $C'$ that are required by the matching conditions for the chosen $E$ value. Letting
$E_-$ and $E_+$, respectively, denote values of $E$ less than and greater than the eigenvalue
$E$, we find that by forcing a smooth connection at $r = a$ we lose the required asymptotic
behavior except at the eigenvalue. See Fig. 8.4.

¹ 1 fermi $= 10^{-15}$ m.

FIGURE 8.3 Left- and right-hand sides of Eq. (8.32) as a function of $E$ for the model
parameters given in the text.

FIGURE 8.4 Wavefunctions for the deuteron problem (plotted against $r$ in fermi) when the
energy is chosen to be less than the eigenvalue $E$ ($E_- < E$) or greater than $E$ ($E_+ > E$).
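
Instead of reading the crossing from a plot, the matching condition can be solved by bisection. The sketch below uses the parameter values quoted in the text; to avoid the poles of the tangent in Eq. (8.32) it searches for a zero of the equivalent expression $\sin k_1a + (k_1/k_2)\cos k_1a$ from Eq. (8.31). The bracketing interval $(0.99\,V_0,\ 0.01\,V_0)$, the tolerance, and the function names are our own assumptions.

```python
import math

HBAR = 1.05e-34        # J s
MU = 0.835e-27         # kg, reduced mass
V0 = -4.046e-12        # J, well depth
A = 2.5e-15            # m, well radius (2.5 fermi)
MEV = 1.6e-13          # J per MeV

def mismatch(E):
    """sin(k1 a) + (k1/k2) cos(k1 a); zero exactly when Eq. (8.31) holds (E in joules)."""
    k1 = math.sqrt(2 * MU * (E - V0)) / HBAR
    k2 = math.sqrt(-2 * MU * E) / HBAR
    return math.sin(k1 * A) + (k1 / k2) * math.cos(k1 * A)

def bisect(f, lo, hi, tol=1e-20):
    """Plain bisection; assumes f changes sign between lo and hi."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

E_bound = bisect(mismatch, 0.99 * V0, 0.01 * V0)
E_mev = E_bound / MEV
print("bound-state energy in MeV:", E_mev)
```

The root found this way should lie near the value of about $-2.2$ MeV quoted above.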


Exercises 


8.3.1 Solve the Legendre equation 
(1 —x?)y” —2xy’ +n(n+ Dy =0 


by direct series substitution. 


(a) Verify that the indicial equation is 
s(s—1)=0. 
(b) Using s = 0 and setting the coefficient aj = 0, obtain a series of even powers of x: 


mot I) 2 “ (n — nn l(n+ 3) 4 Boxe |, 





Yeven = 40 1 
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(d) 


(e) 


where 
jG+)-n@th 
G+DG+ 7 


Using s = 1 and noting that the coefficient a, must be zero, develop a series of 
odd powers of x: 





aj42= 


(n—1)(n+2) 3 
~ 3! - 


sf eae see et] 


Yodd =40 E 





5! 
where 
G+DG+2-n@+) 
(+2) +3) 


Show that both solutions, yeyen and Yoaa, diverge for x = +1 if the series continue 
to infinity. (Compare with Exercise 1.2.5.) 





aj+2= 





Finally, show that by an appropriate choice of n, one series at a time may be con- 
verted into a polynomial, thereby avoiding the divergence catastrophe. In quantum 
mechanics this restriction of n to integral values corresponds to quantization of 
angular momentum. 


8.3.2 Show that with the weight factor exp(—x7) and the interval —oo < x < oo for the scalar 
product, the Hermite ODE eigenvalue problem is Hermitian. 


8.3.3 (a) Develop series solutions for Hermite’s differential equation
$$y'' - 2xy' + 2\alpha y = 0.$$
ANS. $s(s - 1) = 0$, indicial equation.
For $s = 0$,
$$a_{j+2} = 2a_j\,\frac{j - \alpha}{(j+1)(j+2)} \quad (j \text{ even}),$$
$$y_{\text{even}} = a_0\left[1 + \frac{2(-\alpha)x^2}{2!} + \frac{2^2(-\alpha)(2-\alpha)x^4}{4!} + \cdots\right].$$
For $s = 1$,
$$a_{j+2} = 2a_j\,\frac{j + 1 - \alpha}{(j+2)(j+3)} \quad (j \text{ even}),$$
$$y_{\text{odd}} = a_1\left[x + \frac{2(1-\alpha)x^3}{3!} + \frac{2^2(1-\alpha)(3-\alpha)x^5}{5!} + \cdots\right].$$

(b) Show that both series solutions are convergent for all $x$, the ratio of successive
coefficients behaving, for a large index, like the corresponding ratio in the expansion
of $\exp(x^2)$.

(c) Show that by appropriate choice of $\alpha$, the series solutions may be cut off and
converted to finite polynomials. (These polynomials, properly normalized, become
the Hermite polynomials in Section 18.1.)


8.3.4 Laguerre’s ODE is
$$xL_n''(x) + (1 - x)L_n'(x) + nL_n(x) = 0.$$
Develop a series solution and select the parameter $n$ to make your series a polynomial.

8.3.5 Solve the Chebyshev equation
$$(1 - x^2)T_n'' - xT_n' + n^2 T_n = 0,$$
by series substitution. What restrictions are imposed on $n$ if you demand that the series
solution converge for $x = \pm 1$?

ANS. The infinite series does converge for $x = \pm 1$ and no
restriction on $n$ exists (compare with Exercise 1.2.6).

8.3.6 Solve
$$(1 - x^2)U_n''(x) - 3xU_n'(x) + n(n+2)U_n(x) = 0,$$
choosing the root of the indicial equation to obtain a series of odd powers of $x$. Since
the series will diverge for $x = 1$, choose $n$ to convert it into a polynomial.


8.4 VARIATION METHOD 


We saw in Chapter 6 that the expectation value of a Hermitian operator $H$ for the normalized
function $\psi$ can be written as

$$\langle H \rangle = \langle \psi | H | \psi \rangle,$$

and that the expansion of this quantity in a basis consisting of the orthonormal eigenfunctions
of $H$ had the form given in Eq. (6.30):

$$\langle H \rangle = \sum_{\mu} |a_\mu|^2 \lambda_\mu,$$

where $a_\mu$ is the coefficient of the $\mu$th eigenfunction of $H$ and $\lambda_\mu$ is the corresponding
eigenvalue. As we noted when we obtained this result, one of its consequences is that $\langle H \rangle$
is a weighted average of the eigenvalues of $H$, and therefore is at least as large as the smallest
eigenvalue, and equal to the smallest eigenvalue only if $\psi$ is actually an eigenfunction
to which that eigenvalue corresponds.

The observations of the foregoing paragraph hold true even if we do not actually make
an expansion of $\psi$ and even if we do not actually know or have available the eigenfunctions
or eigenvalues of $H$. The knowledge that $\langle H \rangle$ is an upper bound to the smallest eigenvalue
of $H$ is sufficient to enable us to devise a method for approximating that eigenvalue and
the associated eigenfunction. This eigenfunction will be the member of the Hilbert space
of our problem that yields the smallest expectation value of $H$, and a strategy for finding
it is to search for the minimum in $\langle H \rangle$ within our Hilbert space. This is the essential idea
behind what is known as the variation method for the approximate solution of eigenvalue
problems.

Since in many problems (including most that arise in quantum mechanics) it is impractical
to compute $\langle H \rangle$ for all members of a Hilbert space, the actual approach is to define a
portion of the Hilbert space by introducing an assumed functional form for $\psi$ that contains
parameters, and then to minimize $\langle H \rangle$ with respect to the parameters; this is the source
of the name “variation method.” The success of the method will depend on whether the
functional form that is chosen is capable of representing functions that are “close” to the
desired eigenfunction (meaning that its coefficient in the expansion is relatively large, with
other coefficients much smaller). The great advantage of the variation method is that we
do not need to know anything about the exact eigenfunction and we do not actually have
to make an expansion; we simply choose a suitable functional form and minimize $\langle H \rangle$.

Since eigenvalue equations for energies and related quantities in quantum mechanics
usually have finite smallest eigenvalues (e.g., ground energy levels), the variation method
is frequently applicable. We point out that it is not a method having only academic interest;
it is at the heart of some of the most powerful methods for solving the Schrödinger
eigenvalue equation for complex quantum systems.
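The bound underlying the variation method can be illustrated numerically. The following is a minimal sketch, not part of the original text: it assumes a small real symmetric matrix (chosen here only because its eigenvalues are known in closed form) as a stand-in for $H$, and checks that $\langle H \rangle$ evaluated for random trial vectors never falls below the smallest eigenvalue.

```python
import math
import random

# Illustrative Hermitian (real symmetric) matrix; its eigenvalues are
# 2 - sqrt(2), 2, and 2 + sqrt(2).  The matrix is an assumption of this sketch.
H = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
LAMBDA_MIN = 2.0 - math.sqrt(2.0)


def expectation(H, psi):
    """Return <psi|H|psi> / <psi|psi> for a real trial vector psi."""
    n = len(psi)
    h_psi = [sum(H[i][j] * psi[j] for j in range(n)) for i in range(n)]
    numerator = sum(psi[i] * h_psi[i] for i in range(n))
    norm = sum(p * p for p in psi)
    return numerator / norm


random.seed(0)
trials = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1000)]
values = [expectation(H, psi) for psi in trials]
best = min(values)

# The eigenvector belonging to the smallest eigenvalue attains the bound.
exact = expectation(H, [1.0, math.sqrt(2.0), 1.0])

print(f"smallest eigenvalue       : {LAMBDA_MIN:.6f}")
print(f"best of 1000 random trials: {best:.6f}")
print(f"exact eigenvector         : {exact:.6f}")
```

Every random trial yields a value at or above the smallest eigenvalue, and only the true eigenvector reproduces it, which is the behavior the variation method exploits.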


Example 8.4.1 VARIATION METHOD


Given a single-electron wave function (in three-dimensional space) of the form

$$\psi = \left(\frac{\zeta^3}{\pi}\right)^{1/2} e^{-\zeta r}, \tag{8.33}$$

where the factor $(\zeta^3/\pi)^{1/2}$ makes $\psi$ normalized, it can be shown that, in units with the
electron mass, its charge, and $\hbar$ (Planck’s constant divided by $2\pi$) all set to unity (so-called
Hartree atomic units), the quantum-mechanical kinetic energy operator has expectation
value $\langle \psi | T | \psi \rangle = \zeta^2/2$, and the potential energy of interaction between the electron and a
fixed nucleus of charge $+Z$ has $\langle \psi | V | \psi \rangle = -Z\zeta$. For a one-electron atom with a nucleus
of charge $+Z$ at $r = 0$, the total energy will be less than or equal to the expectation value
of the Hamiltonian $H = T + V$, given for the $\psi$ of Eq. (8.33) as

$$\langle H \rangle = \langle T \rangle + \langle V \rangle = \frac{\zeta^2}{2} - Z\zeta. \tag{8.34}$$

As is customary when the meaning is clear, we no longer explicitly show $\psi$ within all
the angle brackets. We can now optimize our upper bound to the lowest eigenvalue of $H$
by minimizing the expectation value $\langle H \rangle$ with respect to the parameter $\zeta$ in $\psi$. To do so,
we set

$$\frac{d}{d\zeta}\left[\frac{\zeta^2}{2} - Z\zeta\right] = 0,$$

leading to $\zeta - Z = 0$, or $\zeta = Z$. This tells us that the wave function yielding the energy
closest to the smallest eigenvalue is that with $\zeta = Z$, and the energy expectation value for
this value of $\zeta$ is $Z^2/2 - Z^2 = -Z^2/2$.
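The minimization of Eq. (8.34) can also be carried out numerically, which is how the variation method is used when no closed form is available. The sketch below is illustrative only: the golden-section search and the search interval $[0, 10]$ are choices made for this example, not something prescribed by the text.

```python
def energy(zeta, Z):
    """<H> = zeta**2 / 2 - Z * zeta in Hartree atomic units, as in Eq. (8.34)."""
    return 0.5 * zeta * zeta - Z * zeta


def golden_section_minimum(func, lo, hi, tol=1e-9):
    """Return the minimizer of a unimodal function on the interval [lo, hi]."""
    ratio = (5.0 ** 0.5 - 1.0) / 2.0  # reciprocal of the golden ratio
    while hi - lo > tol:
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if func(x1) < func(x2):
            hi = x2  # the minimum lies in [lo, x2]
        else:
            lo = x1  # the minimum lies in [x1, hi]
    return 0.5 * (lo + hi)


for Z in (1, 2, 3):
    zeta_opt = golden_section_minimum(lambda z: energy(z, Z), 0.0, 10.0)
    print(f"Z = {Z}: zeta = {zeta_opt:.6f}, <H> = {energy(zeta_opt, Z):.6f}, "
          f"-Z^2/2 = {-0.5 * Z * Z:.6f}")
```

For each nuclear charge the search should settle at $\zeta = Z$ with $\langle H \rangle = -Z^2/2$, in agreement with the analytic result.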
The result we have just found is exact, because, with malice aforethought and with
appropriate knowledge, we chose a functional form that included the exact wave function.
But now let us continue to a two-electron atom, taking a wave function of the form
$\Psi = \psi(1)\psi(2)$, with both $\psi$ of the same $\zeta$ value. For this two-electron atom, the scalar product
is defined as integration over the coordinates of both electrons, and the Hamiltonian is
now $H = T(1) + T(2) + V(1) + V(2) + U(1, 2)$, where $T(i)$ and $V(i)$ denote the kinetic
energy and the electron-nuclear potential energy for electron $i$; $U(1, 2)$ is the electron-electron
repulsion energy operator, equal in Hartree units to $1/r_{12}$, where $r_{12}$ is the distance
between the positions of the two electrons. For the wave function in use here, the electron-electron
repulsion has expectation value $\langle U \rangle = 5\zeta/8$ and the expectation value $\langle H \rangle$ (for
$Z = 2$, thereby representing the He atom) is

$$\langle H \rangle = \zeta^2 - 2Z\zeta + \frac{5\zeta}{8} = \zeta^2 - \frac{27}{8}\zeta.$$

Minimizing $\langle H \rangle$ with respect to $\zeta$, we obtain the optimum value $\zeta = 27/16$, and for this
value of $\zeta$ we have $\langle H \rangle = -(27/16)^2 = -2.8477$ hartree. This is the best approximation
available using a wave function of the form we chose. It cannot be exact, as the exact solution
for this system with two interacting electrons cannot be a product of two one-electron
functions. We have therefore not included in our variational search the exact ground-state
eigenfunction. A highly precise value of the smallest eigenvalue for this problem can only
be obtained numerically, and in fact was produced by using the variation method with a
trial function containing thousands of parameters and yielding a result accurate to about
40 decimal places.² The value found here by very simple means is higher than the exact
value, $-2.9037\cdots$ hartree, by only about 2%, and already conveys much physically relevant
information. If the two electrons did not interact, they would each have had an optimum
wave function with $\zeta = 2$; the fact that the optimum $\zeta$ is somewhat smaller shows
that each electron partially screens the nucleus from the other electron.

From the viewpoint of the mathematical method in use here, it is desirable to note that
we did not need to assume any relation between the trial wave function and the exact form
of the eigenfunction; the variational optimization adjusts the trial function to give an energetically
optimum fit. The quality of the final result of course depends on the degree to
which the trial function can mimic the actual eigenfunction, and trial functions are ordinarily
chosen in a way that balances inherent quality against convenience of use. ■
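The arithmetic of the two-electron case can be checked in exact rational arithmetic. This is a minimal sketch: the energy expression is assembled from the expectation values quoted in the example ($\zeta^2/2$ and $-Z\zeta$ per electron, plus $5\zeta/8$ for the repulsion), and the reference value $-2.9037$ hartree is the one quoted there. The function name is an illustrative choice.

```python
from fractions import Fraction


def two_electron_energy(zeta, Z):
    """<H> for the product trial function: zeta**2 - 2*Z*zeta + (5/8)*zeta."""
    return zeta * zeta - 2 * Z * zeta + Fraction(5, 8) * zeta


Z = 2  # helium
# Setting d<H>/dzeta = 2*zeta - 2*Z + 5/8 to zero gives zeta = Z - 5/16.
zeta_opt = Fraction(Z) - Fraction(5, 16)
e_opt = two_electron_energy(zeta_opt, Z)

# Nearby values of zeta must give a higher (less negative) energy.
step = Fraction(1, 100)
e_left = two_electron_energy(zeta_opt - step, Z)
e_right = two_electron_energy(zeta_opt + step, Z)

reference = -2.9037  # value quoted in the text, in hartree
relative_error = (float(e_opt) - reference) / abs(reference)

print(f"optimum zeta = {zeta_opt} = {float(zeta_opt):.4f}")
print(f"<H> at optimum = {e_opt} = {float(e_opt):.4f} hartree")
print(f"relative error versus quoted value: {100 * relative_error:.2f}%")
```

Using `Fraction` avoids rounding, so the optimum and the energy come out as exact rationals that can be compared directly with $27/16$ and $-(27/16)^2$.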





² C. Schwartz, Experiment and theory in computations of the He atom ground state, Int. J. Mod. Phys. E: Nuclear Physics
15: 877 (2006).


Exercises

8.4.1 A function that is normalized on the interval $0 \le x < \infty$ with an unweighted scalar
product is

$$\psi = 2\alpha^{3/2} x e^{-\alpha x}.$$

(a) Verify the normalization.

(b) Verify that for this $\psi$, $\langle x^{-1} \rangle = \alpha$.

(c) Verify that for this $\psi$, $\langle d^2/dx^2 \rangle = -\alpha^2$.

(d) Use the variation method to find the value of $\alpha$ that minimizes

$$\left\langle \psi \left| -\frac{1}{2}\frac{d^2}{dx^2} - \frac{1}{x} \right| \psi \right\rangle,$$

and find the minimum value of this expectation value.





8.5 SUMMARY, EIGENVALUE PROBLEMS 


Because any Hermitian operator on a Hilbert space can be expanded in a basis and is there- 
fore mathematically equivalent to a matrix, all the properties derived for matrix eigenvalue 
problems automatically apply whether or not a basis-set expansion is actually carried out. 
It may be helpful to summarize some of those results, along with some that were developed 
in the present chapter. 


1. A second-order differential operator is Hermitian if it is self-adjoint in the
differential-equation sense and the functions on which it operates are required to satisfy
appropriate boundary conditions. In that event, the scalar product consistent with Hermiticity
is an unweighted integral over the range between its boundaries.

2. If a second-order differential operator is not self-adjoint in the differential-equation
sense, it will nevertheless be Hermitian if it satisfies appropriate boundary conditions
and if the scalar product includes the weight function that makes the original
differential equation self-adjoint.

3. A Hermitian operator on a Hilbert space has a complete set of eigenfunctions. Thus,
they span the space and can be used as a basis for an expansion.

4. The eigenvalues of a Hermitian operator are real.

5. The eigenfunctions of a Hermitian operator corresponding to different eigenvalues
are orthogonal, using the appropriate scalar product.

6. Degenerate eigenfunctions of a Hermitian operator can be orthogonalized using the
Gram-Schmidt or any other orthogonalization process.

7. Two operators have a common set of eigenfunctions if and only if they commute.

8. An algebraic function of an operator has the same eigenfunctions as the original
operator, and its eigenvalues are the corresponding function of the eigenvalues of the
original operator.

9. Eigenvalue problems involving a differential operator may be solved either by
expressing the problem in any basis and solving the resulting matrix problem or by
using relevant properties of the differential equation.

10. The matrix representation of a Hermitian operator can be brought to diagonal form by
a unitary transformation. In diagonal form, the diagonal elements are the eigenvalues,
and the eigenvectors are the basis functions. The orthonormal eigenvectors are the
columns of the unitary matrix $U^{-1}$ when a Hermitian matrix $H$ is transformed to the
diagonal matrix $UHU^{-1}$.

11. Hermitian-operator eigenvalue problems which have a finite smallest eigenvalue
may have their solutions approximated by the variation method, which is based on
the theorem that for all members of the relevant Hilbert space, the expectation value
of the operator will be larger than its smallest eigenvalue (or equal to it only if the
Hilbert space member is actually a corresponding eigenfunction).
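Several of the summarized properties (real eigenvalues, orthonormal eigenvectors, and diagonalization by a unitary transformation) can be checked on a small case. The sketch below assumes an illustrative $2 \times 2$ complex Hermitian matrix whose eigenvectors were worked out by hand; the helper names `matmul` and `dagger` are introduced only for this example.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(A):
    """Hermitian adjoint (conjugate transpose) of a matrix."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


# Illustrative Hermitian matrix with eigenvalues 1 and 3.
H = [[2 + 0j, 1j],
     [-1j, 2 + 0j]]

r = 1.0 / math.sqrt(2.0)
# Columns of V are the orthonormal eigenvectors (-i, 1)/sqrt(2) and (i, 1)/sqrt(2);
# in the notation of item 10, V plays the role of U^{-1}.
V = [[-1j * r, 1j * r],
     [r + 0j, r + 0j]]
U = dagger(V)  # for a unitary matrix the inverse equals the adjoint

identity = matmul(U, V)        # should be the unit matrix
D = matmul(matmul(U, H), V)    # U H U^{-1}, should be diag(1, 3)

for row in D:
    print([f"{z.real:+.3f}{z.imag:+.3f}j" for z in row])
```

The diagonal entries of `D` are the (real) eigenvalues, and the vanishing off-diagonal entries reflect the orthogonality of eigenvectors belonging to different eigenvalues.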


Additional Readings 


Byron, F. W., Jr., and R. W. Fuller, Mathematics of Classical and Quantum Physics. Reading, MA: Addison- 
Wesley (1969). 


Dennery, P., and A. Krzywicki, Mathematics for Physicists. Reprinted. New York: Dover (1996). 
Hirsch, M., Differential Equations, Dynamical Systems, and Linear Algebra. San Diego: Academic Press (1974). 
Miller, K. S., Linear Differential Equations in the Real Domain. New York: Norton (1963). 


Titchmarsh, E. C., Eigenfunction Expansions Associated with Second-Order Differential Equations, Part 1. 2nd 
ed. London: Oxford University Press (1962). 


Titchmarsh, E. C., Eigenfunction Expansions Associated with Second-Order Differential Equations. Part 2. 
London: Oxford University Press (1958). 


CHAPTER 9 


PARTIAL DIFFERENTIAL 
EQUATIONS 


9.1 INTRODUCTION 


As mentioned in Chapter 7, partial differential equations (PDEs) involve derivatives with
respect to more than one independent variable; if the independent variables are $x$ and $y$,
a PDE in a dependent variable $\varphi(x, y)$ will contain partial derivatives, with the meaning
discussed in Eq. (1.141). Thus, $\partial\varphi/\partial x$ implies an $x$ derivative with $y$ held constant,
$\partial^2\varphi/\partial x^2$ is the second derivative with respect to $x$ (again keeping $y$ constant), and we may
also have mixed derivatives

$$\frac{\partial^2\varphi}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial\varphi}{\partial y}\right).$$

Like ordinary derivatives, partial derivatives (of any order, including mixed derivatives)
are linear operators, since they satisfy equations of the type

$$\frac{\partial[a\varphi(x, y) + b\psi(x, y)]}{\partial x} = a\frac{\partial\varphi(x, y)}{\partial x} + b\frac{\partial\psi(x, y)}{\partial x}.$$





Similar to the situation for ODEs, general differential operators, $\mathcal{L}$, which may contain
partial derivatives of any order, pure or mixed, multiplied by arbitrary functions of the
independent variables, are linear operators, and equations of the form

$$\mathcal{L}\varphi(x, y) = F(x, y)$$

are linear PDEs. If the source term $F(x, y)$ vanishes, the PDE is termed homogeneous;
if $F(x, y)$ is nonzero, it is inhomogeneous.
Homogeneous PDEs have the property, previously noted in other contexts, that any
linear combination of solutions will also be a solution to the PDE. This is the superposition
principle that is fundamental in electrodynamics and quantum mechanics, and which also
permits us to build specific solutions by the linear combination of suitable members of the
set of functions constituting the general solution to the homogeneous PDE.
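The superposition principle can be checked numerically for a homogeneous linear PDE. The sketch below is illustrative: it takes Laplace's equation in two dimensions, two known harmonic functions, and a five-point finite-difference Laplacian (a numerical approximation chosen for this example), and confirms that an arbitrary linear combination is again a solution to within discretization error.

```python
import math


def laplacian(func, x, y, h=1e-3):
    """Five-point finite-difference estimate of the 2-D Laplacian of func."""
    return (func(x + h, y) + func(x - h, y) + func(x, y + h)
            + func(x, y - h) - 4.0 * func(x, y)) / (h * h)


def psi1(x, y):
    return x * x - y * y              # harmonic: second derivatives are 2 and -2


def psi2(x, y):
    return math.exp(x) * math.sin(y)  # harmonic: e^x sin y - e^x sin y = 0


def combination(x, y):
    # Arbitrary coefficients; any other constants would serve equally well.
    return 3.0 * psi1(x, y) - 2.0 * psi2(x, y)


point = (0.7, -0.4)
residuals = {name: laplacian(func, *point)
             for name, func in (("psi1", psi1), ("psi2", psi2),
                                ("3*psi1 - 2*psi2", combination))}
for name, value in residuals.items():
    print(f"Laplacian of {name:>16s} at {point}: {value:.2e}")
```

All three residuals should vanish to within the truncation and rounding error of the difference formula.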


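Example 9.1.1 VARIOUS TYPES OF PDEs

Laplace                  $\nabla^2\psi = 0$                 linear, homogeneous
Poisson                  $\nabla^2\psi = f(\mathbf{r})$     linear, inhomogeneous
Euler (inviscid flow)    $\dfrac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} = -\dfrac{\nabla P}{\rho}$     nonlinear, inhomogeneous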


Since the dynamics of many physical systems involve just two derivatives, for example,
acceleration in classical mechanics, and the kinetic energy operator $\sim\nabla^2$ in quantum
mechanics, differential equations of second order occur most frequently in physics. Even
when the defining equations are first order, they may, as in Maxwell’s equations, involve
two coupled unknown vector functions (they are the electric and magnetic fields), and
the elimination of one unknown vector yields a second-order PDE for the other (compare
Example 3.6.2).


Examples of PDEs 
Among the most frequently encountered PDEs are the following: 


1. Laplace’s equation, $\nabla^2\psi = 0$.
This very common and very important equation occurs in studies of 


(a) electromagnetic phenomena, including electrostatics, dielectrics, steady currents, 
and magnetostatics, 

(b) hydrodynamics (irrotational flow of perfect fluid and surface waves), 

(c) heat flow, 

(d) gravitation. 


2. Poisson’s equation, $\nabla^2\psi = -\rho/\varepsilon_0$.

This inhomogeneous equation describes electrostatics with a source term $-\rho/\varepsilon_0$.
3. Helmholtz and time-independent diffusion equations, $\nabla^2\psi \pm k^2\psi = 0$.

These equations appear in such diverse phenomena as 





(a) elastic waves in solids, including vibrating strings, bars, membranes,
(b) acoustics (sound waves),

(c) electromagnetic waves,

(d) nuclear reactors.

4. The time-dependent diffusion equation, $\nabla^2\psi = \dfrac{1}{a^2}\dfrac{\partial\psi}{\partial t}$.

5. The time-dependent classical wave equation, $\dfrac{1}{c^2}\dfrac{\partial^2\psi}{\partial t^2} = \nabla^2\psi$.

6. The Klein-Gordon equation, $\partial^2\psi = -\mu^2\psi$, and the corresponding vector equations in
which the scalar function $\psi$ is replaced by a vector function. Other, more complicated
forms are also common.

7. The time-dependent Schrödinger wave equation,

$$-\frac{\hbar^2}{2m}\nabla^2\psi + V\psi = i\hbar\frac{\partial\psi}{\partial t},$$

and its time-independent form

$$-\frac{\hbar^2}{2m}\nabla^2\psi + V\psi = E\psi.$$


8. The equations for elastic waves and viscous fluids and the telegraphy equation. 
9. Maxwell’s coupled partial differential equations for electric and magnetic fields and 
those of Dirac for relativistic electron wave functions. 


We begin our study of PDEs by considering first-order equations, which illustrate some 
of the most important principles involved. We then continue to classification and prop- 
erties of second-order PDEs, and a preliminary discussion of prototypical homogeneous 
equations of the different classes. Finally, we examine a very useful and powerful method 
for obtaining solutions to homogeneous PDEs, namely the method of separation of 
variables. 

This chapter is mainly devoted to general properties of homogeneous PDEs; full detail 
on specific equations is for the most part postponed to chapters that discuss the spe- 
cial functions involved. Questions arising from the extension to inhomogeneous PDEs 
(i.e., problems involving sources or driving terms) are also deferred, mainly to later chap- 
ters on Green’s functions and integral transforms. 

Occasionally, we encounter equations of higher order. In both the theory of the slow 
motion of a viscous fluid and the theory of an elastic body we find the equation 


$$(\nabla^2)^2\psi = 0.$$


Fortunately, these higher-order differential equations are relatively rare and are not dis- 
cussed here. Sometimes, particularly in fluid mechanics, we encounter nonlinear PDEs. 


9.2 FIRST-ORDER EQUATIONS 


While the most important PDEs arising in physics are linear and second order, many 
involving three spatial variables plus possibly a time variable, first-order PDEs do arise 
(e.g., the Cauchy-Riemann equations of complex variable theory). Part of the motivation 
for studying these easily solved equations is that the study provides insights that apply also 
to higher-order problems. 
Characteristics


Let us start by considering the following homogeneous linear first-order equation in two
independent variables $x$ and $y$, with constant coefficients $a$ and $b$, and with dependent
variable $\varphi(x, y)$:

$$\mathcal{L}\varphi = a\frac{\partial\varphi}{\partial x} + b\frac{\partial\varphi}{\partial y} = 0. \tag{9.1}$$

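This equation would be easier to solve if we could rearrange it so that it contained only one
derivative; one way to do this would be to rewrite our PDE in terms of new coordinates
$(s, t)$ such that one of them, say $s$, is such that $(\partial/\partial s)_t$ would expand into the linear combination
of $\partial/\partial x$ and $\partial/\partial y$ in the original PDE, while the other new coordinate, $t$, is such that
$(\partial/\partial t)_s$ does not occur in the PDE. It is easily verified that definitions of $s$ and $t$ consistent
with these objectives for the PDE in Eq. (9.1) are $s = ax + by$ and $t = bx - ay$. To check
this, write $\varphi(x, y) = \varphi(x(s, t), y(s, t)) = \hat{\varphi}(s, t)$, and we can verify that

$$\left(\frac{\partial\varphi}{\partial x}\right)_y = a\left(\frac{\partial\hat{\varphi}}{\partial s}\right)_t + b\left(\frac{\partial\hat{\varphi}}{\partial t}\right)_s, \qquad
\left(\frac{\partial\varphi}{\partial y}\right)_x = b\left(\frac{\partial\hat{\varphi}}{\partial s}\right)_t - a\left(\frac{\partial\hat{\varphi}}{\partial t}\right)_s,$$

so

$$a\frac{\partial\varphi}{\partial x} + b\frac{\partial\varphi}{\partial y} = (a^2 + b^2)\frac{\partial\hat{\varphi}}{\partial s}.$$

We see that the PDE does not contain a derivative with respect to $t$. Since our PDE now
has the simple form

$$(a^2 + b^2)\frac{\partial\hat{\varphi}}{\partial s} = 0,$$

it clearly has solution

$$\hat{\varphi}(s, t) = f(t), \quad \text{with } f(t) \text{ completely arbitrary.} \tag{9.2}$$

In terms of the original variables,

$$\varphi(x, y) = f(bx - ay), \tag{9.3}$$

where we again stress that $f(t)$ is an arbitrary function of its argument.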
Checking our work to this point, we note that

$$\mathcal{L}\varphi = a\frac{\partial f(bx - ay)}{\partial x} + b\frac{\partial f(bx - ay)}{\partial y}
= ab f'(bx - ay) + b\left[-a f'(bx - ay)\right] = 0.$$

Since the satisfaction of this equation does not depend on the properties of the function $f$,
we verify that $\varphi(x, y)$ as given in Eq. (9.3) is a solution of our PDE, irrespective of the
choice of the function $f$. In fact, it is the general solution of our PDE.

It is useful to visualize the significance of what we have just observed. Note that holding
$t = bx - ay$ to a fixed value defines a line in the $xy$ plane on which our solution $\varphi$ is constant,
with individual points on this line corresponding to different values of $s = ax + by$.
In addition, we observe that the lines of constant $s$ are orthogonal to those of constant $t$, and
that $s$ has the same coefficients as the derivatives in the PDE. The general solution to our
PDE can thus be characterized as independent of $s$ and with arbitrary dependence on $t$.
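The claim that $f(bx - ay)$ solves Eq. (9.1) for any choice of $f$ can be spot-checked numerically. The sketch below is illustrative only: the coefficients, the sample profiles $f$, the evaluation point, and the central-difference step are all assumptions of this example. It also confirms that the solution is unchanged when one moves along the direction $(a, b)$, i.e., along a line of constant $t$.

```python
import math


def make_solution(f, a, b):
    """Return phi(x, y) = f(b*x - a*y), the form given in Eq. (9.3)."""
    return lambda x, y: f(b * x - a * y)


def pde_residual(phi, a, b, x, y, h=1e-5):
    """Central-difference estimate of a*dphi/dx + b*dphi/dy at (x, y)."""
    dphi_dx = (phi(x + h, y) - phi(x - h, y)) / (2.0 * h)
    dphi_dy = (phi(x, y + h) - phi(x, y - h)) / (2.0 * h)
    return a * dphi_dx + b * dphi_dy


a, b = 2.0, 3.0
profiles = {
    "sin": math.sin,
    "cubic": lambda t: t ** 3 - t,
    "gaussian": lambda t: math.exp(-t * t),
}

x0, y0 = 0.3, -1.2
residuals = {}
drift = {}
for name, f in profiles.items():
    phi = make_solution(f, a, b)
    residuals[name] = pde_residual(phi, a, b, x0, y0)
    # Moving by tau*(a, b) changes s but leaves t = b*x - a*y fixed.
    tau = 0.75
    drift[name] = phi(x0 + tau * a, y0 + tau * b) - phi(x0, y0)
    print(f"{name:>8s}: residual = {residuals[name]:.2e}, "
          f"change along characteristic = {drift[name]:.2e}")
```

The residuals are zero to within finite-difference error regardless of the profile, which is the numerical counterpart of the statement that $f$ is arbitrary.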
The curves of constant $t$ are called characteristic curves, or more frequently just characteristics
of our PDE. An alternative and insightful way of describing the characteristic
curves is to observe that they are the stream lines (flow lines) of $s$. Put another way, they
are the lines that are traced out as the value of $s$ is changed, keeping $t$ constant. The characteristic
can also be characterized by its slope,

$$\frac{dy}{dx} = \frac{b}{a}, \quad \text{for } \mathcal{L} \text{ in Eq. (9.1).} \tag{9.4}$$

For our present first-order PDE, the solution $\varphi$ is constant along each characteristic. We
shall shortly see that more general PDEs can be solved using ODE methods on characteristic
lines, a feature that causes it to be said that PDE solutions propagate along the
characteristics, giving further significance to the notion that in some sense these are lines
of flow. In the present problem this translates into the statement that if we know $\varphi$ at any
point on a characteristic, we know it on the entire characteristic line.

The characteristics have one additional (but related) property of importance. Ordinarily,
if a PDE solution $\varphi(x, y)$ is specified on a curve segment (a boundary condition), one
can deduce from it the values of the solution at nearby points that are not on the curve. If
one introduces a Taylor expansion about some point $(x_0, y_0)$ on the curve (thereby tacitly
assuming that there are no singularities that invalidate the expansion), the value of $\varphi$ at a
nearby point $(x, y)$ will be given by

$$\varphi(x, y) = \varphi(x_0, y_0) + \frac{\partial\varphi(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial\varphi(x_0, y_0)}{\partial y}(y - y_0) + \cdots. \tag{9.5}$$


To use Eq. (9.5), we need values of the derivatives of $\varphi$. To obtain these derivatives, note
the following:

• The specification of $\varphi$ on a given curve, with the curve parametrically described by
$x(l)$, $y(l)$, means that the curve direction, i.e., $dx/dl$ and $dy/dl$, is known, as is the
derivative of $\varphi$ along the curve, namely

$$\frac{d\varphi}{dl} = \frac{\partial\varphi}{\partial x}\frac{dx}{dl} + \frac{\partial\varphi}{\partial y}\frac{dy}{dl}. \tag{9.6}$$

Equation (9.6) therefore provides us with a linear equation satisfied by the two derivatives
$\partial\varphi/\partial x$ and $\partial\varphi/\partial y$.

• The PDE supplies a second linear equation, in this case

$$a\frac{\partial\varphi}{\partial x} + b\frac{\partial\varphi}{\partial y} = 0. \tag{9.7}$$

• Providing that the determinant of their coefficients is not zero, we can solve Eqs. (9.6)
and (9.7) for $\partial\varphi/\partial x$ and $\partial\varphi/\partial y$ at $(x_0, y_0)$ and therefore evaluate the leading terms of
the Taylor series for $\varphi(x, y)$.¹ The determinant of coefficients of Eqs. (9.6) and (9.7)
takes the form

$$D = \begin{vmatrix} \dfrac{dx}{dl} & \dfrac{dy}{dl} \\[1ex] a & b \end{vmatrix} = b\frac{dx}{dl} - a\frac{dy}{dl}.$$

¹ The linear terms are all that are necessary; one can choose $x$ and $y$ close enough to $(x_0, y_0)$ that second- and higher-order terms
can be made negligible relative to those retained.
Now we make the observation that if $\varphi$ was specified along a characteristic (for which
$t = bx - ay = \text{constant}$), we have

$$b\,dx - a\,dy = 0, \quad \text{or} \quad b\frac{dx}{dl} - a\frac{dy}{dl} = 0,$$

so that $D = 0$ and we cannot solve for the derivatives of $\varphi$. Our conclusions relative to
characteristics, which can be extended to more general equations, are:

1. If the dependent variable $\varphi$ of the PDE in Eq. (9.1) is specified along a curve (i.e., $\varphi$
has a boundary condition specified on a boundary curve), this fixes the value of $\varphi$
at a point of each characteristic that intersects the boundary curve, and hence at all
points of each such characteristic;

2. If the boundary curve is along a characteristic, the boundary condition on it will ordinarily
lead to inconsistency, and therefore, unless the boundary condition is redundant
(i.e., coincidentally equal everywhere to the solution constructed from the value of $\varphi$
at any one point on the characteristic), the PDE will not have a solution;

3. If the boundary curve has more than one intersection with the same characteristic,
this will usually lead to an inconsistency, as the PDE may not have a solution that is
simultaneously consistent with the values of $\varphi$ at both intersections; and

4. Only if the boundary curve is not a characteristic can a boundary condition fix the
value of $\varphi$ at points not on the curve. Values of $\varphi$ specified only on a characteristic
of the PDE provide no information as to the value of $\varphi$ at points not on that
characteristic.
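The role of the determinant $D$ can be made concrete with a short computation. The following sketch is an illustration under stated assumptions: the function name, the tolerance used to decide that $D$ vanishes, and the test solution $\varphi = \sin(bx - ay)$ with boundary data on the $x$ axis are all choices made for this example. It solves Eqs. (9.6) and (9.7) by Cramer's rule and reports failure when the boundary direction lies along a characteristic.

```python
import math


def gradient_from_boundary_data(a, b, dx_dl, dy_dl, dphi_dl):
    """Solve Eqs. (9.6) and (9.7) for (dphi/dx, dphi/dy).

    Returns None when D = b*dx/dl - a*dy/dl vanishes, i.e., when the boundary
    curve runs along a characteristic and the derivatives are undetermined.
    """
    D = b * dx_dl - a * dy_dl
    if abs(D) < 1e-12:  # tolerance is an arbitrary choice for this sketch
        return None
    # Cramer's rule for the system
    #   dx_dl * px + dy_dl * py = dphi_dl
    #   a     * px + b     * py = 0
    px = b * dphi_dl / D
    py = -a * dphi_dl / D
    return px, py


a, b = 2.0, 3.0
x = 0.4

# Boundary curve: the x axis, parametrized by l = x, with phi = sin(b*x) on it.
on_axis = gradient_from_boundary_data(a, b, 1.0, 0.0, b * math.cos(b * x))
exact = (b * math.cos(b * x), -a * math.cos(b * x))  # gradient of sin(b*x - a*y) at y = 0

# Boundary curve along a characteristic: direction (a, b), so D = 0.
along_characteristic = gradient_from_boundary_data(a, b, a, b, 0.0)

print("from boundary data :", on_axis)
print("exact gradient     :", exact)
print("along characteristic:", along_characteristic)
```

On the $x$ axis the boundary data together with the PDE reproduce the full gradient, whereas data given along a characteristic leave it undetermined, in line with conclusions 2 and 4 above.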


In the above example, the argument $t$ of the arbitrary function $f$ was a linear combination
of $x$ and $y$, which worked because the coefficients of the derivatives in the PDE were
constants. If these coefficients were more general functions of $x$ and $y$, the foregoing type
of analysis could still be carried out, but the form of $t$ would have to be different. This
more complicated case is illustrated in Exercises 9.2.5 and 9.2.6.


More General PDEs
Consider now a first-order PDE of a form more general than Eq. (9.1),

$$\mathcal{L}\varphi = a\frac{\partial\varphi}{\partial x} + b\frac{\partial\varphi}{\partial y} + q(x, y)\varphi = F(x, y). \tag{9.8}$$

We may identify its characteristic curves just as before, which amounts to making a transformation
to new variables $s = ax + by$, $t = bx - ay$, in terms of which our PDE becomes,
compare Eq. (9.5),

$$(a^2 + b^2)\left(\frac{\partial\hat{\varphi}}{\partial s}\right) + \hat{q}(s, t)\hat{\varphi} = \hat{F}(s, t). \tag{9.9}$$

Here $\hat{q}(s, t)$ is obtained by converting $q(x, y)$ to the new coordinates:

$$\hat{q}(s, t) = q\left(\frac{as + bt}{a^2 + b^2},\, \frac{bs - at}{a^2 + b^2}\right),$$

and $\hat{F}$ is related in a similar fashion to $F$. Equation (9.9) is really an ODE in $s$ (containing
what can be viewed as a parameter, $t$), and its general solution can be obtained by the usual
procedures for solving ODEs.


Example 9.2.1 ANOTHER FIRST-ORDER PDE

Consider the PDE

$$\frac{\partial\varphi}{\partial x} + \frac{\partial\varphi}{\partial y} + (x + y)\varphi = 0.$$

Applying a transformation to the characteristic direction $t = x - y$ and the direction
orthogonal thereto, $s = x + y$, our PDE becomes

$$2\frac{\partial\hat{\varphi}}{\partial s} + s\hat{\varphi} = 0.$$

This equation separates into

$$2\frac{d\hat{\varphi}}{\hat{\varphi}} + s\,ds = 0,$$

with general solution

$$\ln\hat{\varphi} = -\frac{s^2}{4} + C(t), \quad \text{or} \quad \hat{\varphi} = e^{-s^2/4} f(t),$$

where $f(t)$, originally $\exp[C(t)]$, is completely arbitrary. One can simplify the result
slightly by noting that $s^2/4 = t^2/4 + xy$; then $\exp(-t^2/4)$ can be absorbed into $f(t)$,
leaving the compact result (in terms of $x$ and $y$)

$$\varphi(x, y) = e^{-xy} f(x - y), \quad (f \text{ arbitrary}).$$
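The compact result of this example can be verified by substituting it back into the PDE numerically. The sketch below is illustrative: the sample choices of $f$, the evaluation points, and the finite-difference step are assumptions of this example rather than part of the text.

```python
import math


def make_phi(f):
    """Return phi(x, y) = exp(-x*y) * f(x - y) for a given profile f."""
    return lambda x, y: math.exp(-x * y) * f(x - y)


def pde_residual(phi, x, y, h=1e-5):
    """Central-difference estimate of dphi/dx + dphi/dy + (x + y)*phi."""
    dphi_dx = (phi(x + h, y) - phi(x - h, y)) / (2.0 * h)
    dphi_dy = (phi(x, y + h) - phi(x, y - h)) / (2.0 * h)
    return dphi_dx + dphi_dy + (x + y) * phi(x, y)


profiles = {"cos": math.cos, "square": lambda t: t * t, "exp": math.exp}
points = [(0.5, 0.8), (-1.0, 0.3), (1.2, -0.6)]

worst = 0.0
for name, f in profiles.items():
    phi = make_phi(f)
    for (x, y) in points:
        worst = max(worst, abs(pde_residual(phi, x, y)))
print(f"largest residual over all profiles and points: {worst:.2e}")
```

A residual at the level of the finite-difference error for every profile supports the statement that $f$ is arbitrary.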


More Than Two Independent Variables 


It is useful to consider how the concept of characteristic can be generalized to PDEs with
more than two independent variables. Given the three-dimensional (3-D) differential form

$$a\frac{\partial\varphi}{\partial x} + b\frac{\partial\varphi}{\partial y} + c\frac{\partial\varphi}{\partial z},$$

we apply a transformation to convert our PDE to the new variables $s = ax + by + cz$,
$t = \alpha_1 x + \alpha_2 y + \alpha_3 z$, $u = \beta_1 x + \beta_2 y + \beta_3 z$, with $\alpha_i$ and $\beta_i$ such that $(s, t, u)$ form an
orthogonal coordinate system. Then our 3-D differential form is found equivalent to

$$(a^2 + b^2 + c^2)\frac{\partial\varphi}{\partial s},$$

and the stream lines of $s$ (those with $t$ and $u$ constant) are our characteristics, along which
we can propagate a solution $\varphi$ by solving an ODE. Each characteristic can be identified by
its fixed values of $t$ and $u$. For the 3-D analog of Eq. (9.1),

$$a\frac{\partial\varphi}{\partial x} + b\frac{\partial\varphi}{\partial y} + c\frac{\partial\varphi}{\partial z} = 0, \tag{9.10}$$

we have

$$(a^2 + b^2 + c^2)\frac{\partial\varphi}{\partial s} = 0,$$

with solution $\varphi = f(t, u)$, with $f$ a completely arbitrary function of its two arguments.
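The 3-D construction can be illustrated with explicit numbers. In the sketch below, the coefficient vector $(a, b, c)$, the first orthogonal vector $\alpha$, the use of a cross product to obtain $\beta$, and the sample function $f(t, u)$ are all assumptions made for this example. The code checks that $\varphi = f(t, u)$ satisfies Eq. (9.10) to within finite-difference error.

```python
import math


def cross(p, q):
    """Cross product of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


coeffs = (1.0, 2.0, 2.0)        # (a, b, c), illustrative values
alpha = (2.0, -1.0, 0.0)        # chosen orthogonal to (a, b, c)
beta = cross(coeffs, alpha)     # orthogonal to both by construction


def f(t, u):
    return math.sin(t) + t * u  # arbitrary sample function of t and u


def phi(x, y, z):
    r = (x, y, z)
    return f(dot(alpha, r), dot(beta, r))


def pde_residual(x, y, z, h=1e-5):
    """Central-difference estimate of a*phi_x + b*phi_y + c*phi_z."""
    a, b, c = coeffs
    phi_x = (phi(x + h, y, z) - phi(x - h, y, z)) / (2.0 * h)
    phi_y = (phi(x, y + h, z) - phi(x, y - h, z)) / (2.0 * h)
    phi_z = (phi(x, y, z + h) - phi(x, y, z - h)) / (2.0 * h)
    return a * phi_x + b * phi_y + c * phi_z


residual = pde_residual(0.4, -0.7, 1.1)
print("alpha . (a,b,c) =", dot(alpha, coeffs))
print("beta  . (a,b,c) =", dot(beta, coeffs))
print(f"PDE residual    = {residual:.2e}")
```

Because $\alpha$ and $\beta$ are orthogonal to $(a, b, c)$, the variables $t$ and $u$ do not change along the characteristic direction, and the residual vanishes up to numerical error.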
Consider next an attempt to solve our 3-D PDE subject to a boundary condition fixing
the values of the PDE solution $\varphi$ on a surface. If the characteristic through a point on
the surface lies in the surface, we have a potential inconsistency between the boundary
condition and the solution propagated along the characteristic. We are then also unable to
extend $\varphi$ away from the boundary surface because the data on the surface is insufficient
to yield values of the derivatives that are needed for a Taylor expansion. To see this, note
that the derivatives $\partial\varphi/\partial x$, $\partial\varphi/\partial y$, and $\partial\varphi/\partial z$ can only be determined if we can find
two directions (parametrically designated $l$ and $l'$) such that we can solve simultaneously
Eq. (9.10) and

$$\frac{d\varphi}{dl} = \frac{\partial\varphi}{\partial x}\frac{dx}{dl} + \frac{\partial\varphi}{\partial y}\frac{dy}{dl} + \frac{\partial\varphi}{\partial z}\frac{dz}{dl},$$

$$\frac{d\varphi}{dl'} = \frac{\partial\varphi}{\partial x}\frac{dx}{dl'} + \frac{\partial\varphi}{\partial y}\frac{dy}{dl'} + \frac{\partial\varphi}{\partial z}\frac{dz}{dl'}.$$

A solution can be obtained only if

$$D = \begin{vmatrix}
\dfrac{dx}{dl} & \dfrac{dy}{dl} & \dfrac{dz}{dl} \\[1ex]
\dfrac{dx}{dl'} & \dfrac{dy}{dl'} & \dfrac{dz}{dl'} \\[1ex]
a & b & c
\end{vmatrix} \neq 0.$$

If a characteristic, with $dx/dl'' = a$, $dy/dl'' = b$, and $dz/dl'' = c$, lies in the
two-dimensional (2-D) surface, there will only be one further linearly independent direction
$l$, and $D$ will necessarily be zero.

Summarizing, our earlier observations extend to the 3-D case:

A boundary condition is effective in determining a unique solution to a first-order PDE
only if the boundary does not include a characteristic, and inconsistencies may arise if
a characteristic intersects a boundary more than once.





Exercises

Find the general solutions of the PDEs in Exercises 9.2.1 to 9.2.4.

9.2.1 [Equation partly illegible in the source; the recoverable part ends in $+\,(2x - y)\psi = 0$.]

9.2.2 [Equation illegible in the source.]

9.2.3 [Equation illegible in the source.]

9.2.4 [Equation illegible in the source.]

9.2.5 (a) Show that the PDE

$$y\frac{\partial\psi}{\partial x} + x\frac{\partial\psi}{\partial y} = 0$$

can be transformed into a readily soluble form by writing it in the new variables
$u = xy$, $v = x^2 - y^2$, and find its general solution.

(b) Discuss this result in terms of characteristics.

9.2.6 Find the general solution to the PDE

$$x\frac{\partial\psi}{\partial x} - y\frac{\partial\psi}{\partial y} = 0.$$

Hint. The solution to Exercise 9.2.5 may provide a suggestion as to how to proceed.


9.3 SECOND-ORDER EQUATIONS
Classes of PDEs

We consider here extending the notion of characteristics to second-order PDEs. This can
sometimes be done in a useful fashion. As a preliminary example, consider the following
homogeneous second-order equation

$$a^2\frac{\partial^2\varphi(x, y)}{\partial x^2} - c^2\frac{\partial^2\varphi(x, y)}{\partial y^2} = 0, \tag{9.11}$$

where $a$ and $c$ are assumed to be real. This equation can be written in the factored form

$$\left(a\frac{\partial}{\partial x} + c\frac{\partial}{\partial y}\right)\left(a\frac{\partial}{\partial x} - c\frac{\partial}{\partial y}\right)\varphi = 0, \tag{9.12}$$

and, since the two operator factors commute, we see that Eq. (9.12) will be satisfied if $\varphi$ is
a solution to either of the first-order equations

$$a\frac{\partial\varphi}{\partial x} + c\frac{\partial\varphi}{\partial y} = 0 \quad \text{or} \quad a\frac{\partial\varphi}{\partial x} - c\frac{\partial\varphi}{\partial y} = 0. \tag{9.13}$$

However, these first-order equations are of just the type discussed in the preceding subsection,
so we can identify their respective general solutions as

$$\varphi_1(x, y) = f(cx - ay), \qquad \varphi_2(x, y) = g(cx + ay), \tag{9.14}$$


where $f$ and $g$ are arbitrary (and totally unrelated) functions. Moreover, we can identify
the stream lines of $ax + cy$ and $ax - cy$ as characteristics, with implications as to
the effectiveness and possible consistency of boundary conditions. For some PDEs with
second derivatives as given in Eq. (9.11), it will also be practical to propagate solutions
along the characteristics.
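Since Eq. (9.11) is linear and homogeneous, the sum of the two solutions in Eq. (9.14) is also a solution, and this can be checked numerically. The sketch below is an illustration only: the values of $a$ and $c$, the sample functions $f$ and $g$, and the second-difference formulas are assumptions of this example.

```python
import math

a, c = 1.0, 2.0                     # illustrative real coefficients


def f(t):
    return math.sin(t)              # arbitrary sample profile


def g(t):
    return math.exp(-t * t)         # a second, unrelated profile


def phi(x, y):
    """Sum of the two solutions of Eq. (9.14)."""
    return f(c * x - a * y) + g(c * x + a * y)


def second_derivative_x(func, x, y, h):
    return (func(x + h, y) - 2.0 * func(x, y) + func(x - h, y)) / (h * h)


def second_derivative_y(func, x, y, h):
    return (func(x, y + h) - 2.0 * func(x, y) + func(x, y - h)) / (h * h)


def pde_residual(func, x, y, h=1e-3):
    """Finite-difference estimate of a^2 phi_xx - c^2 phi_yy."""
    return (a * a * second_derivative_x(func, x, y, h)
            - c * c * second_derivative_y(func, x, y, h))


x0, y0 = 0.3, 0.2
residual = pde_residual(phi, x0, y0)
scale = a * a * abs(second_derivative_x(phi, x0, y0, 1e-3))
print(f"a^2 phi_xx is of size {scale:.3f}; residual of Eq. (9.11) is {residual:.2e}")
```

The individual second-derivative terms are of order unity while their combination cancels to within discretization error.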
Look next at the superficially similar equation

$$a^2\frac{\partial^2\varphi(x, y)}{\partial x^2} + c^2\frac{\partial^2\varphi(x, y)}{\partial y^2} = 0, \tag{9.15}$$

with $a$ and $c$ again assumed to be real. If we factor this, we get

$$\left(a\frac{\partial}{\partial x} + ic\frac{\partial}{\partial y}\right)\left(a\frac{\partial}{\partial x} - ic\frac{\partial}{\partial y}\right)\varphi = 0. \tag{9.16}$$

This factorization is of less practical value, as it leads to complex characteristics, which
do not have an obvious relevance to boundary conditions. In addition, propagation along
such characteristics does not provide a solution to the PDE for physically relevant (i.e.,
real) coordinate values.

It is customary to identify second-order PDEs as hyperbolic if they are of (or can be
transformed into) the form given in Eq. (9.11), with real values of $a$ and $c$. PDEs that are of
(or can be transformed into) the form given in Eq. (9.15) are called elliptic. The designation
is useful because it correlates with the existence (or nonexistence) of real characteristics,
and therefore with the behavior of the PDE relative to boundary conditions, with further
implications as to convenient methods for solving the PDE. The terms elliptic and hyperbolic
have been introduced based on an analogy to quadratic forms, where $a^2x^2 + c^2y^2 = d$
is the equation of an ellipse, while $a^2x^2 - c^2y^2 = d$ is that of a hyperbola.

More general PDEs will have second derivatives of the differential form 








ao a ay 
L= 2b ‘ 9.17 
9 9x2 7 axdy ery) er) 
The form in Eq. (9.17) has the following factorization: 
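
$$\mathcal{L} = \left(\frac{b + \sqrt{b^2 - ac}}{c^{1/2}}\,\frac{\partial}{\partial x} + c^{1/2}\,\frac{\partial}{\partial y}\right)\left(\frac{b - \sqrt{b^2 - ac}}{c^{1/2}}\,\frac{\partial}{\partial x} + c^{1/2}\,\frac{\partial}{\partial y}\right). \tag{9.18}$$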


Equation (9.18) is easily verified by expanding the product. The equation also shows 
that the characteristics of Eq. (9.17) are real if and only if $b^2 - ac \ge 0$. This quantity
is well known from elementary algebra, being (up to a factor of 4) the discriminant of the
quadratic form $at^2 + 2bt + c$. If $b^2 - ac > 0$, the two factors identify two linearly independent real
characteristics, as were found for the prototype hyperbolic PDE discussed in Eqs. (9.11) to (9.14).
If $b^2 - ac < 0$, the characteristics will, as for the prototype elliptic PDE in Eqs. (9.15)
and (9.16), form a complex conjugate pair. We now have, however, one new possibility:
If $b^2 - ac = 0$ (a case that for quadratic forms is that of a parabola), we have a PDE that
has exactly one linearly independent characteristic; such PDEs are termed parabolic, and
the canonical form adopted for them is


$$\frac{\partial\varphi}{\partial x} = \frac{\partial^2\varphi}{\partial y^2}. \tag{9.19}$$

If the original PDE lacked a $\partial/\partial x$ term, it would in effect be an ODE in $y$ that depends
on $x$ only parametrically and need not be considered further in the context of methods for
PDEs.

To complete our discussion of the second-order form in Eq. (9.17), we need to show
that it can be transformed into the canonical form for the PDE of its classification. For this
purpose we consider the transformation to new variables $\xi$, $\eta$, defined as

$$\xi = c^{1/2}x - c^{-1/2}by, \qquad \eta = c^{-1/2}y. \tag{9.20}$$
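
The classification rule just described depends only on the sign of $b^2 - ac$, so it can be sketched in a few lines. The function name and the optional tolerance argument are illustrative assumptions, not part of the text; the coefficients follow the convention of Eq. (9.17), in which the mixed derivative carries the factor 2b.

```python
def classify_second_order(a, b, c, tol=0.0):
    """Classify a*d2/dx2 + 2b*d2/dxdy + c*d2/dy2 by the sign of b^2 - ac."""
    disc = b * b - a * c
    if disc > tol:
        return "hyperbolic"   # two real characteristics
    if disc < -tol:
        return "elliptic"     # complex-conjugate characteristics
    return "parabolic"        # a single real characteristic

print(classify_second_order(1.0, 0.0, -1.0))  # d2/dx2 - d2/dy2
print(classify_second_order(1.0, 0.0, 1.0))   # d2/dx2 + d2/dy2
print(classify_second_order(1.0, 1.0, 1.0))   # b^2 = ac
```

With floating-point coefficients a small positive tolerance is usually advisable, since an exact test for zero is fragile.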


By systematic application of the chain rule to evaluate $\partial^2/\partial x^2$, $\partial^2/\partial x\,\partial y$, and $\partial^2/\partial y^2$, it
can be shown that

$$\mathcal{L} = (ac - b^2)\,\frac{\partial^2}{\partial\xi^2} + \frac{\partial^2}{\partial\eta^2}. \tag{9.21}$$

Verification of Eq. (9.21) is the subject of Exercise 9.3.1.

Equation (9.21) shows that the classification of our PDE remains invariant under transformation,
and is hyperbolic if $b^2 - ac > 0$, elliptic if $b^2 - ac < 0$, and parabolic if
$b^2 - ac = 0$. As is perhaps better seen from Eq. (9.18), the stream lines of the characteristics
have slope

$$\frac{dy}{dx} = \frac{c}{b \pm \sqrt{b^2 - ac}}. \tag{9.22}$$
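
The transformed operator $(ac - b^2)\,\partial^2/\partial\xi^2 + \partial^2/\partial\eta^2$ can be spot-checked numerically before working through the chain rule by hand. The sketch below applies the original operator of Eq. (9.17) to a test function by central differences in (x, y) and compares the result with the transformed form evaluated analytically. The test polynomial, the sample coefficients, and the evaluation point are arbitrary assumptions; c > 0 is required so that $c^{1/2}$ is real.

```python
import math

def transformed_operator_check(a, b, c, x, y, h=1e-2):
    """Return (L u in x,y variables, (ac - b^2) u_xixi + u_etaeta) at one point."""
    root_c = math.sqrt(c)

    def xi(xv, yv):
        return root_c * xv - b * yv / root_c

    def eta(xv, yv):
        return yv / root_c

    # Test function: a cubic polynomial in (xi, eta), chosen arbitrarily.
    def u(s, t):
        return s**3 + 2.0 * s**2 * t - s * t**2 + 0.5 * t**3

    def u_xy(xv, yv):
        return u(xi(xv, yv), eta(xv, yv))

    # Central differences in the original variables (exact for cubics up to rounding).
    u_xx = (u_xy(x + h, y) - 2.0 * u_xy(x, y) + u_xy(x - h, y)) / h**2
    u_yy = (u_xy(x, y + h) - 2.0 * u_xy(x, y) + u_xy(x, y - h)) / h**2
    u_mixed = (u_xy(x + h, y + h) - u_xy(x + h, y - h)
               - u_xy(x - h, y + h) + u_xy(x - h, y - h)) / (4.0 * h**2)
    lhs = a * u_xx + 2.0 * b * u_mixed + c * u_yy

    # Exact second derivatives of u with respect to the new variables.
    s, t = xi(x, y), eta(x, y)
    u_ss = 6.0 * s + 4.0 * t
    u_tt = -2.0 * s + 3.0 * t
    rhs = (a * c - b * b) * u_ss + u_tt
    return lhs, rhs

lhs, rhs = transformed_operator_check(a=1.0, b=2.0, c=3.0, x=0.4, y=-0.7)
print(lhs, rhs)
```

Agreement at a few points is of course not a proof, but it is a cheap way to catch sign errors in the hand calculation.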








More than Two Independent Variables 


While we will not carry out a full analysis, it is important to note that many problems 
in physics involve more than two dimensions (often, three spatial dimensions or several 
spatial dimensions plus time). Often, the behavior in the multiple spatial dimensions is 
similar, and we apply the terms hyperbolic, elliptic, and parabolic in a way that relates the 
spatial to the time derivatives when the latter occur. Thus, these equations are classified as 
indicated: 


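Laplace equation      $\nabla^2\psi = 0$                                                      elliptic
Poisson equation      $\nabla^2\psi = -\rho/\varepsilon_0$                                    elliptic
Wave equation         $\nabla^2\psi = \dfrac{1}{c^2}\,\dfrac{\partial^2\psi}{\partial t^2}$   hyperbolic
Diffusion equation    $a\,\dfrac{\partial\psi}{\partial t} = \nabla^2\psi$                    parabolic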


The specific equations mentioned here are very important in physics and will be further 
discussed in later sections of this chapter. These examples, of course, do not represent the 
full range of second-order PDEs, and do not include cases where the coefficients in the 
differential operator are functions of the coordinates. In that case, the classification into 
elliptic, hyperbolic, and parabolic is only local; the class may change as the coordinates 
vary. 


Boundary Conditions 


Usually, when we know the state of a physical system at some time and the law governing the
physical process, we are able to predict the subsequent development. Such initial values
are the most common boundary conditions associated with ODEs and PDEs. Finding
solutions that match given points, curves, or surfaces corresponds to boundary value problems.
Solutions usually are required to satisfy certain imposed (for example, asymptotic)
boundary conditions. These boundary conditions ordinarily take one of three forms:


1. Cauchy boundary conditions. The value of a function and its normal derivative specified
on the boundary. In electrostatics this would mean $\varphi$, the potential, and $E_n$, the
normal component of the electric field.

2. Dirichlet boundary conditions. The value of a function specified on the boundary.
In electrostatics, this would mean the potential $\varphi$.

3. Neumann boundary conditions. The normal derivative (normal gradient) of a function
specified on the boundary. In the electrostatic case this would be $E_n$ and therefore
$\sigma$, the surface charge density.


Because the three classes of second-order PDEs have different patterns of character- 
istics, the boundary conditions needed to specify (in a consistent way) a unique solution 
will depend on the equation class. An exact analysis of the role of boundary conditions is 
complicated and beyond the scope of the present text. However, a summary of the relation 
of these three types of boundary conditions to the three classes of 2-D partial differential 
equations is given in Table 9.1. For a more extended discussion of these partial differ- 
ential equations the reader may consult Morse and Feshbach, Chapter 6 (see Additional 
Readings). 

Table 9.1 Relation between PDE and Boundary Conditions

Boundary conditions   Class of partial differential equation
                      Elliptic                            Hyperbolic                  Parabolic
                      Laplace, Poisson in (x, y)          Wave equation in (x, t)     Diffusion equation in (x, t)

Cauchy
  Open surface        Unphysical results (instability)    Unique, stable solution     Too restrictive
  Closed surface      Too restrictive                     Too restrictive             Too restrictive
Dirichlet
  Open surface        Insufficient                        Insufficient                Unique, stable solution in one direction
  Closed surface      Unique, stable solution             Solution not unique         Too restrictive
Neumann
  Open surface        Insufficient                        Insufficient                Unique, stable solution in one direction
  Closed surface      Unique, stable solution             Solution not unique         Too restrictive

Parts of Table 9.1 are simply a matter of maintaining internal consistency or of common
sense. For instance, for Poisson's equation with a closed surface, Dirichlet conditions lead
to a unique, stable solution. Neumann conditions, independent of the Dirichlet conditions,
likewise lead to a unique stable solution independent of the Dirichlet solution. Therefore,
Cauchy boundary conditions (meaning Dirichlet plus Neumann) could lead to an
inconsistency.

The term boundary conditions includes as a special case the concept of initial conditions.
For instance, specifying the initial position $x_0$ and the initial velocity $v_0$ in some
dynamical problem would correspond to the Cauchy boundary conditions. Note, however,
that an initial condition corresponds to applying the condition at only one end of
the allowed range of the (time) variable.

Finally, we note that Table 9.1 oversimplifies the situation in various ways. For example, 
the Helmholtz PDE, 


$$\nabla^2\psi \pm k^2\psi = 0,$$


(which could be thought of as the reduction of a parabolic time-dependent equation to its 
spatial part) has solution(s) for Dirichlet conditions on a closed boundary only for certain 
values of its parameter k. The determination of k and the characterization of these solutions 
is an eigenvalue problem and is important for physics. 


Nonlinear PDEs 


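The study of nonlinear ODEs and PDEs is a rapidly growing and important field. We encountered
earlier the simplest linear wave equation,

$$\frac{\partial\psi}{\partial t} + c\,\frac{\partial\psi}{\partial x} = 0,$$

as the first-order PDE of the wavefronts of the wave equation. The simplest nonlinear wave
equation,

$$\frac{\partial\psi}{\partial t} + c(\psi)\,\frac{\partial\psi}{\partial x} = 0, \tag{9.23}$$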


results if the local speed of propagation, $c$, is not constant but depends on the wave $\psi$.
When a nonlinear equation has a solution of the form $\psi(x, t) = A\cos(kx - \omega t)$, where
$\omega(k)$ varies with $k$ so that $\omega''(k) \neq 0$, then it is called dispersive. Perhaps the best-known
nonlinear dispersive equation is the Korteweg-deVries equation,

$$\frac{\partial\psi}{\partial t} + \psi\,\frac{\partial\psi}{\partial x} + \frac{\partial^3\psi}{\partial x^3} = 0, \tag{9.24}$$

which models the lossless propagation of shallow water waves and other phenomena. It is
widely known for its soliton solutions. A soliton is a traveling wave with the property of
persisting through an interaction with another soliton: After they pass through each other,
they emerge in the same shape and with the same velocity and acquire no more than a
phase shift. Let $\psi(\xi = x - ct)$ be such a traveling wave. When substituted into Eq. (9.24)
this yields the nonlinear ODE

$$(\psi - c)\,\frac{d\psi}{d\xi} + \frac{d^3\psi}{d\xi^3} = 0, \tag{9.25}$$

which can be integrated to yield

$$\frac{d^2\psi}{d\xi^2} = c\psi - \frac{\psi^2}{2}. \tag{9.26}$$

There is no additive integration constant in Eq. (9.26), because the solution must be such
that $d^2\psi/d\xi^2 \to 0$ with $\psi \to 0$ for large $\xi$. This causes $\psi$ to be localized at the
characteristic $\xi = 0$, or $x = ct$. Multiplying Eq. (9.26) by $d\psi/d\xi$ and integrating again yields

$$\left(\frac{d\psi}{d\xi}\right)^2 = c\psi^2 - \frac{\psi^3}{3}, \tag{9.27}$$

where $d\psi/d\xi \to 0$ for large $\xi$. Taking the root of Eq. (9.27) and integrating again yields
the soliton solution

$$\psi(x - ct) = \frac{3c}{\cosh^2\!\left(\tfrac{1}{2}\sqrt{c}\,(x - ct)\right)}. \tag{9.28}$$

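
The soliton profile can be checked against the two integrated equations without redoing the integrations. The sketch below takes the profile 3c / cosh^2(sqrt(c) xi / 2) and evaluates the residuals of Eqs. (9.26) and (9.27), with the derivatives approximated by central differences. The function names, the step size, and the sample values of c and xi are illustrative assumptions.

```python
import math

def soliton(xi, c):
    """Soliton profile psi(xi) = 3c / cosh^2(sqrt(c) * xi / 2)."""
    return 3.0 * c / math.cosh(0.5 * math.sqrt(c) * xi) ** 2

def soliton_residuals(xi, c, h=1e-3):
    """Residuals of the once- and twice-integrated traveling-wave equations."""
    psi = soliton(xi, c)
    d1 = (soliton(xi + h, c) - soliton(xi - h, c)) / (2.0 * h)
    d2 = (soliton(xi + h, c) - 2.0 * psi + soliton(xi - h, c)) / h**2
    res_second = d2 - (c * psi - 0.5 * psi**2)           # psi'' = c psi - psi^2/2
    res_first = d1**2 - (c * psi**2 - psi**3 / 3.0)      # psi'^2 = c psi^2 - psi^3/3
    return res_second, res_first

for xi in (0.0, 0.7, -1.5):
    print(xi, soliton_residuals(xi, c=2.0))
```

Both residuals should be small at every sample point, limited only by the finite-difference error. Note that the peak height 3c and the width both depend on the speed c, so faster solitons are taller and narrower.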
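Exercises


9.3.1  Show that by making a change of variables to $\xi = c^{1/2}x - c^{-1/2}by$, $\eta = c^{-1/2}y$, the
operator $\mathcal{L}$ of Eq. (9.18) can be brought to the form

$$\mathcal{L} = (ac - b^2)\,\frac{\partial^2}{\partial\xi^2} + \frac{\partial^2}{\partial\eta^2}.$$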

9.4 SEPARATION OF VARIABLES 


Partial differential equations are clearly important in physics, as evidenced by the PDEs 
listed in Section 9.1, and of equal importance is the development of methods for their 
solution. Our discussion of characteristics has suggested an approach that will be useful 
for some problems. Other general techniques for solving PDEs can be found, for example, 
in the books by Bateman and by Gustafson listed in the Additional Readings at the end of 
this chapter. However, the technique described in the present section is probably that most 
widely used. 

The method developed in this section for solution of a PDE splits a partial differential 
equation of n variables into n ordinary differential equations, with the intent that an overall 
solution to the PDE will be a product of single-variable functions which are solutions to 
the individual ODEs. In problems amenable to this method, the boundary conditions are 
usually such that they separate at least partially into conditions that can be applied to the 
separate ODEs. 

Further discussion of the method depends on the nature of the problem we seek to solve, 
so we now make the observation that PDEs occur in physics in two contexts, either as 


• An equation with no unknown parameters for which there is expected to be a unique
solution consistent with the boundary conditions (typical example: Laplace equation
for the electrostatic potential with the potential specified on the boundary), or
• An eigenvalue problem which will have solutions consistent with the boundary conditions
only for certain values of an embedded but initially unknown parameter (the
eigenvalue).


In the first of these two cases, the unique solution is typically approached by first applying 
boundary conditions to the separate ODEs to specialize their solutions as much as possible. 
The solution is at this point normally not unique, and we have a (usually infinite) number 
of product solutions that satisfy the boundary conditions thus far applied. We then regard 
these product solutions as a basis that can be used to form an expansion that satisfies the 
remaining boundary condition(s). We illustrate with the first and fourth examples of this 
section. 

In the second case identified above, we typically have homogeneous boundary condi- 
tions (solution equal to zero on the boundary), and in favorable situations can satisfy all 
the boundary conditions by imposing them on the separate ODEs. At this point we usually 
find that each product solves our PDE with a different value of its embedded parameter, 
so that we are obtaining eigenfunctions and eigenvalues. This process is illustrated in the 
second and third examples of the present section. 

The method of separation of variables proceeds by dividing the PDE into pieces each 
of which can be set equal to a constant of separation. If our PDE has n independent 
variables, there will be n — 1 independent separation constants (though we often prefer 
a more symmetric formulation with n separation constants plus an equation connecting 
them). The separation constants may have values that are restricted by invoking boundary 
conditions. 

To get a broad understanding of the method of separation of variables, it is useful to see 
how it is carried out in a variety of coordinate systems. Here we examine the process in 
Cartesian, cylindrical, and spherical polar coordinates. For application to other coordinate 
systems we refer the reader to the second edition of this text. 


Cartesian Coordinates 


In Cartesian coordinates the Helmholtz equation becomes 


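$$\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2} + k^2\psi = 0, \tag{9.29}$$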





using Eq. (3.62) for the Laplacian. For the present, let $k^2$ be a constant. As stated in the
introductory paragraphs of this section, our strategy will be to split Eq. (9.29) into a set of 
ordinary differential equations. To do so, let 


$$\psi(x, y, z) = X(x)\,Y(y)\,Z(z) \tag{9.30}$$


and substitute back into Eq. (9.29). How do we know Eq. (9.30) is valid? When the dif- 
ferential operators in various variables are additive in the PDE, that is, when there are no 
products of differential operators in different variables, the separation method has a chance 
to succeed. For success, it is usually also necessary that at least some of the boundary con- 
ditions separate into conditions on the separate factors. At any rate, we are proceeding in 
the spirit of let's try and see if it works. If our attempt succeeds, then Eq. (9.30) will be
justified. If it does not succeed, we shall find out soon enough and then we can try another
attack, such as Green's functions, integral transforms, or brute-force numerical analysis.
With $\psi$ assumed given by Eq. (9.30), Eq. (9.29) becomes

$$YZ\,\frac{d^2X}{dx^2} + XZ\,\frac{d^2Y}{dy^2} + XY\,\frac{d^2Z}{dz^2} + k^2XYZ = 0. \tag{9.31}$$

Dividing by $\psi = XYZ$ and rearranging terms, we obtain

$$\frac{1}{X}\frac{d^2X}{dx^2} = -k^2 - \frac{1}{Y}\frac{d^2Y}{dy^2} - \frac{1}{Z}\frac{d^2Z}{dz^2}. \tag{9.32}$$

Equation (9.32) exhibits one separation of variables. The left-hand side is a function of
$x$ alone, whereas the right-hand side depends only on $y$ and $z$ and not on $x$. But $x$, $y$,
and $z$ are all independent coordinates. The equality of two sides that depend on different
variables is possible only if each side is equal to the same constant, a constant
of separation. We choose$^{2}$

$$\frac{1}{X}\frac{d^2X}{dx^2} = -l^2, \tag{9.33}$$

$$-k^2 - \frac{1}{Y}\frac{d^2Y}{dy^2} - \frac{1}{Z}\frac{d^2Z}{dz^2} = -l^2. \tag{9.34}$$

Now, turning our attention to Eq. (9.34), we obtain

$$\frac{1}{Y}\frac{d^2Y}{dy^2} = -k^2 + l^2 - \frac{1}{Z}\frac{d^2Z}{dz^2}, \tag{9.35}$$

and a second separation has been achieved. Here we have a function of $y$ equated to a
function of $z$. We resolve it, as before, by equating each side to another constant of
separation, $-m^2$,

$$\frac{1}{Y}\frac{d^2Y}{dy^2} = -m^2, \tag{9.36}$$

$$-k^2 + l^2 - \frac{1}{Z}\frac{d^2Z}{dz^2} = -m^2. \tag{9.37}$$

The separation is now complete, but to make the formulation more symmetrical, we will set

$$\frac{1}{Z}\frac{d^2Z}{dz^2} = -n^2, \tag{9.38}$$

and then consistency with Eq. (9.37) leads to the condition

$$l^2 + m^2 + n^2 = k^2. \tag{9.39}$$


Now we have three ODEs, Eqs. (9.33), (9.36), and (9.38), to replace Eq. (9.29). Our 
assumption, Eq. (9.30), has succeeded in splitting the PDE; if we can also use the fac- 
tored form to satisfy the boundary conditions, our solution of the PDE will be complete. 
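
The outcome of the separation can be spot-checked numerically: a product of one-dimensional solutions satisfies the Helmholtz equation only when the three separation constants add up to $k^2$. The sketch below uses sine factors and a seven-point finite-difference Laplacian; the function name, the step size, and the sample values are illustrative assumptions.

```python
import math

def helmholtz_residual(l, m, n, x, y, z, k2=None, h=1e-3):
    """Residual of (Laplacian + k^2) psi for psi = sin(lx) sin(my) sin(nz).

    If k2 is None, the consistent value l^2 + m^2 + n^2 is used.
    """
    if k2 is None:
        k2 = l * l + m * m + n * n

    def psi(xv, yv, zv):
        return math.sin(l * xv) * math.sin(m * yv) * math.sin(n * zv)

    center = psi(x, y, z)
    laplacian = (
        psi(x + h, y, z) + psi(x - h, y, z)
        + psi(x, y + h, z) + psi(x, y - h, z)
        + psi(x, y, z + h) + psi(x, y, z - h)
        - 6.0 * center
    ) / h**2
    return laplacian + k2 * center

print(helmholtz_residual(1.0, 2.0, 3.0, 0.3, 0.4, 0.5))          # consistent k^2
print(helmholtz_residual(1.0, 2.0, 3.0, 0.3, 0.4, 0.5, k2=1.0))  # inconsistent k^2
```

The first residual is small (finite-difference error only), while the second is of order one, which illustrates why the constants cannot be chosen independently of $k^2$.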


$^{2}$The choice of sign for separation constants is completely arbitrary, and will be fixed in specific problems by the need to satisfy
specific boundary conditions, and particularly to avoid the unnecessary introduction of complex numbers.

It is convenient to label the solution according to the choice of our constants $l$, $m$, and $n$;
that is,


$$\psi_{lmn}(x, y, z) = X_l(x)\,Y_m(y)\,Z_n(z). \tag{9.40}$$


Subject to the boundary conditions of the problem being solved and to the condition
$k^2 = l^2 + m^2 + n^2$, we may choose $l$, $m$, and $n$ as we like, and Eq. (9.40) will still be a
solution of Eq. (9.29), provided only that $X_l(x)$ is a solution of Eq. (9.33), and so on.
Because our original PDE is homogeneous and linear, we may develop the most general
solution of Eq. (9.29) by taking a linear combination of solutions $\psi_{lmn}$,

$$\Psi = \sum_{l,m} a_{lm}\,\psi_{lmn}, \tag{9.41}$$


where it is understood that n will be given a value consistent with Eq. (9.39) and with the 
values of / and m. 

Finally, the constant coefficients $a_{lm}$ must be chosen to permit $\Psi$ to satisfy the boundary
conditions of the problem, leading usually to a discrete set of values $l$, $m$.

Reviewing what we have done, it can be seen that the separation into ODEs could still
have been achieved if $k^2$ were replaced by any function that depended additively on the
variables, i.e., if

$$k^2 \longrightarrow f(x) + g(y) + h(z).$$

A case of practical importance would be the choice $k^2 \to C(x^2 + y^2 + z^2)$, leading to
the problem of a 3-D quantum harmonic oscillator. Replacing the constant term $k^2$ by
a separable function of the variables will, of course, change the ODEs we obtain in the
separation process and may have implications relative to the boundary conditions.
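
To make this remark concrete, here is a sketch of how the separated equations change when $k^2$ is replaced by $f(x) + g(y) + h(z)$. Dividing the PDE by $XYZ$ groups the terms into three brackets, each depending on a single variable, so each bracket must be constant. The constants $\alpha$, $\beta$, $\gamma$ are notation introduced only for this illustration and are not the $l$, $m$, $n$ used above.

$$\frac{1}{X}\frac{d^2X}{dx^2} + f(x) = \alpha, \qquad \frac{1}{Y}\frac{d^2Y}{dy^2} + g(y) = \beta, \qquad \frac{1}{Z}\frac{d^2Z}{dz^2} + h(z) = \gamma, \qquad \alpha + \beta + \gamma = 0.$$

For constant $k^2$ one may assign, for example, $f = k^2$ and $g = h = 0$, which reproduces Eqs. (9.33), (9.36), and (9.38) with $\alpha = k^2 - l^2$, $\beta = -m^2$, $\gamma = -n^2$, and the constraint $\alpha + \beta + \gamma = 0$ becomes Eq. (9.39).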


Example 9.4.1 LAPLACE EQUATION FOR A PARALLELEPIPED 


As a concrete example we take Eq. (9.29) with $k = 0$, which makes it a Laplace equation,
and ask for its solution in a parallelepiped defined by the planar surfaces $x = 0$, $x = c$,
$y = 0$, $y = c$, $z = 0$, $z = L$, with the Dirichlet boundary condition $\psi = 0$ on all the
boundaries except that at $z = L$; on that boundary $\psi$ is given the constant value $V$. See Fig. 9.1.
This is a problem in which the PDE contains no unknown parameters and should have a
unique solution.

We expect a solution of the generic form given by Eq. (9.41), with $\psi_{lmn}$ given by
Eq. (9.40). To proceed further, we need to develop the actual functional forms of $X(x)$,
$Y(y)$, and $Z(z)$. For $X$ and $Y$, the ODEs, written in conventional form, are

$$X'' = -l^2X, \qquad Y'' = -m^2Y,$$

with general solutions

$$X = A\sin lx + B\cos lx, \qquad Y = A'\sin my + B'\cos my.$$


FIGURE 9.1 Parallelepiped for solution of Laplace equation.

We could have written $X$ and $Y$ as complex exponentials, but that choice would be less
convenient when we consider the boundary conditions. To satisfy the boundary condition
at $x = 0$, we set $X(0) = 0$, which can be accomplished by choosing $B = 0$; to satisfy
the boundary condition at $x = c$, we set $X(c) = 0$, which causes us to choose $l$ such that
$lc = \lambda\pi$, where $\lambda$ must be a nonzero integer. Without loss of generality, we can restrict $\lambda$ to
positive values, as $-X$ and $X$ are linearly dependent. Moreover, we can include whatever
scale factor is ultimately needed in our solution for $Z(z)$, so we may set $A = 1$. Similar
remarks apply to the solution $Y(y)$, so our solutions for $X$ and $Y$ take the final form





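$$X_\lambda(x) = \sin\left(\frac{\lambda\pi x}{c}\right), \qquad Y_\mu(y) = \sin\left(\frac{\mu\pi y}{c}\right), \tag{9.42}$$

with $\lambda = 1, 2, 3, \ldots$ and $\mu = 1, 2, 3, \ldots$.

Next we consider the ODE for $Z$. It must be solved with a value of $n^2$, calculated from
Eq. (9.39) with $k = 0$ as

$$n^2 = -\frac{\pi^2}{c^2}\left(\lambda^2 + \mu^2\right).$$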
Cc 
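This equation suggests that $n$ will be imaginary, but that is unimportant here. Returning to
the ODE for $Z$, we now see that it becomes

$$Z'' = +\frac{\pi^2}{c^2}\left(\lambda^2 + \mu^2\right)Z,$$

and the general solution for $Z(z)$ for given $\lambda$ and $\mu$ is then easily identified as

$$Z_{\lambda\mu}(z) = Ae^{\rho_{\lambda\mu}z} + Be^{-\rho_{\lambda\mu}z}, \quad\text{with}\quad \rho_{\lambda\mu} = \frac{\pi}{c}\sqrt{\lambda^2 + \mu^2}. \tag{9.43}$$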


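We now specialize Eq. (9.43) in a way that makes $Z_{\lambda\mu}(0) = 0$ and $Z_{\lambda\mu}(L) = V$. Noting
that $\sinh(\rho_{\lambda\mu}z)$ is a linear combination of $e^{\rho_{\lambda\mu}z}$ and $e^{-\rho_{\lambda\mu}z}$, we write

$$Z_{\lambda\mu}(z) = V\,\frac{\sinh(\rho_{\lambda\mu}z)}{\sinh(\rho_{\lambda\mu}L)}. \tag{9.44}$$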


At this point, we have made choices that cause all the boundary conditions to be satisfied
except that at $z = L$, and we are now ready to select the coefficients $a_{\lambda\mu}$ as required by the
remaining boundary condition, which because of Eq. (9.44) corresponds to

$$\sum_{\lambda\mu} a_{\lambda\mu}\sin\left(\frac{\lambda\pi x}{c}\right)\sin\left(\frac{\mu\pi y}{c}\right) = 1. \tag{9.45}$$

The symmetry of this expression suggests that we write $a_{\lambda\mu} = b_\lambda b_\mu$ and find the
coefficients $b_\lambda$ from the equation

$$\sum_{\lambda} b_\lambda\sin\left(\frac{\lambda\pi x}{c}\right) = 1. \tag{9.46}$$


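Because the sine functions in Eq. (9.46) are the eigenfunctions of the one-dimensional
(1-D) equation for $X$, which is a Hermitian eigenproblem, they form an orthogonal set on
the interval $(0, c)$, so the $b_\lambda$ can be computed by the following formulas:

$$b_\lambda = \frac{\left\langle \sin\left(\dfrac{\lambda\pi x}{c}\right)\Big|\,1\right\rangle}{\left\langle \sin\left(\dfrac{\lambda\pi x}{c}\right)\Big|\sin\left(\dfrac{\lambda\pi x}{c}\right)\right\rangle} = \frac{\displaystyle\int_0^c \sin(\lambda\pi x/c)\,dx}{\displaystyle\int_0^c \sin^2(\lambda\pi x/c)\,dx} = \begin{cases} \dfrac{4}{\lambda\pi}, & \lambda \text{ odd}, \\[6pt] 0, & \lambda \text{ even}, \end{cases}$$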


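and our complete solution for the potential in the parallelepiped becomes

$$\Psi(x, y, z) = V\sum_{\lambda\mu} b_\lambda b_\mu \sin\left(\frac{\lambda\pi x}{c}\right)\sin\left(\frac{\mu\pi y}{c}\right)\frac{\sinh(\rho_{\lambda\mu}z)}{\sinh(\rho_{\lambda\mu}L)}. \tag{9.47}$$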
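
A truncated version of this double series is easy to evaluate. The sketch below sums the odd terms up to a chosen cutoff, using coefficients 4/(lambda pi) and the ratio sinh(rho z)/sinh(rho L) rewritten with decaying exponentials so that large rho does not overflow. The function names and the cutoff `max_index` are illustrative assumptions. As a check that does not rely on the series itself, for a cube (L = c) the potential at the center must equal V/6: superposing the six problems that each hold one face at V gives the constant solution V, and by symmetry each contributes equally at the center.

```python
import math

def sinh_ratio(rho, z, L):
    """sinh(rho*z)/sinh(rho*L) for 0 <= z <= L, written to avoid overflow."""
    return (math.exp(-rho * (L - z)) * (1.0 - math.exp(-2.0 * rho * z))
            / (1.0 - math.exp(-2.0 * rho * L)))

def potential(x, y, z, c, L, V, max_index=99):
    """Partial sum of the series solution over odd lambda, mu <= max_index."""
    total = 0.0
    for lam in range(1, max_index + 1, 2):
        b_lam = 4.0 / (lam * math.pi)
        sin_x = math.sin(lam * math.pi * x / c)
        for mu in range(1, max_index + 1, 2):
            b_mu = 4.0 / (mu * math.pi)
            rho = (math.pi / c) * math.sqrt(lam**2 + mu**2)
            total += (b_lam * b_mu * sin_x * math.sin(mu * math.pi * y / c)
                      * sinh_ratio(rho, z, L))
    return V * total

center = potential(0.5, 0.5, 0.5, c=1.0, L=1.0, V=1.0)
print(center, 1.0 / 6.0)
```

Convergence is exponentially fast in the interior but only slow (and subject to Gibbs oscillations near the edges) on the face z = L itself, where the sinh ratio equals 1 and no longer damps the high-order terms.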





As briefly mentioned earlier, PDEs also occur as eigenvalue problems. Here is a simple 
example. 


Example 9.4.2 QUANTUM PARTICLE IN A BOX


We consider a particle of mass $m$ trapped in a box with planar faces at $x = 0$, $x = a$, $y = 0$,
$y = b$, $z = 0$, $z = c$. The quantum stationary states of this system are the eigenfunctions of
the Schrödinger equation

$$-\frac{1}{2}\nabla^2\psi(x, y, z) = E\,\psi(x, y, z), \tag{9.48}$$

where this PDE is subject to the Dirichlet boundary condition $\psi = 0$ on the walls of the
box. We identify $E$ as the stationary-state energy (the eigenvalue), in a system of units with
$m = \hbar = 1$. This is a Helmholtz equation with the new wrinkle that $E$ is not initially known.
The boundary conditions are such that this PDE has no solution except for a set of discrete
values of $E$. We want to find both those values and the corresponding eigenfunctions.
Separating the variables in Eq. (9.48) by assuming a solution of the form Eq. (9.30), the
PDE becomes

$$-\frac{1}{2}\left(\frac{X''}{X} + \frac{Y''}{Y} + \frac{Z''}{Z}\right) = E, \tag{9.49}$$

and the separation yields

$$\frac{X''}{X} = -l^2, \quad\text{with solution}\quad X = A\sin lx + B\cos lx.$$

After applying the boundary conditions at $x = 0$ and $x = a$ we get (scaling to $A = 1$)

$$X_\lambda = \sin\left(\frac{\lambda\pi x}{a}\right), \quad \lambda = 1, 2, 3, \ldots, \quad\text{so}\quad l = \lambda\pi/a. \tag{9.50}$$
a 


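Because the $X$ equation is a 1-D Hermitian eigenvalue problem, these functions $X_\lambda(x)$ are
orthogonal on $0 \le x \le a$.

Similar processing of the $Y$ and $Z$ equations, with separation constants $-m^2$ and $-n^2$,
yields

$$\begin{aligned} Y_\mu &= \sin\left(\frac{\mu\pi y}{b}\right), \quad \mu = 1, 2, 3, \ldots, \quad\text{so}\quad m = \mu\pi/b, \\ Z_\nu &= \sin\left(\frac{\nu\pi z}{c}\right), \quad \nu = 1, 2, 3, \ldots, \quad\text{so}\quad n = \nu\pi/c, \end{aligned} \tag{9.51}$$

which are the solutions of two additional 1-D eigenvalue problems.
Replacing $X''/X$, $Y''/Y$, $Z''/Z$ in Eq. (9.49), respectively, by $-l^2$, $-m^2$, $-n^2$, and then
evaluating these quantities from Eqs. (9.50) and (9.51), we have

$$l^2 + m^2 + n^2 = 2E, \quad\text{or}\quad E = \frac{\pi^2}{2}\left(\frac{\lambda^2}{a^2} + \frac{\mu^2}{b^2} + \frac{\nu^2}{c^2}\right), \tag{9.52}$$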


with $\lambda$, $\mu$, and $\nu$ arbitrary positive integers. The situation is quite different from our
solution, Example 9.4.1, of the Laplace equation. Instead of a unique solution we have
an infinite set of solutions, corresponding to all positive integer triples $(\lambda, \mu, \nu)$, each with
its own value of $E$. Making the observation that the differential operator on the left-hand
side of Eq. (9.48) is Hermitian in the presence of the chosen boundary conditions, we have
found a complete orthogonal set of its eigenfunctions. The orthogonality follows directly
from the orthogonality of the $X_\lambda$, $Y_\mu$, and $Z_\nu$ on their respective 1-D
intervals. Because we set the coefficients of all the sine functions to unity, our overall
eigenfunctions are not normalized, but we can easily normalize them if we so choose.

We close this example with the observation that this boundary-value problem will not
have a solution for arbitrarily chosen values of $E$, as the $E$ values must satisfy Eq. (9.52)
with integer values of $\lambda$, $\mu$, and $\nu$. This will cause the $E$ values of the problem solutions to
be a discrete set; using terminology introduced in a previous chapter, our boundary-value
problem can be said to have a discrete spectrum. ■

Circular Cylindrical Coordinates


Curvilinear coordinate systems introduce additional nuances into the process for separating
variables. Again we consider the Helmholtz equation, now in circular cylindrical coordinates.
With our unknown function $\psi$ dependent on $\rho$, $\varphi$, and $z$, that equation becomes,
using Eq. (3.149) for $\nabla^2$:
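
The discrete spectrum of Eq. (9.52) can be tabulated directly. The sketch below lists the energies for all triples up to a cutoff, in the units m = ħ = 1 used in the example; the function name and the cutoff `max_quantum` are illustrative assumptions. For a cubic box the first excited level is threefold degenerate, since the triples (1,1,2), (1,2,1), and (2,1,1) give the same energy.

```python
import math
from itertools import product

def box_energies(a, b, c, max_quantum=4):
    """Sorted list of (E, (lam, mu, nu)) with E = (pi^2/2)(lam^2/a^2 + mu^2/b^2 + nu^2/c^2)."""
    levels = []
    for lam, mu, nu in product(range(1, max_quantum + 1), repeat=3):
        energy = 0.5 * math.pi**2 * (lam**2 / a**2 + mu**2 / b**2 + nu**2 / c**2)
        levels.append((energy, (lam, mu, nu)))
    return sorted(levels)

for energy, labels in box_energies(1.0, 1.0, 1.0)[:4]:
    print(labels, round(energy, 4))
```

Because the triples are truncated at `max_quantum`, the sorted list is complete only for energies below the lowest level that involves an index larger than the cutoff.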








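$$\nabla^2\psi(\rho, \varphi, z) + k^2\psi(\rho, \varphi, z) = 0, \tag{9.53}$$

or

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\,\frac{\partial\psi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\psi}{\partial\varphi^2} + \frac{\partial^2\psi}{\partial z^2} + k^2\psi = 0. \tag{9.54}$$

As before, we assume a factored form$^{3}$ for $\psi$,

$$\psi(\rho, \varphi, z) = P(\rho)\,\Phi(\varphi)\,Z(z). \tag{9.55}$$

Substituting into Eq. (9.54), we have

$$\frac{\Phi Z}{\rho}\frac{d}{d\rho}\left(\rho\,\frac{dP}{d\rho}\right) + \frac{PZ}{\rho^2}\frac{d^2\Phi}{d\varphi^2} + P\Phi\,\frac{d^2Z}{dz^2} + k^2P\Phi Z = 0. \tag{9.56}$$

All the partial derivatives have become ordinary derivatives. Dividing by $P\Phi Z$ and moving
the $z$ derivative to the right-hand side yields

$$\frac{1}{\rho P}\frac{d}{d\rho}\left(\rho\,\frac{dP}{d\rho}\right) + \frac{1}{\rho^2\Phi}\frac{d^2\Phi}{d\varphi^2} + k^2 = -\frac{1}{Z}\frac{d^2Z}{dz^2}. \tag{9.57}$$

Again, a function of $z$ on the right appears to depend on a function of $\rho$ and $\varphi$ on the
left. We resolve this by setting each side of Eq. (9.57) equal to the same constant. Let us
choose$^{4}$ $-l^2$. Then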








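$$\frac{d^2Z}{dz^2} = l^2Z \tag{9.58}$$

and

$$\frac{1}{\rho P}\frac{d}{d\rho}\left(\rho\,\frac{dP}{d\rho}\right) + \frac{1}{\rho^2\Phi}\frac{d^2\Phi}{d\varphi^2} + k^2 = -l^2. \tag{9.59}$$

Setting

$$k^2 + l^2 = n^2, \tag{9.60}$$

multiplying by $\rho^2$, and rearranging terms, we obtain

$$\frac{\rho}{P}\frac{d}{d\rho}\left(\rho\,\frac{dP}{d\rho}\right) + n^2\rho^2 = -\frac{1}{\Phi}\frac{d^2\Phi}{d\varphi^2}. \tag{9.61}$$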

$^{3}$For those with limited familiarity with the Greek alphabet, we point out that the symbol $P$ is the upper-case form of $\rho$.

$^{4}$Again, the choice of sign of the separation constant is arbitrary. However, the minus sign chosen for the axial coordinate $z$ is
optimum if we expect exponential dependence on $z$, from Eq. (9.58). A positive sign is chosen for the azimuthal coordinate $\varphi$ in
expectation of a periodic dependence on $\varphi$, from Eq. (9.62).

We set the right-hand side equal to $m^2$, so

$$\frac{d^2\Phi}{d\varphi^2} = -m^2\Phi, \tag{9.62}$$

and the left-hand side of Eq. (9.61) rearranges into a separate equation for $\rho$:

$$\rho\,\frac{d}{d\rho}\left(\rho\,\frac{dP}{d\rho}\right) + (n^2\rho^2 - m^2)P = 0. \tag{9.63}$$

Typically, Eq. (9.62) will be subject to the boundary condition that $\Phi$ have periodicity $2\pi$
and will therefore have solutions

$$e^{\pm im\varphi} \quad\text{or, equivalently,}\quad \sin m\varphi,\ \cos m\varphi, \quad\text{with integer } m.$$


The $\rho$ equation, Eq. (9.63), is Bessel's differential equation (in the independent variable
$n\rho$), originally encountered in Chapter 7. Because of its occurrence here (and in many
other places relevant to physics), it warrants extensive study and is the topic of Chapter 14.
The separation of variables of Laplace's equation in parabolic coordinates also gives rise
to Bessel's equation. It may be noted that the Bessel equation is notorious for the variety
of disguises it may assume. For an extensive tabulation of possible forms the reader is
referred to Tables of Functions by Jahnke and Emde.$^{5}$

Summarizing, we have found that the original Helmholtz equation, a 3-D PDE, can be
replaced by three ODEs, Eqs. (9.58), (9.62), and (9.63). Noting that the ODE for $\rho$ contains
the separation constants from the $z$ and $\varphi$ equations, the solutions we have obtained for the
Helmholtz equation can be written, with labels, as


$$\psi_{lm}(\rho, \varphi, z) = P_{lm}(\rho)\,\Phi_m(\varphi)\,Z_l(z), \tag{9.64}$$


where we should recall that the $n$ in Eq. (9.63) for $P$ is a function of $l$ (specifically,
$n^2 = l^2 + k^2$). The most general solution of the Helmholtz equation can now be
constructed as a linear combination of the product solutions:


$$\Psi(\rho, \varphi, z) = \sum_{l,m} a_{lm}\,P_{lm}(\rho)\,\Phi_m(\varphi)\,Z_l(z). \tag{9.65}$$


Reviewing what we have done, we note that the separation could still have been achieved
if $k^2$ had been replaced by any additive function of the form


$$k^2 \longrightarrow f(\rho) + \frac{g(\varphi)}{\rho^2} + h(z).$$


Example 9.4.3 CYLINDRICAL EIGENVALUE PROBLEM 


In this example we regard Eq. (9.53) as an eigenvalue problem, with Dirichlet boundary
conditions $\psi = 0$ on all boundaries of a finite cylinder, with $k^2$ initially unknown and to be
determined. Our region of interest will be a cylinder with curved boundaries at $\rho = R$ and
with end caps at $z = \pm L/2$, as shown in Fig. 9.2. To emphasize that $k^2$ is an eigenvalue,
we rename it $\lambda$, and our eigenvalue equation is, symbolically,

$$-\nabla^2\psi = \lambda\psi, \tag{9.66}$$

with boundary conditions $\psi = 0$ at $\rho = R$ and at $z = \pm L/2$. Apart from constants, this is
the time-independent Schrödinger equation for a particle in a cylindrical cavity. We limit
the present example to the determination of the smallest eigenvalue (the ground state).
This will be the solution to the PDE with the smallest number of oscillations, so we seek a
solution without zeros (nodes) in the interior of the cylindrical region.

FIGURE 9.2 Cylindrical region for solution of the Helmholtz equation.

$^{5}$E. Jahnke and F. Emde, Tables of Functions, 4th rev. ed., New York: Dover (1945), p. 146; also, E. Jahnke, F. Emde, and
F. Lösch, Tables of Higher Functions, 6th ed., New York: McGraw-Hill (1960).

Again, we seek separated solutions of the form given in Eq. (9.55). The ODEs for $Z$ and
$\Phi$, Eqs. (9.58) and (9.62), have the simple forms

$$Z'' = l^2Z, \qquad \Phi'' = -m^2\Phi,$$

with general solutions

$$Z = Ae^{lz} + Be^{-lz}, \qquad \Phi = A'\sin m\varphi + B'\cos m\varphi.$$


We now need to specialize these solutions to satisfy the boundary conditions. The condition
on $\Phi$ is simply that it be periodic in $\varphi$ with period $2\pi$; this result will be obtained if $m$ is
any integer (including $m = 0$, which corresponds to the simple solution $\Phi$ = constant).
Since our objective here is to obtain the least oscillatory solution, we choose that form,
$\Phi$ = constant, for $\Phi$.

Looking next at $Z$, we note that the arbitrary choice of sign for the separation constant
$l^2$ has led to a form of solution that appears not to be optimum for fulfilling conditions
requiring $Z = 0$ at the boundaries. But, writing $l^2 = -\omega^2$, $l = i\omega$, $Z$ becomes a linear
combination of $\sin\omega z$ and $\cos\omega z$; the least oscillatory solution with $Z(\pm L/2) = 0$ is
$Z = \cos(\pi z/L)$, so $\omega = \pi/L$, and $l^2 = -\pi^2/L^2$.

The functions $Z(z)$ and $\Phi(\varphi)$ that we have found satisfy the boundary conditions in $z$
and $\varphi$, but it remains to choose $P(\rho)$ in a way that produces $P = 0$ at $\rho = R$ with the least
oscillation in $P$. The equation governing $P$, Eq. (9.63) with $m = 0$, is

$$\rho^2P'' + \rho P' + n^2\rho^2P = 0, \tag{9.67}$$

where $n$ was introduced as satisfying (in the current notation) $n^2 = \lambda + l^2$, see Eq. (9.60).
Continuing now with Eq. (9.67), we identify it as the Bessel equation of order zero in $x = n\rho$.
As we learned in Chapter 7, this ODE has two linearly independent solutions, of which
only the one designated $J_0$ is nonsingular at the origin. Since we need here a solution that
is regular over the entire range $0 \le x \le nR$, the solution we must choose is $J_0(n\rho)$.

We can now see what is necessary to satisfy the boundary condition at $\rho = R$, namely
that $J_0(nR)$ vanish. This is a condition on the parameter $n$. Remembering that we want
the least oscillatory function $P$, we need for $n$ to be such that $nR$ will be the location of
the smallest zero of $J_0$. Giving this point the name $\alpha$ (which by numerical methods can
be found to be approximately 2.4048), our boundary condition takes the form $nR = \alpha$, or
$n = \alpha/R$, and our complete solution to the Helmholtz equation can be written

$$\psi(\rho, \varphi, z) = J_0\left(\frac{\alpha\rho}{R}\right)\cos\left(\frac{\pi z}{L}\right). \tag{9.68}$$

To complete our analysis, we must figure out how to arrange that $n = \alpha/R$. Since the
condition connecting $n$, $l$, and $\lambda$ rearranges to

$$\lambda = n^2 - l^2, \tag{9.69}$$


we see that the condition on $n$ translates into one on $\lambda$. Our PDE has a unique ground-state
solution consistent with the boundary conditions, namely an eigenfunction whose
eigenvalue can be computed from Eq. (9.69), yielding

$$\lambda = \frac{\alpha^2}{R^2} + \frac{\pi^2}{L^2}.$$

If we had not restricted consideration to the ground state (by choosing the least
oscillatory solution), we would have (in principle) been able to obtain a complete set of
eigenfunctions, each with its own eigenvalue. ■
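
The numerical value quoted for the smallest zero of $J_0$ can be reproduced with elementary tools. The sketch below sums the standard power series of $J_0$ and locates its first zero by bisection on the bracket [2, 3], where the function changes sign; it then evaluates the ground-state eigenvalue as the sum of the radial contribution (first zero over R, squared) and the axial contribution (pi over L, squared). The function names, the number of series terms, and the bracket are illustrative assumptions; a library Bessel routine would normally be preferred.

```python
import math

def bessel_j0(x, terms=40):
    """J0(x) from its power series, sum over k of (-1)^k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / (k + 1) ** 2
    return total

def first_zero_j0(lo=2.0, hi=3.0, tol=1e-12):
    """Bisection for the smallest positive zero of J0, bracketed by [lo, hi]."""
    f_lo = bessel_j0(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j0(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def ground_state_eigenvalue(R, L):
    """Smallest eigenvalue for a cylinder of radius R and length L."""
    alpha = first_zero_j0()
    return alpha**2 / R**2 + math.pi**2 / L**2

print(first_zero_j0())
print(ground_state_eigenvalue(R=1.0, L=1.0))
```

The power series is adequate here because only small arguments are needed; for large x it suffers from cancellation and should not be used.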


Spherical Polar Coordinates 


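As a final exercise in the separation of variables in PDEs, let us try to separate
the Helmholtz equation, again with $k^2$ constant, in spherical polar coordinates. Using
Eq. (3.158), our PDE is

$$\frac{1}{r^2\sin\theta}\left[\sin\theta\,\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi}{\partial r}\right) + \frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{\sin\theta}\frac{\partial^2\psi}{\partial\varphi^2}\right] = -k^2\psi. \tag{9.70}$$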











Now, in analogy with Eq. (9.30) we try 
wir, 6,~) = R(r)O(O)®(). (9.71) 


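By substituting back into Eq. (9.70) and dividing by $R\Theta\Phi$, we have

$$\frac{1}{Rr^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \frac{1}{\Theta r^2\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) + \frac{1}{\Phi r^2\sin^2\theta}\frac{d^2\Phi}{d\varphi^2} = -k^2. \tag{9.72}$$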
Note that all derivatives are now ordinary derivatives rather than partials. By multiplying
by $r^2\sin^2\theta$, we can isolate $(1/\Phi)(d^2\Phi/d\varphi^2)$ to obtain

$$\frac{1}{\Phi}\frac{d^2\Phi}{d\varphi^2} = r^2\sin^2\theta\left[-k^2 - \frac{1}{Rr^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) - \frac{1}{\Theta r^2\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right)\right]. \tag{9.73}$$


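Equation (9.73) relates a function of $\varphi$ alone to a function of $r$ and $\theta$ alone. Since $r$, $\theta$,
and $\varphi$ are independent variables, we equate each side of Eq. (9.73) to a constant. In almost
all physical problems, $\varphi$ will appear as an azimuth angle. This suggests a periodic solution
rather than an exponential. With this in mind, let us use $-m^2$ as the separation constant,
where $m^2$ must then be the square of an integer. Then

$$\frac{1}{\Phi}\frac{d^2\Phi(\varphi)}{d\varphi^2} = -m^2 \tag{9.74}$$

and

$$\frac{1}{Rr^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + \frac{1}{\Theta r^2\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) - \frac{m^2}{r^2\sin^2\theta} = -k^2. \tag{9.75}$$

Multiplying Eq. (9.75) by $r^2$ and rearranging terms, we obtain

$$\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + r^2k^2 = -\frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) + \frac{m^2}{\sin^2\theta}. \tag{9.76}$$

Again, the variables are separated. We equate each side to a constant, $\lambda$, and finally obtain

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) - \frac{m^2}{\sin^2\theta}\,\Theta + \lambda\Theta = 0, \tag{9.77}$$

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + k^2R - \frac{\lambda R}{r^2} = 0. \tag{9.78}$$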


Once more we have replaced a partial differential equation of three variables by three 
ODEs. 

The ODE for $\Phi$ is the same as that encountered in cylindrical coordinates, with solutions
$\exp(\pm im\varphi)$ or $\sin m\varphi$, $\cos m\varphi$. The $\Theta$ ODE can be made less forbidding by changing the
independent variable from $\theta$ to $t = \cos\theta$, after which Eq. (9.77), with $\Theta(\theta)$ now written
as $P(\cos\theta) = P(t)$, becomes

$$(1 - t^2)P''(t) - 2tP'(t) - \frac{m^2}{1 - t^2}P(t) + \lambda P(t) = 0. \tag{9.79}$$


This is the associated Legendre equation (called the Legendre equation if $m = 0$), and is
discussed in detail in Chapter 15. We normally require solutions for $P(t)$ that do not have
singularities in the region within the range of the spherical polar coordinate $\theta$ (namely
that it be nonsingular for the entire range $0 \le \theta \le \pi$, equivalent to $-1 \le t \le +1$). The
solutions satisfying these conditions, called associated Legendre functions, are traditionally
denoted $P_l^m$, with $l$ a nonnegative integer. In Section 8.3 we discussed the Legendre
equation as a 1-D eigenvalue problem, finding that the requirement of nonsingularity
at $t = \pm 1$ is a sufficient boundary condition to make its solutions well defined. We
found also that its eigenfunctions are the Legendre polynomials and that its eigenvalues
($\lambda$ in the present notation) have the values $l(l + 1)$, where $l$ is an integer. The generalization
of these findings to the associated Legendre equation (that with nonzero $m$) shows that $\lambda$
continues to be given as $l(l + 1)$, but with the additional restriction that $l \ge |m|$. Details are
deferred to Chapter 15.
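As a concrete check of the statement that the Legendre polynomials are eigenfunctions with $\lambda = l(l+1)$, the sketch below builds $P_l(t)$ in exact rational arithmetic and substitutes it into Eq. (9.79) with $m = 0$. It assumes the standard recurrence $(n+1)P_{n+1} = (2n+1)tP_n - nP_{n-1}$ (not derived in this section), and the helper names are illustrative only.

```python
from fractions import Fraction

def legendre_coeffs(l):
    """Coefficients [c0, c1, ...] of P_l(t) = sum_k c_k t^k, from the three-term recurrence."""
    p_prev, p_curr = [Fraction(1)], [Fraction(0), Fraction(1)]
    if l == 0:
        return p_prev
    for n in range(1, l):
        # (n+1) P_{n+1} = (2n+1) t P_n - n P_{n-1}
        shifted = [Fraction(0)] + p_curr                  # coefficients of t * P_n
        nxt = [Fraction(2 * n + 1) * c for c in shifted]
        for k, c in enumerate(p_prev):
            nxt[k] -= n * c
        p_prev, p_curr = p_curr, [c / (n + 1) for c in nxt]
    return p_curr

def legendre_residual(l):
    """Coefficients of (1 - t^2) P'' - 2 t P' + l(l+1) P for P = P_l (the m = 0 case)."""
    c = legendre_coeffs(l)
    res = [Fraction(0)] * len(c)
    for k, ck in enumerate(c):
        # -t^2 P'', -2 t P' and l(l+1) P all contribute to the t^k coefficient
        res[k] += (l * (l + 1) - k * (k + 1)) * ck
        if k >= 2:
            res[k - 2] += k * (k - 1) * ck                # the P'' term lowers the power by 2
    return res

for l in range(7):
    print(l, all(r == 0 for r in legendre_residual(l)))
```

Because the arithmetic is exact, a residual that is identically zero confirms the eigenvalue $l(l+1)$ for each $l$ tested; it does not, of course, replace the general argument deferred to Chapter 15.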

Before continuing to the $R$ equation, Eq. (9.78), let us observe that in deriving the $\Phi$ and
$\Theta$ equations we have assumed that $k^2$ was a constant. However, if $k^2$ was not a constant,
but an additive expression of the form

$$k^2 = f(r) + \frac{1}{r^2}\,g(\theta) + \frac{1}{r^2\sin^2\theta}\,h(\varphi),$$

we could still carry out the separation of variables, but the relatively familiar $\Phi$ and $\Theta$
equations we have identified will be changed in ways that make them different, and probably
less tractable. However, if the departure of $k^2$ from a constant value is restricted to
the form $k^2 = k^2(r)$, then the angular parts of the separation will remain as presented
in Eqs. (9.74) and (9.79), and we only need to deal with increased generality in the $R$
equation.

It is worth stressing that the great importance of this separation of variables in spherical
polar coordinates stems from the fact that the case $k^2 = k^2(r)$ covers a tremendous amount
of physics, such as a great deal of the theories of gravitation, electrostatics, and atomic,
nuclear, and particle physics. Problems with $k^2 = k^2(r)$ can be characterized as central
force problems, and the use of spherical polar coordinates is natural in such problems.
From both a practical and a theoretical point of view, it is a key observation that the angular
dependence is isolated in Eqs. (9.74) and (9.77), or its equivalent, Eq. (9.79), that these
equations are the same for all central force problems, and that they can be solved exactly.
A detailed discussion of the angular properties of central force problems in quantum mechanics
is deferred to Chapter 16.

Returning now to the remaining separated ODE, namely the $R$ equation, we consider in
some depth two special cases: (1) the case $k^2 = 0$, corresponding to the Laplace equation,
and (2) $k^2$ a nonzero constant, corresponding to the Helmholtz equation. For both cases we
assume that the $\Phi$ and $\Theta$ equations have been solved subject to the boundary conditions
already discussed, so that the separation constant $\lambda$ must have the value $l(l + 1)$ for some
nonnegative integer $l$. Continuing on the assumption that $k^2$ is a (possibly zero) constant,
Eq. (9.78), multiplied by $r^2$, expands into

$$r^2R'' + 2rR' + \left[k^2r^2 - l(l + 1)\right]R = 0. \tag{9.80}$$


Taking first the case of the Laplace equation, for which $k^2 = 0$, Eq. (9.80) is easy to
solve. Either by inspection or by attempting to carry out a series solution by the method
of Frobenius, it is found that the initial term of the series, $a_0r^s$, is by itself a complete
solution to Eq. (9.80). In fact, substituting the assumed solution $R = r^s$ into Eq. (9.80),
that equation reduces to

$$s(s - 1)r^s + 2sr^s - l(l + 1)r^s = 0,$$

showing that $s(s + 1) = l(l + 1)$, which has two solutions, $s = l$ (obviously), and
$s = -l - 1$. In other words, given the value $l$ from the choice of solution to the $\Theta$ equation,
we find that the $R$ equation (for the Laplace equation) has the two solutions $r^l$ and $r^{-l-1}$,
so its general solution takes the form

$$R(r) = Ar^l + Br^{-l-1}. \tag{9.81}$$
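For readers who prefer to see the second root appear explicitly rather than by inspection, the indicial condition can be factored. This is only a restatement of the step above, written out as a short worked equation:

```latex
s(s+1) - l(l+1) = s^2 + s - l^2 - l = (s - l)(s + l + 1) = 0
\quad\Longrightarrow\quad s = l \quad\text{or}\quad s = -l - 1 .
```

Since $l \ge 0$, the two exponents are always distinct, so $r^l$ and $r^{-l-1}$ are linearly independent for every $l$.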


Combining the solutions to the separated ODEs, and summing over all choices of the
separation constants, we see that the most general solution of the Laplace equation that has
a nonsingular angular dependence can be written

$$\psi(r,\theta,\varphi) = \sum_{l,m}\left(A_{lm}r^l + B_{lm}r^{-l-1}\right)P_l^m(\cos\theta)\left(A'_{lm}\sin m\varphi + B'_{lm}\cos m\varphi\right). \tag{9.82}$$


If our problem now has Dirichlet or Neumann boundary conditions on a spherical surface
(with the region under study either within or outside the sphere), we may be able (by methods
more fully articulated in later chapters) to choose the coefficients in Eq. (9.82) so that
the boundary conditions are satisfied. Note that if the region in which we are to solve the
Laplace equation includes the origin, $r = 0$, then only the $r^l$ term should be retained and
we set $B_{lm}$ to zero. If our region for the Laplace equation is, say, external to a sphere of
some finite radius, then we must avoid the large-$r$ divergence of $r^l$ and set $A_{lm}$ to zero,
retaining only $r^{-l-1}$. More complicated cases, e.g., where we study the annular region
between two concentric spheres, will require the retention of both $A_{lm}$ and $B_{lm}$ and will in
general be somewhat more difficult.

We continue now to the case of nonzero but constant $k^2$. Equation (9.80) looks a lot
like a Bessel equation, but differs therefrom by the coefficient “2” in the $R'$ term and the
factor $k^2$ that multiplies $r^2$ in the coefficient of $R$. Both these differences can be resolved
by rewriting $R(r)$ as

$$R(r) = \frac{Z(kr)}{(kr)^{1/2}}, \tag{9.83}$$

which will then give us a differential equation for $Z$. Carrying out the differentiations to
obtain $R'$ and $R''$ in terms of $Z$, and changing the independent variable from $r$ to $x = kr$,
Eq. (9.80) becomes

$$x^2Z'' + xZ' + \left[x^2 - \left(l + \tfrac{1}{2}\right)^2\right]Z = 0, \tag{9.84}$$

showing that $Z$ is a Bessel function, of order $l + \tfrac{1}{2}$. Returning to Eq. (9.83), we can now
identify $R(r)$ in terms of quantities known as spherical Bessel functions, where $j_l(x)$, the
spherical Bessel functions that are regular at $x = 0$, have definition

$$j_l(x) = \sqrt{\frac{\pi}{2x}}\,J_{l+1/2}(x).$$

Since the status of $R(r)$ as the solution to a homogeneous ODE is not affected by the scale
factor in the definition of $j_l(x)$, we see that Eq. (9.83) is equivalent to the observation that
Eq. (9.80) has a solution $j_l(kr)$. The spherical Bessel function that is the second solution of
Eq. (9.80) is designated $y_l$, so that solution is $y_l(kr)$, and the general solution of Eq. (9.80)
can be written

$$R(r) = Aj_l(kr) + By_l(kr). \tag{9.85}$$







We note here that the properties of spherical Bessel functions are discussed more fully in 
Chapter 14. 
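To make the definition of $j_l$ concrete, the sketch below evaluates $J_{l+1/2}$ from its power series and compares the resulting $j_0$ and $j_1$ with their elementary closed forms, $j_0(x) = \sin x/x$ and $j_1(x) = \sin x/x^2 - \cos x/x$. Those closed forms and the series for $J_\nu$ are standard results assumed here, not derived in this section; the function names are illustrative, and the series is only suitable for moderate positive $x$.

```python
import math

def bessel_j(nu, x, terms=60):
    """Series J_nu(x) = sum_k (-1)^k / (k! Gamma(k+nu+1)) (x/2)^(2k+nu), for x > 0."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.gamma(k + nu + 1)) * (x / 2.0) ** (2 * k + nu)
        for k in range(terms)
    )

def spherical_jl(l, x):
    """Regular spherical Bessel function j_l(x) = sqrt(pi / (2x)) J_{l+1/2}(x)."""
    return math.sqrt(math.pi / (2.0 * x)) * bessel_j(l + 0.5, x)

x = 1.7
print(spherical_jl(0, x), math.sin(x) / x)
print(spherical_jl(1, x), math.sin(x) / x ** 2 - math.cos(x) / x)
```

Each printed pair should agree to within rounding error, which illustrates that the half-integer-order Bessel functions reduce to elementary functions.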

With the solutions to the radial ODE in hand, we can now write that the general solution
to the Helmholtz equation in spherical polar coordinates takes the form

$$\psi(r,\theta,\varphi) = \sum_{l,m}\left[A_{lm}j_l(kr) + B_{lm}y_l(kr)\right]P_l^m(\cos\theta)\left(A'_{lm}\sin m\varphi + B'_{lm}\cos m\varphi\right). \tag{9.86}$$


The above discussion assumes that $k^2 > 0$; negative values of $k^2$ (and therefore
imaginary values of $k$) simply correspond to our identifying an equation of the form
$(\nabla^2 - k^2)\psi = 0$ as a somewhat peculiar case of $(\nabla^2 + k^2)\psi = 0$. For negative $k^2$, we
can see we then get solutions that involve $j_l(kr)$ or $y_l(kr)$ with imaginary $k$. In order
to avoid notations that unnecessarily involve imaginary quantities, it is usual to define a
new set of functions $i_l(x)$ that are proportional to $j_l(ix)$, and are called modified spherical
Bessel functions. The modified solutions parallel to $y_l(ix)$ are denoted $k_l(x)$. These
functions are also discussed in Chapter 14.

The cases we have just surveyed do not, of course, cover all possibilities, and various
other choices of $k^2(r)$ lead to problems that are of importance in physics. Without proceeding
to a detailed analysis here, we cite a couple:


• Taking $k^2 = A/r + \lambda$ yields (with boundary condition that $\psi$ vanish in the limit
$r \to \infty$) the time-independent Schrödinger equation for the hydrogen atom; the $R$
equation can then be identified as the associated Laguerre differential equation, discussed
in Chapter 18.


• Taking $k^2 = Ar^2 + \lambda$ yields (with boundary condition at $r = \infty$) the equation for the
3-D quantum harmonic oscillator, for which the $R$ equation can be reduced to the
Hermite ODE, also discussed in Chapter 18.


Some other boundary-value problems lead to well-studied ODEs. However, sometimes the 
practicing physicist will encounter a radial equation that may have to be solved using the 
techniques presented in Chapter 7, or if all else fails, by numerical methods. 

We close this subsection with an example that is a simple boundary-value problem in 
spherical coordinates. 


Example 9.4.4   SPHERE WITH BOUNDARY CONDITION


In this example we solve the Laplace equation for the electrostatic potential $\psi(\mathbf{r})$ in a
region interior to a sphere of radius $a$, using spherical polar coordinates $(r,\theta,\varphi)$ with
origin at the center of the sphere. Our solution is to be subject to the Neumann boundary
condition $\partial\psi/\partial n = -V_0\cos\theta$ on the spherical surface. See Fig. 9.3.

To start, we note that totally arbitrary Neumann boundary conditions will not be consis- 
tent with our assumption of a charge-free sphere, as the integral of the normal derivative on 
the spherical surface gives, according to Gauss’ law, a measure of the total charge within. 
FIGURE 9.3 Arrows indicate sign and relative magnitude of the (inward) normal
derivative of the electrostatic potential on a spherical surface (boundary condition
for Example 9.4.4).


The present example is internally consistent, as

$$\int_0^{\pi}\sin\theta\,d\theta\int_0^{2\pi}d\varphi\,\cos\theta = 0.$$


Next, we need to take the general solution for the Laplace equation within a sphere, as
given by Eq. (9.82), and calculate therefrom the inward normal derivative at $r = a$. Since
the normal is in the $-r$ direction, we need only compute $-\partial\psi/\partial r$, evaluated at $r = a$.
Noting that for the present problem $B_{lm} = 0$, our boundary condition becomes

$$-V_0\cos\theta = -\sum_{l,m} l\,A_{lm}\,a^{l-1}P_l^m(\cos\theta)\left(A'_{lm}\sin m\varphi + B'_{lm}\cos m\varphi\right).$$


Since the left-hand side of this equation is independent of $\varphi$, its right-hand side has nonzero
coefficients only for $m = 0$, for which we only have the term originally containing $B'_{l0}$,
because $\sin(0) = 0$. Thus, consolidating the constants, the boundary condition becomes
the simpler form

$$-V_0\cos\theta = -\sum_{l} l\,A_l\,a^{l-1}P_l(\cos\theta). \tag{9.87}$$


Without having made a detailed study of the properties of Legendre functions, the solution
of an equation of this type might need to be deferred to Chapter 15, but this one is easy
to solve because $P_1(\cos\theta) = \cos\theta$ (see Legendre polynomials in Table 15.1). Thus, from
Eq. (9.87),

$$l\,A_l\,a^{l-1} = V_0\,\delta_{l1},$$

so $A_1 = V_0$ and all the other coefficients except $A_0$ vanish. The coefficient $A_0$ is not determined
by the boundary conditions and represents an arbitrary constant that may be added
to the potential. Thus, the potential within the sphere has the form

$$\psi = V_0\,r\,P_1(\cos\theta) + A_0 = V_0\,r\cos\theta + A_0 = V_0z + A_0,$$

corresponding to a uniform electric field within the sphere, in the $-z$ direction and of
magnitude $V_0$. The electric field is, of course, unaffected by the arbitrary value of the
constant $A_0$.
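The coefficient matching in Example 9.4.4 was done by inspection, but the same coefficients can be extracted numerically by projecting the boundary data onto Legendre polynomials. The sketch below assumes the orthogonality relation $\int_{-1}^{1}P_l(t)P_{l'}(t)\,dt = 2\delta_{ll'}/(2l+1)$ (a Chapter 15 result, not established here) and uses a simple midpoint rule; `legendre_p` and `neumann_coefficients` are hypothetical helper names.

```python
def legendre_p(l, t):
    """P_l(t) by the upward recurrence (n+1) P_{n+1} = (2n+1) t P_n - n P_{n-1}."""
    p_prev, p_curr = 1.0, t
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p_curr = p_curr, ((2 * n + 1) * t * p_curr - n * p_prev) / (n + 1)
    return p_curr

def neumann_coefficients(v0, a, l_max, n_quad=2000):
    """Return {l: A_l}, l = 1..l_max, from V0 cos(theta) = sum_l l A_l a^(l-1) P_l(cos theta)."""
    h = 2.0 / n_quad
    coeffs = {}
    for l in range(1, l_max + 1):
        # midpoint rule for the integral of V0 * t * P_l(t) over -1 <= t <= 1, t = cos(theta)
        midpoints = (-1.0 + (i + 0.5) * h for i in range(n_quad))
        integral = sum(v0 * t * legendre_p(l, t) for t in midpoints) * h
        # orthogonality gives l A_l a^(l-1) = (2l+1)/2 * integral
        coeffs[l] = (2 * l + 1) / 2.0 * integral / (l * a ** (l - 1))
    return coeffs

print(neumann_coefficients(v0=3.0, a=2.0, l_max=4))
```

Up to quadrature error, only the $l = 1$ entry is nonzero and it equals $V_0$, in agreement with the result found by inspection. The coefficient $A_0$ does not appear because the $l = 0$ term drops out of the normal derivative.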


Summary: Separated-Variable Solutions 


For convenient reference, the forms of the solutions of Laplace’s and Helmholtz’s equations
for spherical polar coordinates are collected in Table 9.2. Although the ODEs
obtained from the separation of variables are the same irrespective of the boundary conditions,
the ODE solutions to be used, and the constants of separation, do depend on the
boundaries. Boundaries with less than spherical symmetry may lead to values of $m$ and
$l$ that are not integral, and may also require use of the second solution of the Legendre
equation (quantities normally denoted $Q_l^m$). Engineering applications frequently require
solutions to PDEs for regions of low symmetry, but such problems are nowadays almost
universally approached using numerical, rather than analytical methods. Consequently,
Table 9.2 only contains data that are relevant for problems inside or outside a spherical
boundary, or between two concentric spherical boundaries. This restriction to spherical
symmetry causes the angular portion of the solutions to be uniquely of the form we have
already identified.

In contrast to the unique angular solution, both linearly independent solutions to the
radial ODE are relevant, with the choice of solution dependent on the geometry. Solutions
within a sphere must employ only the radial functions that are regular at the origin, i.e.,
$r^l$, $j_l$, or $i_l$. Solutions external to a sphere may employ $r^{-l-1}$, $k_l$ (defined so that it will
decay exponentially to zero at large $r$), or a linear combination of $j_l$ and $y_l$ (both of which
are oscillatory and decay as $r^{-1}$). Solutions between concentric spheres can use both the
radial functions appropriate to the PDE.

It is also possible to summarize the forms of solution to the Laplace and Helmholtz
equations in circular cylindrical coordinates, if we restrict attention to problems that have
circular symmetry about the axial direction of the coordinate system. However, the situation
is considerably more complicated than for spherical coordinates, as we now have two
coordinates ($\rho$ and $z$) that can have a variety of boundary conditions, in contrast to the
single such coordinate ($r$) in the spherical system. In spherical coordinates the form of the
radial function is completely determined by the PDE, and specific problems differ only
in the choice (or relative weight) of the two linearly independent radial solutions. But in
cylindrical coordinates the forms of the $\rho$ and $z$ solutions, as well as their coefficients, are
determined by the boundary conditions, and not entirely by the value of the constant in the
Helmholtz equation. Choices of the $\rho$ and $z$ solutions, though coupled, can vary widely.
For details, the reader is referred to Table 9.3.


Table 9.2  Solutions of PDEs in Spherical Polar Coordinates (a)

  General form:  $\psi = \sum_{l,m} f_l(r)\,P_l^m(\cos\theta)\,(a_{lm}\cos m\varphi + b_{lm}\sin m\varphi)$
            or   $\psi = \sum_{l,m} f_l(r)\,P_l^m(\cos\theta)\,c_{lm}e^{im\varphi}$

  PDE                              Radial functions
  $\nabla^2\psi = 0$                 $f_l(r) = r^l,\ r^{-l-1}$
  $\nabla^2\psi + k^2\psi = 0$       $f_l(r) = j_l(kr),\ y_l(kr)$
  $\nabla^2\psi - k^2\psi = 0$       $f_l(r) = i_l(kr),\ k_l(kr)$

  (a) For $i_l$, $j_l$, $k_l$, $y_l$, see Chapter 14; for $P_l^m$, see Chapters 15 and 16.


Table 9.3  Solutions of PDEs in Circular Cylindrical Coordinates (a)

  General form:  $\psi = \sum_{m,\alpha} f_{m\alpha}(\rho)\,g_\alpha(z)\,(a_{m\alpha}\cos m\varphi + b_{m\alpha}\sin m\varphi)$
            or   $\psi = \sum_{m,\alpha} f_{m\alpha}(\rho)\,g_\alpha(z)\,c_{m\alpha}e^{im\varphi}$

  $\nabla^2\psi = 0$
        $f_{m\alpha}(\rho) = J_m(\alpha\rho),\ Y_m(\alpha\rho)$      $g_\alpha(z) = e^{\alpha z},\ e^{-\alpha z}$
    or  $f_{m\alpha}(\rho) = I_m(\alpha\rho),\ K_m(\alpha\rho)$      $g_\alpha(z) = \sin(\alpha z),\ \cos(\alpha z)$ or $e^{i\alpha z}$
    or  $f_{m\alpha}(\rho) = \rho^m,\ \rho^{-m}$                     $g_\alpha(z) = 1$

  $\nabla^2\psi + \lambda\psi = 0$
        $f_{m\alpha}(\rho) = J_m(\alpha\rho),\ Y_m(\alpha\rho)$
            if $\beta^2 = \alpha^2 - \lambda > 0$,      $g_\alpha(z) = e^{\beta z},\ e^{-\beta z}$
            if $\beta^2 = \lambda - \alpha^2 > 0$,      $g_\alpha(z) = \sin(\beta z),\ \cos(\beta z)$ or $e^{i\beta z}$
            if $\lambda = \alpha^2$,                    $g_\alpha(z) = 1$
    or  $f_{m\alpha}(\rho) = I_m(\alpha\rho),\ K_m(\alpha\rho)$
            if $\beta^2 = -\lambda - \alpha^2 > 0$,     $g_\alpha(z) = e^{\beta z},\ e^{-\beta z}$
            if $\beta^2 = \lambda + \alpha^2 > 0$,      $g_\alpha(z) = \sin(\beta z),\ \cos(\beta z)$ or $e^{i\beta z}$
            if $\lambda = -\alpha^2$,                   $g_\alpha(z) = 1$
    or  $f_{m\alpha}(\rho) = \rho^m,\ \rho^{-m}$
            if $\beta^2 = -\lambda > 0$,                $g_\alpha(z) = e^{\beta z},\ e^{-\beta z}$
            if $\beta^2 = \lambda > 0$,                 $g_\alpha(z) = \sin(\beta z),\ \cos(\beta z)$ or $e^{i\beta z}$

  (a) The parameter $\alpha$ can have any real values consistent with the boundary conditions. For $I_m$,
  $J_m$, $K_m$, $Y_m$, see Chapter 14.

Our final observations of this section deal with the functions we encountered in the
course of the separations in cylindrical and spherical coordinates. For the purpose of this
discussion, it is useful to think of our PDE as an operator equation subject to boundary
conditions. If, in cylindrical coordinates, we restrict attention to PDEs in which the parameter
$k^2$ is independent of $\varphi$ (and with boundary conditions that do not depend upon $\varphi$), we
have chosen our operator equation as one that has circular symmetry. Moreover, we will
then always get the same $\Phi$ equation, with (of course) the same solutions. In these circumstances,
the solutions will have symmetry properties derived from those of our overall
boundary-value problem.[*] The $\Phi$ equation can also be thought of as an operator equation,
and we can go further and identify the operator as $L_z^2 = -\partial^2/\partial\varphi^2$, where $L_z$ is the
$z$ component of the angular momentum. The solutions of the $\Phi$ equation are eigenfunctions
of this operator; the reason they can occur as part of the PDE solution is that
$L_z^2$ commutes with the operator defining the PDE (clearly so, because the PDE operator
does not contain $\varphi$). In other words, because $L_z^2$ and the PDE operator commute, they will
have simultaneous eigenfunctions, and the overall solutions of the PDE can be labeled to
identify the $L_z^2$ eigenfunction that was chosen.

[*] Note that the solutions to a boundary-value problem need not have the full problem symmetry (a point that will be elaborated
in great detail when we develop group-theoretical methods). An obvious example is that the Sun-Earth gravitational potential is
spherically symmetric, while the most familiar solution (the Earth’s orbit) is planar. The dilemma is resolved by noting that the
spherical symmetry manifests itself in the possible existence of Earth orbits at all angular orientations.

Looking now at the situation in spherical polar coordinates, we note that if $k^2$ is independent
of the angles, i.e., $k^2 = k^2(r)$, then our PDE always has the same angular solutions
$\Theta_{lm}(\theta)\Phi_m(\varphi)$. Looking further at the angular terms of our PDE, we can identify them as
the operator $L^2$, and we see that the angular solutions we have found are eigenfunctions of
this operator. When the PDE operator is independent of the angles, it will commute with
$L^2$ and the solutions to the PDE can be labeled accordingly. These symmetry features are
very important and are discussed in great detail in Chapter 16.


Exercises 


9.4.1 By letting the operator $\nabla^2 + k^2$ act on the general form $a_1\psi_1(x,y,z) + a_2\psi_2(x,y,z)$,
show that it is linear, i.e., that
$(\nabla^2 + k^2)(a_1\psi_1 + a_2\psi_2) = a_1(\nabla^2 + k^2)\psi_1 + a_2(\nabla^2 + k^2)\psi_2$.

9.4.2 Show that the Helmholtz equation,

$$\nabla^2\psi + k^2\psi = 0,$$

is still separable in circular cylindrical coordinates if $k^2$ is generalized to
$k^2 + f(\rho) + (1/\rho^2)g(\varphi) + h(z)$.

9.4.3 Separate variables in the Helmholtz equation in spherical polar coordinates, splitting off
the radial dependence first. Show that your separated equations have the same form as
Eqs. (9.74), (9.77), and (9.78).

9.4.4 Verify that

$$\nabla^2\psi(r,\theta,\varphi) + \left[k^2 + f(r) + \frac{1}{r^2}g(\theta) + \frac{1}{r^2\sin^2\theta}h(\varphi)\right]\psi(r,\theta,\varphi) = 0$$

is separable (in spherical polar coordinates). The functions $f$, $g$, and $h$ are functions
only of the variables indicated; $k^2$ is a constant.

9.4.5 An atomic (quantum mechanical) particle is confined inside a rectangular box of sides
$a$, $b$, and $c$. The particle is described by a wave function $\psi$ that satisfies the Schrödinger
wave equation

$$-\frac{\hbar^2}{2m}\nabla^2\psi = E\psi.$$

The wave function is required to vanish at each surface of the box (but not to be identically
zero). This condition imposes constraints on the separation constants and therefore
on the energy $E$. What is the smallest value of $E$ for which such a solution can be
obtained?

ANS. $E = \dfrac{\pi^2\hbar^2}{2m}\left(\dfrac{1}{a^2} + \dfrac{1}{b^2} + \dfrac{1}{c^2}\right)$.
9.4.6 The quantum mechanical angular momentum operator is given by
$\mathbf{L} = -i(\mathbf{r}\times\nabla)$. Show that

$$\mathbf{L}\cdot\mathbf{L}\,\psi = l(l + 1)\psi$$

leads to the associated Legendre equation.
Hint. Section 8.3 and Exercise 8.3.1 may be helpful.

9.4.7 The 1-D Schrödinger wave equation for a particle in a potential field $V = \tfrac{1}{2}kx^2$ is

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + \frac{1}{2}kx^2\psi = E\psi(x).$$

(a) Defining

$$a = \left(\frac{mk}{\hbar^2}\right)^{1/4}, \qquad \lambda = \frac{2E}{\hbar}\left(\frac{m}{k}\right)^{1/2},$$

and setting $\xi = ax$, show that

$$\frac{d^2\psi(\xi)}{d\xi^2} + (\lambda - \xi^2)\psi(\xi) = 0.$$

(b) Substituting

$$\psi(\xi) = y(\xi)\,e^{-\xi^2/2},$$

show that $y(\xi)$ satisfies the Hermite differential equation.


9.5 LAPLACE AND POISSON EQUATIONS 


The Laplace equation can be considered the prototypical elliptic PDE. At this point we 
supplement the discussion motivated by the method of separation of variables with some 
additional observations. The importance of Laplace’s equation for electrostatics has stim- 
ulated the development of a great variety of methods for its solution in the presence of 
boundary conditions ranging from simple and symmetrical to complicated and convoluted. 
Techniques for present-day engineering problems tend to rely heavily on computational 
methods. The thrust of this section, however, will be on general properties of the Laplace 
equation and its solutions. 

The basic properties of the Laplace equation are independent of the coordinate system
in which it is expressed; we assume for the moment that we will use Cartesian coordinates.
Then, because the PDE sets the sum of the second derivatives, $\partial^2\psi/\partial x_i^2$, to zero, it is
obvious that if any of the second derivatives has a positive sign, at least one of the others
must be negative. This point is illustrated in Example 9.4.1, where the $x$ and $y$ dependence
of a solution to the Laplace equation was sinusoidal, and as a result, the $z$ dependence was
exponential (corresponding to different signs for the second derivative). Since the second
derivative is a measure of curvature, we conclude that if $\psi$ has positive curvature in any
coordinate direction, it must have negative curvature in some other coordinate direction.
That observation, in turn, means that all the stationary points of $\psi$ (points where its
first derivatives in all directions vanish) must be saddle points, not maxima or minima.
Since the Laplace equation describes the static electric potential in charge-free regions, we
conclude that the potential cannot have an extremum at a point where there is no charge.
A corollary to this observation is that the extrema of the electrostatic potential in a
charge-free region must be on the boundary of the region.
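The statement that extrema lie on the boundary can be seen in a small numerical experiment. The sketch below is an illustration only: it works in two dimensions on a square grid (a simplification of the general setting), replaces the Laplace equation by its standard five-point finite-difference form, and uses Jacobi relaxation. The function names and the grid size are arbitrary choices, not taken from the text.

```python
def relax_laplace(boundary, n=12, sweeps=500):
    """Jacobi relaxation for the 2-D Laplace equation on an n x n grid.

    boundary(i, j) supplies the fixed Dirichlet values on the edge of the grid;
    interior points start at zero.
    """
    grid = [[boundary(i, j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        new = [row[:] for row in grid]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # discrete Laplace equation: each interior value is the mean of its neighbours
                new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
        grid = new
    return grid

def extrema(grid):
    """Return (min interior, max interior, min edge, max edge) of a square grid."""
    n = len(grid)
    interior = [grid[i][j] for i in range(1, n - 1) for j in range(1, n - 1)]
    edge = [grid[i][j] for i in range(n) for j in range(n)
            if i in (0, n - 1) or j in (0, n - 1)]
    return min(interior), max(interior), min(edge), max(edge)

g = relax_laplace(lambda i, j: 0.3 * (i - 5) - 0.2 * (j - 4))
print(extrema(g))
```

Because every interior value is an average of its neighbours, no interior point can exceed the largest boundary value or fall below the smallest one, which is the discrete counterpart of the argument given above.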

A related property of the Laplace equation is that its solution, subject to Dirichlet boundary
conditions for the entire closed boundary of its region, is unique. This property applies
also to its inhomogeneous generalization, the Poisson equation. The proof is simple: Suppose
there are two distinct solutions $\psi_1$ and $\psi_2$ for the same boundary conditions. Then,
their difference $\psi = \psi_1 - \psi_2$ (for either the Laplace or Poisson equation) will be a solution
to the Laplace equation with $\psi = 0$ on the boundary. Since $\psi$ cannot have extrema within
the bounded region, it must be zero everywhere, meaning that $\psi_1 = \psi_2$.

If we have a Laplace or Poisson equation subject to Neumann boundary conditions on
the entire closed boundary of its region, then the difference $\psi = \psi_1 - \psi_2$ of two solutions
will also be a solution to the Laplace equation with a zero Neumann boundary condition.
To analyze this situation, we invoke Green’s Theorem, in the form provided by Eq. (3.86),
taking both $u$ and $v$ of that equation to be $\psi$. Equation (3.86) then becomes

$$\int_S \psi\,\frac{\partial\psi}{\partial n}\,dS = \int_V \psi\,\nabla^2\psi\,d\tau + \int_V \nabla\psi\cdot\nabla\psi\,d\tau. \tag{9.88}$$

The boundary condition causes the left-hand side of Eq. (9.88) to vanish, the first integral
on the right-hand side vanishes because $\psi$ is a solution of the Laplace equation, and the
remaining integral on the right-hand side must therefore also vanish. But that integral can
only vanish if $\nabla\psi$ is zero everywhere, which can only be true if $\psi$ is constant. Thus,
solutions to the Laplace equation with Neumann boundary conditions are also unique,
except for an additive constant to the potential.

An oft-cited application of this uniqueness theorem is the solution of electrostatics prob- 
lems by the method of images, which replaces a problem containing boundaries by one 
without a boundary but with additional charge added in such a way that the potential at 
the boundary location has the desired value. For example, a positive charge in front of a
grounded boundary (one with $\psi = 0$) can be augmented by a negative charge at the
mirror-image position behind the boundary. Then the two-charge system (ignoring the boundary)
will yield the desired zero potential at the boundary location, and the uniqueness theorem 
tells us that the potential calculated for the two-charge system must be the same (within 
the original region) as that for the original system. 
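The image construction for a grounded plane is easy to verify numerically. In the sketch below the plane is taken as $z = 0$, the real charge $+q$ sits at height $d$ on the $z$ axis, and the image charge $-q$ sits at $-d$; these coordinates, the unit system (potential of a point charge written as $q/r$, with constant prefactors dropped), and the function name are all assumptions made for illustration.

```python
import math

def image_potential(x, y, z, q=1.0, d=1.0):
    """Potential of +q at (0, 0, d) plus its image -q at (0, 0, -d).

    Constant prefactors are dropped; the result is meant for the physical region z >= 0,
    away from the charge itself.
    """
    r_plus = math.sqrt(x * x + y * y + (z - d) ** 2)    # distance to the real charge
    r_minus = math.sqrt(x * x + y * y + (z + d) ** 2)   # distance to the image charge
    return q / r_plus - q / r_minus

# On the plane z = 0 the two distances coincide, so the potential vanishes there.
for point in [(0.0, 0.0), (0.3, -1.2), (5.0, 2.0)]:
    print(point, image_potential(point[0], point[1], 0.0))
```

Since the two-charge potential satisfies the Laplace equation in the region $z > 0$ away from the real charge and vanishes on the plane, the uniqueness theorem identifies it with the potential of the original boundary-value problem in that region.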


Exercises 


9.5.1 Verify that the following are solutions of Laplace’s equation:

(a) $\psi_1 = 1/r$, $r \ne 0$,    (b) $\psi_2 = \dfrac{1}{2r}\ln\dfrac{r + z}{r - z}$.

9.5.2 If $\psi$ is a solution of Laplace’s equation, $\nabla^2\psi = 0$, show that $\partial\psi/\partial z$ is also a solution.

9.5.3 Show that an argument based on Eq. (9.88) can be used to prove that the Laplace and
Poisson equations with Dirichlet boundary conditions have unique solutions.
9.6 WAVE EQUATION


The wave equation is the prototype hyperbolic PDE. As we have seen earlier in this chapter,
hyperbolic PDEs have two characteristics, and for the equation

$$\frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} = \frac{\partial^2\psi}{\partial x^2}, \tag{9.89}$$

the characteristics are lines of constant $x - ct$ and those of constant $x + ct$. This means
that the general solution to Eq. (9.89) takes the form

$$\psi(x,t) = f(x - ct) + g(x + ct), \tag{9.90}$$

with $f$ and $g$ completely arbitrary.

Viewing $x$ as a position variable and $t$ as the time, we can interpret $f(x - ct)$ as a wave,
moving with velocity $c$, in the $+x$ direction. By this we mean that the entire profile of $f$,
as a function of $x$ at $t = 0$, will be shifted uniformly toward positive $x$ by an amount $c$
when $t = 1$. See Fig. 9.4. Similarly, $g(x + ct)$ describes a wave moving at velocity $c$ in the
$-x$ direction. Because $f$ and $g$ are arbitrary, the traveling waves they describe need not
be sinusoidal or periodic, but may be entirely irregular; moreover, there is no requirement
that $f$ and $g$ have any particular relationship to each other.

An obvious special case of the general situation described above is that when $f(x - ct)$
is chosen to be sinusoidal, $f = \sin(x - ct)$. For simplicity we have taken $f$ to have unit
amplitude and wavelength $2\pi$. We also take $g(x + ct)$ to be $g = \sin(x + ct)$, a sinusoidal
wave of the same wavelength and amplitude traveling in the direction opposite to $f$. At a
point $x$ and time $t$, these two waves add to produce a resultant

$$\psi(x,t) = \sin(x - ct) + \sin(x + ct),$$

which, using trigonometric identities, can be rearranged to

$$\psi(x,t) = (\sin x\cos ct - \cos x\sin ct) + (\sin x\cos ct + \cos x\sin ct) = 2\sin x\cos ct.$$


This form for $\psi$ can be identified as a standing wave distribution, meaning that the time
evolution of the wave’s profile in $x$ is an oscillation in amplitude, with the wave pattern
not moving in either direction. An obvious point of difference from a traveling wave is
that for a standing wave, the nodes (points where $\psi = 0$) are stationary in time, while in a
traveling wave they are moving in time at velocity $\pm c$.

Our current interest in traveling vs. standing waves is their relation to solutions to
the wave equation that we might find using the method of separation of variables. That
method would obviously lead us to standing-wave solutions. However, it is useful to note
that the totality of the solution set from the separated variables has the same content as
the traveling-wave solutions. For example, the products $\sin x\cos ct$ and $\cos x\sin ct$ are
solutions we would get by separating the variables, and linear combinations of these yield
$\sin(x \pm ct)$.

FIGURE 9.4 Traveling wave $f(x - ct)$. Dashed line is profile at $t = 0$; full line is
profile at a time $t > 0$.





d’Alembert’s Solution 


While all ways of writing the general solution to the wave equation are mathemati- 
cally equivalent, diverse forms differ in their convenience of use for various purposes. 
To illustrate this, we consider how we might construct a solution to the wave equation, 
given, as an initial condition, (1) the entire spatial distribution of the wave amplitude at 
t = 0 and (2) the time derivative of the wave amplitude at t = 0 for the entire spatial distri- 
bution. The solution to this problem is generally referred to as d’Alembert’s solution of 
the wave equation; it was also (and slightly earlier) found by Euler. 

We start by using Eq. (9.90) to write our initial conditions in terms of the presently
unknown functions $f$ and $g$:

$$\psi(x,0) = f(x) + g(x), \tag{9.91}$$

$$\left.\frac{\partial\psi(x,t)}{\partial t}\right|_{t=0} = -cf'(x) + cg'(x). \tag{9.92}$$

We now integrate Eq. (9.92) between the limits $x - ct$ and $x + ct$ (and divide the result
by $2c$), obtaining

$$\frac{1}{2c}\int_{x-ct}^{x+ct}\frac{\partial\psi(x,0)}{\partial t}\,dx = \frac{1}{2}\left[-f(x + ct) + f(x - ct) + g(x + ct) - g(x - ct)\right]. \tag{9.93}$$

From Eq. (9.91), we also have

$$\frac{1}{2}\left[\psi(x + ct,0) + \psi(x - ct,0)\right] = \frac{1}{2}\left[f(x + ct) + g(x + ct) + f(x - ct) + g(x - ct)\right]. \tag{9.94}$$

Adding together the right-hand sides of Eqs. (9.93) and (9.94), half the terms cancel, and
those that survive combine to give the result

$$f(x - ct) + g(x + ct), \quad\text{which is } \psi(x,t).$$

Therefore, from the left-hand sides of Eqs. (9.93) and (9.94), we obtain the final result

$$\psi(x,t) = \frac{1}{2}\left[\psi(x + ct,0) + \psi(x - ct,0)\right] + \frac{1}{2c}\int_{x-ct}^{x+ct}\frac{\partial\psi(x,0)}{\partial t}\,dx. \tag{9.95}$$

This equation gives $\psi(x,t)$ in terms of data at $t = 0$ that are within the distance $ct$ of
the point $x$. This is a reasonable result, since $ct$ is the distance that waves in this problem
can move between times $t = 0$ and $t = t$. More specifically, Eq. (9.95) contains terms that
represent half the $t = 0$ amplitude at distances $\pm ct$ from $x$ (half, because a disturbance that
starts at these points is split between propagation in both directions), plus an additional
integral that accumulates the effect of the initial amplitude derivative over the region of
influence.
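Equation (9.95) translates directly into a short numerical routine. The sketch below is one possible implementation under stated assumptions: the integral is approximated by composite Simpson quadrature, the initial data are passed in as ordinary Python functions, and the name `dalembert` and its parameters are illustrative. As a check it uses a purely right-moving Gaussian pulse, $f(u) = e^{-u^2}$ with $g = 0$, for which Eq. (9.92) gives the initial time derivative $2cx\,e^{-x^2}$.

```python
import math

def dalembert(psi0, dpsi0_dt, x, t, c=1.0, n=2000):
    """Evaluate Eq. (9.95): psi0 is psi(x, 0), dpsi0_dt is the time derivative at t = 0.

    The integral over [x - ct, x + ct] uses composite Simpson quadrature with n (even) panels.
    """
    a, b = x - c * t, x + c * t
    h = (b - a) / n
    s = dpsi0_dt(a) + dpsi0_dt(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * dpsi0_dt(a + i * h)
    integral = s * h / 3.0
    return 0.5 * (psi0(b) + psi0(a)) + integral / (2.0 * c)

c = 2.0
pulse = lambda u: math.exp(-u * u)                  # profile f(u); g is taken to be zero
rate = lambda u: 2.0 * c * u * math.exp(-u * u)     # -c f'(u), from Eq. (9.92)

x, t = 0.7, 0.4
print(dalembert(pulse, rate, x, t, c=c), pulse(x - c * t))
```

The two printed numbers should agree to within quadrature error, showing that the formula reassembles the traveling wave $f(x - ct)$ from its initial amplitude and initial time derivative alone.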
Exercises

Solve the wave equation, Eq. (9.89), subject to the indicated conditions.

9.6.1 Determine $\psi(x,t)$ given that at $t = 0$ $\psi_0(x) = \sin x$ and $\partial\psi(x)/\partial t = \cos x$.

9.6.2 Determine $\psi(x,t)$ given that at $t = 0$ $\psi_0(x) = \delta(x)$ (Dirac delta function) and the initial
time derivative of $\psi$ is zero.

9.6.3 Determine $\psi(x,t)$ given that at $t = 0$ $\psi_0(x)$ is a single square-wave pulse as defined
below, and the initial time derivative of $\psi$ is zero.

$$\psi_0(x) = 0, \quad |x| > a/2, \qquad \psi_0(x) = 1/a, \quad |x| < a/2.$$

9.6.4 Determine $\psi(x,t)$ given that at $t = 0$ $\psi_0 = 0$ for all $x$, but $\partial\psi/\partial t = \sin(x)$.


9.7 HEAT-FLOW, OR DIFFUSION PDE 


Here we return to a parabolic PDE to develop methods that adapt a special solution of a 
PDE to boundary conditions by introducing parameters. The methods are fairly general 
and apply to other second-order PDEs with constant coefficients as well. To some extent, 
they are complementary to the earlier basic separation method for finding solutions in a 
systematic way. 

We consider the 3-D time-dependent diffusion PDE for an isotropic medium, using it 
to describe heat flow subject to given boundary conditions. Assuming isotropy actually is 
not much of a restriction because, in case we have different (constant) rates of diffusion in 
different directions, for example, in wood, our heat-flow PDE takes the form 

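\[ \frac{\partial\psi}{\partial t} = a^2\frac{\partial^2\psi}{\partial x^2}
   + b^2\frac{\partial^2\psi}{\partial y^2} + c^2\frac{\partial^2\psi}{\partial z^2}, \tag{9.96} \]

if we put the coordinate axes along the principal directions of anisotropy. Now we simply
rescale the coordinates using the substitutions $x = a\xi$, $y = b\eta$, $z = c\zeta$ to get back the
original isotropic form of Eq. (9.96),

\[ \frac{\partial\Phi}{\partial t} = \frac{\partial^2\Phi}{\partial\xi^2}
   + \frac{\partial^2\Phi}{\partial\eta^2} + \frac{\partial^2\Phi}{\partial\zeta^2}, \]

for the temperature distribution function $\Phi(\xi,\eta,\zeta,t) = \psi(x,y,z,t)$.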


For simplicity, we first solve the time-dependent PDE for a homogeneous one- 
dimensional medium, a long metal rod in the x-direction, for which the PDE is 


\[ \frac{\partial\psi}{\partial t} = a^2\frac{\partial^2\psi}{\partial x^2}, \tag{9.97} \]

where the constant $a$ measures the diffusivity, or heat conductivity, of the medium. We
obtain solutions to this linear PDE with constant coefficients by the method of separation
of variables, for which we set $\psi(x,t) = X(x)T(t)$, leading to the separate equations

\[ \frac{1}{T}\frac{dT}{dt} = \beta, \qquad \frac{1}{X}\frac{d^2X}{dx^2} = \frac{\beta}{a^2}. \tag{9.98} \]

These equations have, for any nonzero value of $\beta$, solutions $T = e^{\beta t}$ and $X = e^{\pm\alpha x}$, with
$\alpha^2 = \beta/a^2$. We seek solutions whose time dependence decays exponentially at large $t$,
that is, solutions with negative values of $\beta$, and therefore set $\alpha = i\omega$, $\alpha^2 = -\omega^2$ for real $\omega$,
and have

\[ \psi(x,t) = e^{\pm i\omega x}e^{-\omega^2a^2t} = (\cos\omega x \pm i\sin\omega x)\,e^{-\omega^2a^2t}. \tag{9.99} \]

Note that $\beta = 0$, for which

\[ \psi(x,t) = C_0'x + C_0, \tag{9.100} \]

is also included in the solution set for the PDE. If we use this solution for a rod of infinite
length, we must set $C_0' = 0$ to avoid a nonphysical divergence; in any case, the value of $C_0$
is then the constant value that the temperature approaches at long times.

Forming real linear combinations of $\sin\omega x$ and $\cos\omega x$ with arbitrary coefficients, and
keeping the $\beta = 0$ solution, we obtain from Eq. (9.99) for any choice of $A$, $B$, $\omega$, $C_0'$, and
$C_0$, a solution

\[ \psi(x,t) = (A\cos\omega x + B\sin\omega x)\,e^{-\omega^2a^2t} + C_0'x + C_0. \tag{9.101} \]


Solutions for different values of these parameters can now be combined as needed to form 
an overall solution consistent with the required boundary conditions. 

If the rod we are studying is finite in length, it may be that the boundary conditions can 
be satisfied if we restrict $\omega$ to discrete nonzero values that are multiples of a basic value $\omega_0$.
For a rod of infinite length, it may be better to let $\omega$ assume a continuous range of values,
so that $\psi(x,t)$ will have the general form

\[ \psi(x,t) = \int \left[A(\omega)\cos\omega x + B(\omega)\sin\omega x\right]e^{-a^2\omega^2t}\,d\omega + C_0. \tag{9.102} \]

We call specific attention to the fact that

• Forming linear combinations of solutions by summation or integration over parameters
is a powerful and standard method for generalizing specific PDE solutions in order to
adapt them to boundary conditions.


Example 9.7.1 A SPECIFIC BOUNDARY CONDITION 


Let us solve a 1-D case explicitly, where the temperature at time $t = 0$ is $\psi_0(x) = 1 =$
constant in the interval between $x = +1$ and $x = -1$ and zero for $x > 1$ and $x < -1$. At
the ends, $x = \pm 1$, the temperature is always held at zero. Note that this problem, including
its initial conditions, has even parity, $\psi_0(x) = \psi_0(-x)$, so $\psi(x,t)$ must also be even.

We choose the spatial solutions of Eq. (9.98) to be of the form given in Eq. (9.101),
but restricted to $C_0' = C_0 = 0$ (since the $t \to \infty$ limit of $\psi(x,t)$ is zero for the entire
range $-1 < x < 1$), and to $\cos(l\pi x/2)$ for odd integer $l$, because these functions are the
even-parity members of an orthonormal basis for the interval $-1 < x < 1$ that satisfy the
boundary condition $\psi = 0$ at $x = \pm 1$. Then, at $t = 0$ our solution takes the form

\[ \psi(x,0) = \sum_{l=1}^{\infty} a_l\cos\frac{\pi l x}{2}, \qquad -1 < x < 1, \]

and we need to choose the coefficients $a_l$ so that $\psi(x,0) = 1$.
Using the orthonormality, we compute

\[ a_l = \int_{-1}^{1} 1\cdot\cos\frac{\pi l x}{2}\,dx
       = \left.\frac{2}{l\pi}\sin\frac{\pi l x}{2}\right|_{x=-1}^{1}
       = \frac{4}{\pi l}\sin\frac{l\pi}{2} = \frac{4(-1)^m}{(2m+1)\pi}, \qquad l = 2m+1. \]

Including its time dependence, the full solution is given by the series

\[ \psi(x,t) = \frac{4}{\pi}\sum_{m=0}^{\infty}\frac{(-1)^m}{2m+1}
   \cos\left[(2m+1)\frac{\pi x}{2}\right]e^{-t\left((2m+1)\pi a/2\right)^2}, \tag{9.103} \]

which converges absolutely for $t > 0$ but only conditionally at $t = 0$, as a result of the
discontinuity at $x = \pm 1$. ■
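The behavior of the series in Eq. (9.103) can be explored with a short numerical sketch. The function name `psi_series`, the truncation at 400 terms, and the sample points are assumptions made for illustration; they are not taken from the text.

```python
import math

def psi_series(x, t, a=1.0, terms=400):
    """Partial sum of Eq. (9.103) for the rod -1 <= x <= 1 held at zero at x = +/-1."""
    total = 0.0
    for m in range(terms):
        l = 2 * m + 1
        w = l * math.pi / 2.0  # spatial frequency omega of this mode
        total += (-1) ** m / l * math.cos(w * x) * math.exp(-t * (w * a) ** 2)
    return 4.0 / math.pi * total

# Temperature at the center, halfway out, and at the end of the rod.
for t in (0.01, 0.1, 1.0):
    print(t, psi_series(0.0, t), psi_series(0.5, t), psi_series(1.0, t))
```

At small $t$ the center of the rod should remain close to its initial temperature, while at large $t$ only the $m = 0$ term contributes appreciably, so the profile decays like $e^{-t(\pi a/2)^2}$.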





We are now ready to consider the diffusion equation in three dimensions. We start by 
assuming a solution of the form $\psi = f(x,y,z)T(t)$, and separate the spatial from the time
dependence. As in the 1-D case, $T(t)$ will have exponentials as solutions, and we can
choose the solution that decays exponentially at large $t$. Assigning the separation constant
the value $-k^2$, so that the time dependence is $\exp(-k^2t)$, the separated equation in the
spatial coordinates takes the form

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
   + \frac{\partial^2 f}{\partial z^2} + k^2 f = 0, \tag{9.104} \]

which we recognize as the Helmholtz equation. Assuming that we can solve this equation
for various values of $k^2$ by further separations of variables or by other means, we can form
whatever sum or integral of individual solutions that may be needed to satisfy the boundary
conditions.


Alternate Solutions 


In an alternative approach to the heat flow equation, we now return to the one-dimensional
PDE, Eq. (9.98), seeking solutions of a new functional form $\psi(x,t) = u(x/\sqrt{t})$, which
is suggested by dimensional considerations and experimental data. Substituting $u(\xi)$, $\xi =
x/\sqrt{t}$, into Eq. (9.98) using

\[ \frac{\partial\psi}{\partial x} = \frac{u'}{\sqrt{t}}, \qquad
   \frac{\partial^2\psi}{\partial x^2} = \frac{u''}{t}, \qquad
   \frac{\partial\psi}{\partial t} = -\frac{x}{2\sqrt{t^3}}\,u', \tag{9.105} \]

with the notation $u'(\xi) \equiv du/d\xi$, the PDE is reduced to the ODE

\[ 2a^2u''(\xi) + \xi u'(\xi) = 0. \tag{9.106} \]

Writing this ODE as

\[ \frac{u''}{u'} = -\frac{\xi}{2a^2}, \]

we can integrate it once to get $\ln u' = -\xi^2/4a^2 + \ln C_1$, where $C_1$ is an integration constant.
Exponentiating and integrating again we find the general solution

\[ u(\xi) = C_1\int_0^{\xi} e^{-\xi^2/4a^2}\,d\xi + C_2, \tag{9.107} \]

which contains two integration constants $C_i$. We initialize this solution at time $t = 0$ to
temperature $+1$ for $x > 0$ and $-1$ for $x < 0$, corresponding to $u(\infty) = +1$ and $u(-\infty) =
-1$. Noting that

\[ \int_0^{\infty} e^{-\xi^2/4a^2}\,d\xi = a\sqrt{\pi}, \]

a case of the integral evaluated in Eq. (1.148), we obtain

\[ u(\infty) = a\sqrt{\pi}\,C_1 + C_2 = 1, \qquad u(-\infty) = -a\sqrt{\pi}\,C_1 + C_2 = -1, \]

which fixes the constants $C_1 = 1/a\sqrt{\pi}$, $C_2 = 0$. We therefore have the specific solution

\[ \psi = \frac{1}{a\sqrt{\pi}}\int_0^{x/\sqrt{t}} e^{-\xi^2/4a^2}\,d\xi
        = \frac{2}{\sqrt{\pi}}\int_0^{x/2a\sqrt{t}} e^{-v^2}\,dv
        = \operatorname{erf}\left(\frac{x}{2a\sqrt{t}}\right), \tag{9.108} \]

where erf is the standard name for Gauss’ error function (one of the special functions
listed in Table 1.2). We need to generalize this specific solution to adapt it to boundary
conditions.
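As a plausibility check on Eq. (9.108), one can test by finite differences that $\operatorname{erf}(x/2a\sqrt{t})$ satisfies $\partial\psi/\partial t = a^2\,\partial^2\psi/\partial x^2$. This is a rough sketch under stated assumptions: the helper names `psi` and `pde_residual`, the step size, and the sample points are illustrative choices, not part of the text.

```python
import math

def psi(x, t, a=1.0):
    """Similarity solution of Eq. (9.108): erf(x / (2 a sqrt(t)))."""
    return math.erf(x / (2.0 * a * math.sqrt(t)))

def pde_residual(x, t, a=1.0, h=1e-3):
    """Central-difference estimate of d(psi)/dt - a^2 d^2(psi)/dx^2."""
    dpsi_dt = (psi(x, t + h, a) - psi(x, t - h, a)) / (2.0 * h)
    d2psi_dx2 = (psi(x + h, t, a) - 2.0 * psi(x, t, a) + psi(x - h, t, a)) / h ** 2
    return dpsi_dt - a ** 2 * d2psi_dx2

a = 0.8
for x, t in [(0.3, 0.5), (-1.2, 1.0), (2.0, 2.5)]:
    print(x, t, pde_residual(x, t, a))
```

The residuals should be small compared with the individual terms, limited only by the finite-difference truncation error. Evaluating `psi` at very small $t$ also reproduces the initial step from $-1$ to $+1$.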

To this end we now generate new solutions of the PDE with constant coefficients 
by differentiating the special solution given in Eq. (9.108). In other words, if $\psi(x,t)$
solves the PDE in Eq. (9.98), so do $\partial\psi/\partial t$ and $\partial\psi/\partial x$, because these derivatives and
the differentiations of the PDE commute; that is, the order in which they are carried out
does not matter. Note carefully that this method no longer works if any coefficient of the
PDE depends on $t$ or $x$ explicitly. However, PDEs with constant coefficients dominate
in physics. Examples are Newton’s equations of motion in classical mechanics, the wave 
equations of electrodynamics, and Poisson’s and Laplace’s equations in electrostatics and 
gravity. Even Einstein’s nonlinear field equations of general relativity take on this special 
form in local geodesic coordinates. 

Therefore, by differentiating Eq. (9.108) with respect to x, we find the simpler, more 
basic solution, 





\[ \psi_1(x,t) = \frac{1}{a\sqrt{t\pi}}\,e^{-x^2/4a^2t} \tag{9.109} \]

and, repeating the process, another basic solution

\[ \psi_2(x,t) = \frac{x}{2a^3\sqrt{t^3\pi}}\,e^{-x^2/4a^2t}. \tag{9.110} \]

Again, these solutions have to be generalized to adapt them to boundary conditions. And
there is yet another method of generating new solutions of a PDE with constant coefficients:
We can translate a given solution, for example, $\psi_1(x,t) \to \psi_1(x-\alpha,t)$, and then
integrate over the translation parameter $\alpha$. Therefore,

\[ \psi(x,t) = \frac{1}{2a\sqrt{t\pi}}\int_{-\infty}^{\infty} C(\alpha)\,e^{-(x-\alpha)^2/4a^2t}\,d\alpha \tag{9.111} \]

is again a solution, which we rewrite using the substitution

\[ \xi = \frac{x-\alpha}{2a\sqrt{t}}, \qquad \alpha = x - 2a\xi\sqrt{t}, \qquad d\alpha = -2a\sqrt{t}\,d\xi. \tag{9.112} \]

These substitutions lead to

\[ \psi(x,t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} C(x - 2a\xi\sqrt{t})\,e^{-\xi^2}\,d\xi, \tag{9.113} \]

a solution of our PDE. Equation (9.113) is in a form permitting us to understand the
significance of the weight function $C(x)$ from the translation method. If we set $t = 0$ in that
equation, the function $C$ in the integrand then becomes independent of $\xi$, and the integral
can then be recognized as

\[ \int_{-\infty}^{\infty} e^{-\xi^2}\,d\xi = \sqrt{\pi}, \]

a well-known result equivalent to Eq. (1.148). Equation (9.113) then becomes the simpler
form

\[ \psi(x,0) = C(x), \qquad \text{or} \qquad C(x) = \psi_0(x), \]

where $\psi_0$ is the initial spatial distribution of $\psi$. Using this notation, we can write the
solution to our PDE as

\[ \psi(x,t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} \psi_0(x - 2a\xi\sqrt{t})\,e^{-\xi^2}\,d\xi, \tag{9.114} \]

a form that explicitly displays the role of the boundary (initial) condition. From Eq. (9.114)
we see that the initial temperature distribution, $\psi_0(x)$, spreads out over time and is damped
by the Gaussian weight function.


Example 9.7.2 SPECIAL BOUNDARY CONDITION AGAIN 


We consider now a problem similar to Example 9.7.1, but instead of keeping $\psi = 0$ at
all times at $x = \pm 1$, we regard the system as infinite in length, with $\psi_0 = 0$ everywhere
except for $|x| < 1$, where $\psi_0 = 1$. This change makes Eq. (9.114) usable, because our PDE
now applies over the range $(-\infty, \infty)$, and heat will flow (and temporarily increase the
temperature) at and beyond $|x| = 1$.

The range of $\psi_0(x)$ corresponds to a range of $\xi$ with endpoints found from $x - 2a\xi\sqrt{t} =
\pm 1$, so our solution becomes

\[ \psi(x,t) = \frac{1}{\sqrt{\pi}}\int_{(x-1)/2a\sqrt{t}}^{(x+1)/2a\sqrt{t}} e^{-\xi^2}\,d\xi. \]

In terms of the error function, we can also write this solution as

\[ \psi(x,t) = \frac{1}{2}\left[\operatorname{erf}\left(\frac{x+1}{2a\sqrt{t}}\right)
   - \operatorname{erf}\left(\frac{x-1}{2a\sqrt{t}}\right)\right]. \tag{9.115} \]

Equation (9.115) applies for all $x$, including $|x| > 1$. ■
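The agreement between the Gaussian-weighted integral of Eq. (9.114) and the closed form of Eq. (9.115) can be illustrated numerically. In this sketch the names `psi_conv`, `psi_erf`, and `top_hat`, the cutoff `xi_max`, and the midpoint rule are assumptions introduced for illustration only.

```python
import math

def psi_conv(x, t, psi0, a=1.0, xi_max=8.0, n=40000):
    """Midpoint-rule evaluation of Eq. (9.114), truncated to |xi| <= xi_max."""
    h = 2.0 * xi_max / n
    s = 2.0 * a * math.sqrt(t)
    total = 0.0
    for k in range(n):
        xi = -xi_max + (k + 0.5) * h
        total += psi0(x - s * xi) * math.exp(-xi * xi)
    return total * h / math.sqrt(math.pi)

def psi_erf(x, t, a=1.0):
    """Closed form of Eq. (9.115)."""
    s = 2.0 * a * math.sqrt(t)
    return 0.5 * (math.erf((x + 1.0) / s) - math.erf((x - 1.0) / s))

# Initial distribution of Example 9.7.2: unit temperature for |x| < 1.
top_hat = lambda x: 1.0 if abs(x) < 1.0 else 0.0

for x, t in [(0.0, 0.1), (0.8, 0.5), (2.0, 1.0)]:
    print(x, t, psi_conv(x, t, top_hat, a=0.7), psi_erf(x, t, a=0.7))
```

The same `psi_conv` routine accepts any initial distribution `psi0`, which is the practical content of Eq. (9.114); only the top-hat case has the simple error-function form used for comparison here.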


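Next we consider the problem of heat flow for an extended spherically symmetric
medium centered at the origin, suggesting that we should use polar coordinates $r, \theta, \varphi$. We
expect a solution of the form $u(r,t)$. Using Eq. (3.158) for the Laplacian, we find the PDE

\[ \frac{\partial u}{\partial t} = a^2\left(\frac{\partial^2 u}{\partial r^2}
   + \frac{2}{r}\frac{\partial u}{\partial r}\right), \tag{9.116} \]

which we transform to the 1-D heat-flow PDE by the substitution

\[ u = \frac{v(r,t)}{r}, \qquad
   \frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial r} - \frac{v}{r^2}, \qquad
   \frac{\partial u}{\partial t} = \frac{1}{r}\frac{\partial v}{\partial t}, \]

\[ \frac{\partial^2 u}{\partial r^2} = \frac{1}{r}\frac{\partial^2 v}{\partial r^2}
   - \frac{2}{r^2}\frac{\partial v}{\partial r} + \frac{2v}{r^3}. \tag{9.117} \]

This yields the PDE

\[ \frac{\partial v}{\partial t} = a^2\frac{\partial^2 v}{\partial r^2}. \tag{9.118} \]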


Example 9.7.3 SPHERICALLY SYMMETRIC HEAT FLOW 


Let us apply the 1-D heat-flow PDE to a spherically symmetric heat flow under fairly
common boundary conditions, where $x$ is replaced by the radial variable. Initially we have
zero temperature everywhere. Then, at time $t = 0$, a finite amount of heat energy $Q$ is
released at the origin, spreading evenly in all directions. What is the resulting spatial and
temporal temperature distribution?

Inspecting our special solution in Eq. (9.110) we see that, for $t \to 0$, the temperature

\[ \frac{v(r,t)}{r} = \frac{C}{\sqrt{t^3}}\,e^{-r^2/4a^2t} \tag{9.119} \]

goes to zero for all $r \neq 0$, so zero initial temperature is guaranteed. As $t \to \infty$, the
temperature $v/r \to 0$ for all $r$ including the origin, which is implicit in our boundary conditions.
The constant $C$ can be determined from energy conservation, which gives (for arbitrary $t$)
the constraint

\[ Q = \sigma\rho\int\frac{v}{r}\,d\tau
     = \frac{4\pi\sigma\rho C}{\sqrt{t^3}}\int_0^{\infty} r^2e^{-r^2/4a^2t}\,dr
     = 8\sqrt{\pi^3}\,\sigma\rho a^3C, \tag{9.120} \]

where $\rho$ is the constant density of the medium and $\sigma$ is its specific heat. The final result
in Eq. (9.120) is obtained by first making a change of variable from $r$ to $\xi = r/2a\sqrt{t}$,
obtaining

\[ \int_0^{\infty} e^{-r^2/4a^2t}\,r^2\,dr = (2a\sqrt{t})^3\int_0^{\infty} e^{-\xi^2}\xi^2\,d\xi, \]

then evaluating the $\xi$ integral via an integration by parts:

\[ \int_0^{\infty} e^{-\xi^2}\xi^2\,d\xi = \left.-\frac{\xi}{2}e^{-\xi^2}\right|_0^{\infty}
   + \frac{1}{2}\int_0^{\infty} e^{-\xi^2}\,d\xi = \frac{\sqrt{\pi}}{4}. \]

The temperature, as given by Eq. (9.119) at any moment, i.e., at fixed $t$, is a Gaussian
distribution that flattens out as time increases, because its width is proportional to $\sqrt{t}$.
As a function of time the temperature at any fixed point is proportional to $t^{-3/2}e^{-T/t}$,
with $T \equiv r^2/4a^2$. This functional form shows that the temperature rises from zero to a
maximum and then falls off to zero again for large times. To find the maximum, we set

\[ \frac{d}{dt}\left(t^{-3/2}e^{-T/t}\right) = t^{-5/2}e^{-T/t}\left(\frac{T}{t} - \frac{3}{2}\right) = 0, \tag{9.121} \]

from which we find $t_{\max} = 2T/3 = r^2/6a^2$. The temperature maximum arrives at later
times at larger distances from the origin. ■
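Both conclusions of this example, the time-independent heat content of Eq. (9.120) and the arrival time $t_{\max} = r^2/6a^2$, can be checked numerically. The sketch below is illustrative only: the function names, the choice $\sigma = \rho = C = 1$, and the integration cutoff are assumptions, not values from the text.

```python
import math

def temperature(r, t, a=1.0, C=1.0):
    """Eq. (9.119): v/r = C t^(-3/2) exp(-r^2 / (4 a^2 t))."""
    return C * t ** -1.5 * math.exp(-r * r / (4.0 * a * a * t))

def heat_content(t, a=1.0, C=1.0, sigma=1.0, rho=1.0, cutoff=12.0, n=20000):
    """Midpoint-rule estimate of Q = sigma * rho * integral of (v/r) over all space."""
    r_max = cutoff * 2.0 * a * math.sqrt(t)  # the integrand is negligible beyond this radius
    h = r_max / n
    total = sum(((k + 0.5) * h) ** 2 * temperature((k + 0.5) * h, t, a, C) for k in range(n))
    return 4.0 * math.pi * sigma * rho * total * h

a, r = 0.5, 2.0
q_exact = 8.0 * math.pi ** 1.5 * a ** 3  # Eq. (9.120) with sigma = rho = C = 1
t_max = r * r / (6.0 * a * a)            # time of maximum temperature at radius r

print(heat_content(0.3, a), heat_content(3.0, a), q_exact)
print(temperature(r, 0.9 * t_max, a), temperature(r, t_max, a), temperature(r, 1.1 * t_max, a))
```

The first line of output should show the same heat content at two quite different times, and the second should show the temperature at radius $r$ peaking at $t_{\max}$.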


In the case of cylindrical symmetry (in the plane $z = 0$ in plane polar coordinates $\rho =
\sqrt{x^2 + y^2}$, $\varphi$), we look for a temperature $\psi = u(\rho,t)$ that then satisfies the PDE (using
Eq. (2.35) in the diffusion equation)

\[ \frac{\partial u}{\partial t} = a^2\left(\frac{\partial^2 u}{\partial\rho^2}
   + \frac{1}{\rho}\frac{\partial u}{\partial\rho}\right), \tag{9.122} \]

which is the planar analog of Eq. (9.118). This PDE also has solutions with the functional
dependence $\rho/\sqrt{t} \equiv r$. Upon substituting

\[ u = v\left(\frac{\rho}{\sqrt{t}}\right), \qquad
   \frac{\partial u}{\partial t} = -\frac{\rho v'}{2t^{3/2}}, \qquad
   \frac{\partial u}{\partial\rho} = \frac{v'}{\sqrt{t}}, \qquad
   \frac{\partial^2 u}{\partial\rho^2} = \frac{v''}{t} \tag{9.123} \]

into Eq. (9.122) with the notation $v' \equiv dv/dr$, we find the ODE

\[ a^2v'' + \left(\frac{a^2}{r} + \frac{r}{2}\right)v' = 0. \tag{9.124} \]

This is a first-order ODE for $v'$, which we can integrate when we separate the variables $v$
and $r$ as

\[ \frac{v''}{v'} = -\left(\frac{1}{r} + \frac{r}{2a^2}\right). \tag{9.125} \]

This yields

\[ v'(r) = \frac{C}{r}\,e^{-r^2/4a^2} = C\,\frac{\sqrt{t}}{\rho}\,e^{-\rho^2/4a^2t}. \tag{9.126} \]

This special solution for cylindrical symmetry can be similarly generalized and adapted to
boundary conditions, as for the spherical case. Finally, the $z$-dependence can be factored
in, because $z$ separates from the plane polar radial variable $\rho$.


Exercises 


9.7.1 For a homogeneous spherical solid with constant thermal diffusivity, $K$, and no heat
sources, the equation of heat conduction becomes

\[ \frac{\partial T(r,t)}{\partial t} = K\nabla^2T(r,t). \]

Assume a solution of the form

\[ T = R(r)T(t) \]

and separate variables. Show that the radial equation may take on the standard form

\[ r^2\frac{d^2R}{dr^2} + 2r\frac{dR}{dr} + \alpha^2r^2R = 0, \]

and that $\sin\alpha r/r$ and $\cos\alpha r/r$ are its solutions.

9.7.2 Separate variables in the thermal diffusion equation of Exercise 9.7.1 in circular
cylindrical coordinates. Assume that you can neglect end effects and take $T = T(\rho,t)$.

9.7.3 Solve the PDE

\[ \frac{\partial\psi}{\partial t} = a^2\frac{\partial^2\psi}{\partial x^2}, \]

to obtain $\psi(x,t)$ for a rod of infinite extent (in both the $+x$ and $-x$ directions), with a
heat pulse at time $t = 0$ that corresponds to $\psi_0(x) = A\delta(x)$.

9.7.4 Solve the same PDE as in Exercise 9.7.3 for a rod of length $L$, with position on the rod
given by the variable $x$, with the two ends of the rod at $x = 0$ and $x = L$ kept (at all
times $t$) at the respective temperatures $T = 1$ and $T = 0$, and with the rod initially at
$T(x) = 0$, for $0 < x < L$.


9.8 SUMMARY 


This chapter has provided an overview of methods for the solution of first- and second- 
order linear PDEs, with emphasis on homogeneous second-order PDEs subject to bound- 
ary conditions that either determine unique solutions or define eigenvalue problems. We 
found that the usual boundary conditions are identified as of Dirichlet type (solution spec- 
ified on boundary), Neumann type (normal derivative of solution specified on boundary), 
or Cauchy type (both solution and its normal derivative specified). Applicable types of 
boundary conditions depend on the classification of the PDE; second-order PDEs are clas- 
sified as hyperbolic (e.g., wave equation), elliptic (e.g., Laplace equation), or parabolic 
(e.g., heat/diffusion equation). 
The method of widest applicability to the solution of PDEs is the method of separation 
of variables, which, when effective, reduces a PDE to a set of ODEs. The chapter has pre- 
sented a very small number of complete PDE solutions to illustrate the technique. A wider 
variety of examples only becomes possible when we are prepared to exploit the proper- 
ties of the special functions that are the solutions of various ODEs, and, as a result, fuller 
illustration of PDE solutions will be provided in the chapters that discuss these special 
functions. We point out, in particular, that general PDEs with spherical symmetry all have 
the same angular solutions, known as spherical harmonics. These, and the functions from 
which they are constructed (Legendre polynomials and associated Legendre functions), 
are the subject matter of Chapters 15 and 16. Some spherically symmetric problems have 
radial solutions that can be identified as spherical Bessel functions; these are treated in 
the Bessel function chapter (Chapter 14). 

PDE problems with cylindrical symmetry usually involve Bessel functions, often in 
ways more complex than in the examples of the present chapter. Further illustrations appear 
in Chapter 14. 

This chapter has not attempted to discuss methods for the solution of inhomogeneous 
PDEs. That topic deserves its own chapter, and will be developed in Chapter 10. 

Finally, we repeat an earlier observation: Fourier expansions (Chapter 19) and integral 
transforms (Chapter 20) can also have a role in the solution of PDEs, and applications of 
these techniques to PDEs are included in the appropriate chapters of this book. 
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Guckenheimer, J., P. Holmes, and F. John, Nonlinear Oscillations, Dynamical Systems and Bifurcations of Vector 
Fields, revised ed. New York: Springer-Verlag (1990). 

Gustafson, K. E., Partial Differential Equations and Hilbert Space Methods, 2nd ed., New York: Wiley (1987), 
reprinting Dover (1998). 

Margenau, H., and G. M. Murphy, The Mathematics of Physics and Chemistry, 2nd ed. Princeton, NJ: Van 
Nostrand (1956). Chapter 5 covers curvilinear coordinates and 13 specific coordinate systems. 

Morse, P. M., and H. Feshbach, Methods of Theoretical Physics. New York: McGraw-Hill (1953). Chapter 5 
includes a description of several different coordinate systems. Note that Morse and Feshbach are not above 
using left-handed coordinate systems even for Cartesian coordinates. Elsewhere in this excellent (and difficult) 


book are many examples of the use of the various coordinate systems in solving physical problems. Chapter 6 
discusses characteristics in detail. 


CHAPTER 10 


GREEN’S FUNCTIONS 


In contrast to the linear differential operators that have been our main concern when 
formulating problems as differential equations, we now turn to methods involving inte- 
gral operators, and in particular to those known as Green’s functions. Green’s-function 
methods enable the solution of a differential equation containing an inhomogeneous term 
(often called a source term) to be related to an integral operator containing the source. As 
a preliminary and elementary example, consider the problem of determining the potential 
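$\psi(\mathbf{r})$ generated by a charge distribution whose charge density is $\rho(\mathbf{r})$. From the Poisson
equation, we know that $\psi(\mathbf{r})$ satisfies

\[ -\nabla^2\psi(\mathbf{r}) = \frac{1}{\varepsilon_0}\,\rho(\mathbf{r}). \tag{10.1} \]

We also know, applying Coulomb’s law to the potential at $\mathbf{r}_1$ produced by each element of
charge $\rho(\mathbf{r}_2)\,d^3r_2$, and assuming the space is empty except for the charge distribution, that

\[ \psi(\mathbf{r}_1) = \frac{1}{4\pi\varepsilon_0}\int d^3r_2\,\frac{\rho(\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|}. \tag{10.2} \]

Here the integral is over the entire region where $\rho(\mathbf{r}_2) \neq 0$. We can view the right-hand
side of Eq. (10.2) as an integral operator that converts $\rho$ into $\psi$, and identify the kernel
(the function of two variables, one of which is to be integrated) as the Green’s function for
this problem. Thus, we write

\[ G(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}, \tag{10.3} \]

\[ \psi(\mathbf{r}_1) = \int d^3r_2\,G(\mathbf{r}_1, \mathbf{r}_2)\,\rho(\mathbf{r}_2), \tag{10.4} \]

assigning our Green’s function the symbol $G$ (for “Green”).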
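For a collection of point charges, the integral in Eq. (10.4) becomes a sum, which makes the role of $G$ as a kernel easy to see in code. The following sketch is illustrative only: the function names `green` and `potential` and the two-charge source are hypothetical, and the value used for $\varepsilon_0$ is the CODATA 2018 figure.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018 value)

def green(r1, r2):
    """Eq. (10.3): G(r1, r2) = 1 / (4 pi eps0 |r1 - r2|)."""
    return 1.0 / (4.0 * math.pi * EPS0 * math.dist(r1, r2))

def potential(r1, charges):
    """Discrete analog of Eq. (10.4): sum of q * G(r1, r2) over point charges (q, r2)."""
    return sum(q * green(r1, r2) for q, r2 in charges)

# Hypothetical source: charges of +1 nC and -1 nC, 2 cm apart on the z axis.
dipole = [(1e-9, (0.0, 0.0, 0.01)), (-1e-9, (0.0, 0.0, -0.01))]
print(potential((0.0, 0.0, 1.0), dipole))  # on the axis
print(potential((1.0, 0.0, 0.0), dipole))  # in the midplane, where the two terms cancel
```

Each source element contributes additively, which is exactly the linearity that allows the potential to be written as an integral over the Green’s function.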
This example is preliminary because the response of more general problems to an
inhomogeneous term will depend on the boundary conditions. For example, an electrostatics
problem may include conductors whose surfaces will contain charge layers with
magnitudes that depend on $\rho$ and which will also contribute to the potential at general $\mathbf{r}$.
It is elementary because the form of the Green’s function will also depend on the differential
equation to be solved, and often it will not be possible to obtain a Green’s function in
a simple, closed form.

The essential feature of any Green’s function is that it provides a way to describe the
response of the differential-equation solution to an arbitrary source term (in the presence
of the boundary conditions). In our present example, $G(\mathbf{r}_1, \mathbf{r}_2)$ gives us the contribution
to $\psi$ at the point $\mathbf{r}_1$ produced by a point source of unit magnitude (a delta function) at the
point $\mathbf{r}_2$. The fact that we can determine $\psi$ everywhere by an integration is a consequence
of the fact that our differential equation is linear, so each element of the source contributes
additively. In the more general context of a PDE that depends on both spatial and time
coordinates, Green’s functions also appear as responses of the PDE solution to impulses at
given positions and times.

The aim of this chapter is to identify some general properties of Green’s functions, to 
survey methods for finding them, and to begin building connections between differential- 
operator and integral-operator methods for the description of physics problems. We start 
by considering problems in one dimension. 


10.1 ONE-DIMENSIONAL PROBLEMS

Let’s consider the second-order self-adjoint inhomogeneous ODE

\[ \mathcal{L}y \equiv \frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)\,y = f(x), \tag{10.5} \]

which is to be satisfied on the range $a < x < b$ subject to homogeneous boundary conditions
at $x = a$ and $x = b$ that will cause $\mathcal{L}$ to be Hermitian.¹ Our Green’s function for this
problem needs to satisfy the boundary conditions and the ODE

\[ \mathcal{L}G(x,t) = \delta(x - t), \tag{10.6} \]

so that $y(x)$, the solution to Eq. (10.5) with its boundary conditions, can be obtained as

\[ y(x) = \int_a^b G(x,t)\,f(t)\,dt. \tag{10.7} \]

To verify Eq. (10.7), simply apply $\mathcal{L}$:

\[ \mathcal{L}y(x) = \int_a^b \mathcal{L}G(x,t)\,f(t)\,dt = \int_a^b \delta(x-t)\,f(t)\,dt = f(x). \]

¹A homogeneous boundary condition is one that continues to be satisfied if the function satisfying it is multiplied by a scale
factor. Most of the more commonly encountered types of boundary conditions are homogeneous, e.g., $y = 0$, $y' = 0$, even
$c_1y + c_2y' = 0$. However, $y = c$ with $c$ a nonzero constant is not homogeneous.
General Properties


To gain an understanding of the properties G(x, t) must have, we first consider the result 
of integrating Eq. (10.6) over a small range of x that includes x = t. We have 








\[ \int_{t-\varepsilon}^{t+\varepsilon}\frac{d}{dx}\left[p(x)\frac{dG(x,t)}{dx}\right]dx
   + \int_{t-\varepsilon}^{t+\varepsilon} q(x)\,G(x,t)\,dx
   = \int_{t-\varepsilon}^{t+\varepsilon}\delta(t-x)\,dx, \]

which, carrying out some of the integrations, simplifies to

\[ \left.p(x)\frac{dG(x,t)}{dx}\right|_{t-\varepsilon}^{t+\varepsilon}
   + \int_{t-\varepsilon}^{t+\varepsilon} q(x)\,G(x,t)\,dx = 1. \tag{10.8} \]

It is clear that Eq. (10.8) cannot be satisfied in the limit of small $\varepsilon$ if $G(x,t)$ and
$dG(x,t)/dx$ are both continuous (in $x$) at $x = t$, but we can satisfy that equation if we
require $G(x,t)$ to be continuous but accept a discontinuity in $dG(x,t)/dx$ at $x = t$. In
particular, continuity in $G$ will cause the integral containing $q(x)$ to vanish in the limit
$\varepsilon \to 0$, and we are left with the requirement


\[ \lim_{\varepsilon\to 0+}\left[\left.\frac{dG(x,t)}{dx}\right|_{x=t+\varepsilon}
   - \left.\frac{dG(x,t)}{dx}\right|_{x=t-\varepsilon}\right] = \frac{1}{p(t)}. \tag{10.9} \]

Thus, the discontinuous impulse at $x = t$ leads to a discontinuity in the $x$ derivative of
$G(x,t)$ at that $x$ value. Note, however, that because of the integration in Eq. (10.7), the
singularity in $dG/dx$ does not lead to a similar singularity in the overall solution $y(x)$ in
the usual case that $f(x)$ is continuous.

As a next step toward reaching understanding of the properties of Green’s functions, let’s
expand $G(x,t)$ in the eigenfunctions of our operator $\mathcal{L}$, obtained subject to the boundary
conditions already identified. Since $\mathcal{L}$ is Hermitian, its eigenfunctions can be chosen to be
orthonormal on $(a,b)$, with

\[ \mathcal{L}\varphi_n(x) = \lambda_n\varphi_n(x), \qquad \langle\varphi_n|\varphi_m\rangle = \delta_{nm}. \tag{10.10} \]

Expanding both the $x$ and the $t$ dependence of $G(x,t)$ in this orthonormal set (using the
complex conjugates of the $\varphi_n$ for the $t$ expansion),

\[ G(x,t) = \sum_{nm} g_{nm}\,\varphi_n(x)\,\varphi_m^*(t). \tag{10.11} \]


nm 


We also expand 6(x — t) in the same orthonormal set, according to Eq. (5.27): 


\[ \delta(x - t) = \sum_m \varphi_m(x)\,\varphi_m^*(t). \tag{10.12} \]

Inserting both these expansions into Eq. (10.6), we have before any simplification

\[ \mathcal{L}\sum_{nm} g_{nm}\,\varphi_n(x)\,\varphi_m^*(t) = \sum_m \varphi_m(x)\,\varphi_m^*(t). \tag{10.13} \]

Applying $\mathcal{L}$, which operates only on $\varphi_n(x)$, Eq. (10.13) reduces to

\[ \sum_{nm} \lambda_n g_{nm}\,\varphi_n(x)\,\varphi_m^*(t) = \sum_m \varphi_m(x)\,\varphi_m^*(t). \]

Taking scalar products in the $x$ and $t$ domains, we find that $g_{nm} = \delta_{nm}/\lambda_n$, so $G(x,t)$ must
have the expansion

\[ G(x,t) = \sum_n \frac{\varphi_n(x)\,\varphi_n^*(t)}{\lambda_n}. \tag{10.14} \]

The above analysis fails in the case that any $\lambda_n$ is zero, but we shall not pursue that special
case further.

The importance of Eq. (10.14) does not lie in its dubious value as a computational tool, 
but in the fact that it reveals the symmetry of G: 


G(x, t) = G(t, x)*. (10.15) 
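The slow convergence that makes Eq. (10.14) a doubtful computational tool, and the symmetry of Eq. (10.15), can both be seen in a small numerical sketch. The operator chosen here, $\mathcal{L} = -d^2/dx^2$ on $(0,1)$ with $y(0) = y(1) = 0$, is an assumption made for illustration; its eigenfunctions are $\sqrt{2}\sin n\pi x$ with $\lambda_n = n^2\pi^2$, and its closed-form Green’s function is the one derived in Example 10.1.1. The function names are hypothetical.

```python
import math

def green_series(x, t, n_terms):
    """Partial sum of Eq. (10.14) for L = -d^2/dx^2 on (0, 1), y(0) = y(1) = 0.

    Eigenfunctions: phi_n(x) = sqrt(2) sin(n pi x); eigenvalues: lambda_n = (n pi)^2.
    """
    total = 0.0
    for n in range(1, n_terms + 1):
        lam = (n * math.pi) ** 2
        total += 2.0 * math.sin(n * math.pi * x) * math.sin(n * math.pi * t) / lam
    return total

def green_closed(x, t):
    """Closed form for the same problem: x(1 - t) for x < t, t(1 - x) for x > t."""
    return min(x, t) * (1.0 - max(x, t))

for n_terms in (10, 100, 1000):
    print(n_terms, green_series(0.3, 0.7, n_terms), green_closed(0.3, 0.7))
```

The terms fall off only as $1/n^2$, so many eigenfunctions are needed for modest accuracy, whereas interchanging $x$ and $t$ leaves every partial sum unchanged.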


Form of Green’s Function 


The properties we have identified for $G$ are sufficient to enable its more complete
identification, given a Hermitian operator $\mathcal{L}$ and its boundary conditions. We continue with the
study of problems on an interval $(a,b)$ with one homogeneous boundary condition at each
endpoint of the interval.

Given a value of $t$, it is necessary for $x$ in the range $a < x < t$ that $G(x,t)$ have an $x$
dependence $y_1(x)$ that is a solution to the homogeneous equation $\mathcal{L}y = 0$ and that also
satisfies the boundary condition at $x = a$. The most general $G(x,t)$ satisfying these conditions
must have the form

\[ G(x,t) = y_1(x)\,h_1(t), \qquad (x < t), \tag{10.16} \]

where $h_1(t)$ is presently unknown. Conversely, in the range $t < x < b$, it is necessary that
$G(x,t)$ have the form

\[ G(x,t) = y_2(x)\,h_2(t), \qquad (x > t), \tag{10.17} \]

where $y_2$ is a solution of $\mathcal{L}y = 0$ that satisfies the boundary condition at $x = b$. The
symmetry condition, Eq. (10.15), permits Eqs. (10.16) and (10.17) to be consistent only if
$h_2^* = Ay_1$ and $h_1^* = Ay_2$, with $A$ a constant that is still to be determined. Assuming that
$y_1$ and $y_2$ can be chosen to be real, we are led to the conclusion that

\[ G(x,t) = \begin{cases} A\,y_1(x)\,y_2(t), & x < t, \\ A\,y_2(x)\,y_1(t), & x > t, \end{cases} \tag{10.18} \]

where $\mathcal{L}y_i = 0$, with $y_1$ satisfying the boundary condition at $x = a$ and $y_2$ satisfying that at
$x = b$. The value of $A$ in Eq. (10.18) depends, of course, on the scale at which the $y_i$ have
been specified, and must be set to a value that is consistent with Eq. (10.9). As applied
here, that condition reduces to

\[ A\left[y_2'(t)\,y_1(t) - y_1'(t)\,y_2(t)\right] = \frac{1}{p(t)}, \]

equivalent to

\[ A = \left\{p(t)\left[y_2'(t)\,y_1(t) - y_1'(t)\,y_2(t)\right]\right\}^{-1}. \tag{10.19} \]

Despite its appearance, $A$ does not depend on $t$. The expression involving the $y_i$ is their
Wronskian, and it has a value proportional to $1/p(t)$. See Exercise 7.6.11.

It is instructive to verify that the form for $G(x,t)$ given by Eq. (10.18) causes Eq. (10.7)
to generate the desired solution to the ODE $\mathcal{L}y = f$. To this end, we obtain an explicit form
for $y(x)$:

\[ y(x) = A\,y_2(x)\int_a^x y_1(t)\,f(t)\,dt + A\,y_1(x)\int_x^b y_2(t)\,f(t)\,dt. \tag{10.20} \]

From Eq. (10.20) it is easy to verify that the boundary conditions on $y(x)$ are satisfied; if
$x = a$ the first of the two integrals vanishes, and the second is proportional to $y_1$;
corresponding remarks apply at $x = b$.

It remains to show that Eq. (10.20) yields $\mathcal{L}y = f$. Differentiating with respect to $x$, we
first have

\[ \begin{aligned}
y'(x) &= A\,y_2'(x)\int_a^x y_1(t)\,f(t)\,dt + A\,y_2(x)\,y_1(x)\,f(x) \\
      &\quad + A\,y_1'(x)\int_x^b y_2(t)\,f(t)\,dt - A\,y_1(x)\,y_2(x)\,f(x) \\
      &= A\,y_2'(x)\int_a^x y_1(t)\,f(t)\,dt + A\,y_1'(x)\int_x^b y_2(t)\,f(t)\,dt.
\end{aligned} \tag{10.21} \]

Proceeding to $(py')'$:

\[ \begin{aligned}
\left[p(x)\,y'(x)\right]' &= A\left[p(x)\,y_2'(x)\right]'\int_a^x y_1(t)\,f(t)\,dt
   + A\left[p(x)\,y_2'(x)\right]y_1(x)\,f(x) \\
   &\quad + A\left[p(x)\,y_1'(x)\right]'\int_x^b y_2(t)\,f(t)\,dt
   - A\left[p(x)\,y_1'(x)\right]y_2(x)\,f(x).
\end{aligned} \tag{10.22} \]

Combining Eq. (10.22) and $q(x)$ times Eq. (10.20), many terms drop because $\mathcal{L}y_1 =
\mathcal{L}y_2 = 0$, leaving

\[ \mathcal{L}y(x) = A\,p(x)\left[y_2'(x)\,y_1(x) - y_1'(x)\,y_2(x)\right]f(x) = f(x), \tag{10.23} \]

where the final simplification took place using Eq. (10.19).
Example 10.1.1 SIMPLE SECOND-ORDER ODE


Consider the ODE

\[ -y'' = f(x), \]

with boundary conditions $y(0) = y(1) = 0$. The corresponding homogeneous equation
$-y'' = 0$ has general solution $y_0 = c_0 + c_1x$; from these we construct the solution $y_1 = x$
that satisfies $y_1(0) = 0$ and the solution $y_2 = 1 - x$, satisfying $y_2(1) = 0$. For this ODE,
the coefficient $p(x) = -1$, $y_1'(x) = 1$, $y_2'(x) = -1$, and the constant $A$ in the Green’s
function is

\[ A = \left\{(-1)\left[(-1)(x) - (1)(1-x)\right]\right\}^{-1} = 1. \]

Our Green’s function is therefore

\[ G(x,t) = \begin{cases} x(1-t), & 0 < x < t, \\ t(1-x), & t < x < 1. \end{cases} \]

Assuming we can perform the integral, we can now solve this ODE with boundary conditions
for any function $f(x)$. For example, if $f(x) = \sin\pi x$, our solution would be

\[ y(x) = \int_0^1 G(x,t)\sin\pi t\,dt
        = (1-x)\int_0^x t\sin\pi t\,dt + x\int_x^1 (1-t)\sin\pi t\,dt
        = \frac{1}{\pi^2}\sin\pi x. \]

The correctness of this result is easily checked.
One advantage of the Green’s function formalism is that we do not need to repeat most
of our work if we change the function $f(x)$. If we now take $f(x) = \cos\pi x$, we get

\[ y(x) = \frac{1}{\pi^2}\left(2x - 1 + \cos\pi x\right). \]

Note that our solution takes full account of the boundary conditions. ■
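The two results of this example can be reproduced by evaluating Eq. (10.7) numerically with the Green’s function found above. The sketch below is a minimal illustration; the function names `green` and `solve` and the midpoint quadrature are assumptions introduced here.

```python
import math

def green(x, t):
    """Green's function of Example 10.1.1 for -y'' = f with y(0) = y(1) = 0."""
    return x * (1.0 - t) if x < t else t * (1.0 - x)

def solve(f, x, n=2000):
    """Midpoint-rule evaluation of Eq. (10.7) on [0, 1], split at the kink t = x."""
    def midpoint(lo, hi):
        if hi <= lo:
            return 0.0
        h = (hi - lo) / n
        return h * sum(green(x, lo + (k + 0.5) * h) * f(lo + (k + 0.5) * h) for k in range(n))
    return midpoint(0.0, x) + midpoint(x, 1.0)

for x in (0.25, 0.5, 0.8):
    y_sin = solve(lambda t: math.sin(math.pi * t), x)
    y_cos = solve(lambda t: math.cos(math.pi * t), x)
    print(x, y_sin, math.sin(math.pi * x) / math.pi ** 2,
          y_cos, (2.0 * x - 1.0 + math.cos(math.pi * x)) / math.pi ** 2)
```

Changing the source only changes the function passed to `solve`; the Green’s function itself is reused unchanged, which is the advantage noted in the example.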


Other Boundary Conditions 


Occasionally one encounters problems other than the Hermitian second-order ODEs we 
have been considering. Some, but not always all of the Green’s-function properties we 
have identified, carry over to such problems. 

Consider first the possibility that we may have nonhomogeneous boundary conditions, 
such as the problem $\mathcal{L}y = f$ with $y(a) = c_1$ and $y(b) = c_2$, with one or both $c_i$ nonzero.
This problem can be converted into one with homogeneous boundary conditions by making
a change of the dependent variable from $y$ to

\[ u = y - \frac{c_1(b-x) + c_2(x-a)}{b-a}. \]

In terms of $u$, the boundary conditions are homogeneous: $u(a) = u(b) = 0$. A nonhomogeneous
condition on the derivative, e.g., $y'(a) = c$, can be treated analogously.
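The change of variable can be illustrated for the operator of Example 10.1.1, $\mathcal{L} = -d^2/dx^2$ on $(0,1)$. This sketch rests on assumptions made for illustration: the boundary values $c_1 = 1$, $c_2 = 2$, the source $\sin\pi x$, and the function names are all hypothetical. For this particular $\mathcal{L}$ the subtracted linear function is annihilated by the operator, so $u$ obeys the same ODE as $y$; for a general $\mathcal{L}$ the source for $u$ would acquire an extra term.

```python
import math

def green(x, t):
    """Green's function for -u'' = f with u(0) = u(1) = 0 (Example 10.1.1)."""
    return x * (1.0 - t) if x < t else t * (1.0 - x)

def solve_inhomogeneous(f, x, c1, c2, n=2000):
    """Solve -y'' = f on [0, 1] with y(0) = c1, y(1) = c2 via u = y - line."""
    a, b = 0.0, 1.0
    line = (c1 * (b - x) + c2 * (x - a)) / (b - a)
    # For L = -d^2/dx^2 the linear function satisfies L(line) = 0, so u obeys
    # -u'' = f unchanged; a general L would add -L(line) to the source.
    h = (b - a) / n
    u = h * sum(green(x, a + (k + 0.5) * h) * f(a + (k + 0.5) * h) for k in range(n))
    return u + line

f = lambda t: math.sin(math.pi * t)
for x in (0.0, 0.4, 1.0):
    print(x, solve_inhomogeneous(f, x, 1.0, 2.0), 1.0 + x + math.sin(math.pi * x) / math.pi ** 2)
```

The comparison column uses $y = 1 + x + \sin\pi x/\pi^2$, which satisfies $-y'' = \sin\pi x$ together with $y(0) = 1$ and $y(1) = 2$.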

Another possibility for a second-order ODE is that we may have two boundary conditions
at one endpoint and none at the other; this situation corresponds to an initial-value
problem, and has lost the close connection to Sturm-Liouville eigenvalue problems. The
result is that Green’s functions can still be constructed by invoking the condition of
continuity in $G(x,t)$ at $x = t$ and the prescribed discontinuity in $\partial G/\partial x$, but they will no longer
be symmetric.


Example 10.1.2 —INraL VALUE PROBLEM 


Consider 
ad y 


Ly= Ay TY = FO): (10.24) 


with the initial conditions y(0) = 0 and y’(0) = 0. This operator £ has p(x) = 1. 

We start by noting that the homogeneous equation Ly = 0 has the two linearly indepen- 
dent solutions yj = sinx and yz = cosx. However, the only linear combination of these 
solutions that satisfies the boundary condition at x = 0 is the trivial solution y = 0, so our 
Green’s function for x < ¢ can only be G(x, t) = 0. On the other hand, for the region x > ft 
there are no boundary conditions to serve as constraints, and in that region we are free to 
write 


Gx, t)=Cif)y+Co)y2, or G(x,t)=Ci(t)sinx + Co(t)cosx, x >t. 
We now impose the requirements 


G(t_,t)=G(t,,t) — 0=C)(t)sint+ Co(f) cost, 


dG dG 1 : 
— (t+, t) —- —(t_, t) = —~ =1 — C(t) cost — C2(t) sint — (0) = 1. 
ax Ox p(t) 
These equations can now be solved, yielding C(t) = cost, C2(t) = —sint, so for x >t 


G(x, t) =costsinx — sintcosx = sin(x — f). 
Thus, the complete specification of G(x, t) is 


0, x <t, 
G(x, th= (10.25) 
sin(x—t), x>f. 


The lack of correspondence to a Sturm-Liouville problem is reflected in the lack of sym- 
metry of the Green’s function. Nevertheless, the Green’s function can be used to construct 
the solution to Eq. (10.24) subject to its initial conditions:

$$y(x) = \int_0^{\infty} G(x,t)\,f(t)\,dt = \int_0^{x} \sin(x - t)\,f(t)\,dt. \tag{10.26}$$

Note that if we regard x as a time variable, our solution at “time” x is only influenced by
source contributions from times t prior to x, so Eq. (10.24) obeys causality.

We conclude this example by observing that we can verify that y(x) as given by 
Eq. (10.26) is the correct solution to our problem. Details are left as Exercise 10.1.3.
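The claim of Eq. (10.26) can also be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `solve_iv`, the use of Simpson's rule, and the two test sources f(t) = 1 and f(t) = exp(−t) are illustrative assumptions. For f = 1 the integral gives y = 1 − cos x, and for f = exp(−t) it gives y = ½(e^{−x} − cos x + sin x); both satisfy y″ + y = f with y(0) = y′(0) = 0.

```python
import math

def solve_iv(f, x, n=2000):
    """Approximate y(x) = integral_0^x sin(x - t) f(t) dt, Eq. (10.26),
    with the composite Simpson rule (n must be even)."""
    if x <= 0.0:
        return 0.0
    h = x / n
    integrand = lambda t: math.sin(x - t) * f(t)
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

# Constant source f = 1: closed form y = 1 - cos x
print(solve_iv(lambda t: 1.0, 1.5), 1.0 - math.cos(1.5))

# Decaying source f = exp(-t): closed form y = (exp(-x) - cos x + sin x) / 2
x = 2.0
print(solve_iv(lambda t: math.exp(-t), x),
      0.5 * (math.exp(-x) - math.cos(x) + math.sin(x)))
```

Only source values at t ≤ x enter the integral, which is the causality property noted above.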


Example 10.1.3 BOUNDARY AT INFINITY


Consider

$$\left(\frac{d^2}{dx^2} + k^2\right)\psi(x) = g(x), \tag{10.27}$$

an equation essentially similar to one we have already studied several times, but now with
boundary conditions that correspond (when multiplied by e^{−iωt}) to an outgoing wave.
The general solution to Eq. (10.27) with g = 0 is spanned by the two functions

$$y_1 = e^{-ikx} \quad\text{and}\quad y_2 = e^{+ikx}.$$


The outgoing wave boundary condition means that for large positive x we must have the
solution y₂, while for large negative x the solution must be y₁. This information suffices
to indicate that the Green’s function for this problem must have the form

$$G(x,x') = \begin{cases} A\,y_1(x')\,y_2(x), & x > x',\\ A\,y_2(x')\,y_1(x), & x < x'. \end{cases}$$

We find the coefficient A from Eq. (10.19), in which p(x) = 1:

$$A = \frac{1}{y_2'(x)\,y_1(x) - y_1'(x)\,y_2(x)} = \frac{1}{ik + ik} = -\frac{i}{2k}.$$

Combining these results, we reach

$$G(x,x') = -\frac{i}{2k}\exp\bigl(ik|x - x'|\bigr). \tag{10.28}$$
This result is yet another illustration that the Green’s function depends on boundary con- 
ditions as well as on the differential equation. 
Verification that this Green’s function yields the desired problem solution is the topic of 
Exercise 10.1.8.
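A quick numerical sanity check of Eq. (10.28) is sketched below. It is an illustration only: the function names and the finite-difference step sizes are assumptions, not part of the text. The check confirms two defining properties, the unit jump in ∂G/∂x at x = x′ (since p = 1) and the vanishing of G″ + k²G away from x′.

```python
import cmath

def green_outgoing(x, xp, k):
    """1-D Helmholtz Green's function of Eq. (10.28): -(i/2k) exp(ik|x - x'|)."""
    return -1j / (2.0 * k) * cmath.exp(1j * k * abs(x - xp))

def derivative_jump(xp, k, h=1e-6):
    """One-sided difference estimate of dG/dx at x'+ minus dG/dx at x'-."""
    g0 = green_outgoing(xp, xp, k)
    right = (green_outgoing(xp + h, xp, k) - g0) / h
    left = (g0 - green_outgoing(xp - h, xp, k)) / h
    return right - left

def helmholtz_residual(x, xp, k, h=1e-4):
    """Central-difference estimate of G'' + k^2 G at a point x != x'."""
    g = lambda s: green_outgoing(s, xp, k)
    second = (g(x + h) - 2.0 * g(x) + g(x - h)) / h**2
    return second + k**2 * g(x)

print(derivative_jump(0.4, 2.0))          # close to 1, as required for p = 1
print(helmholtz_residual(1.3, 0.4, 2.0))  # close to 0 away from the source point
```

For x > x′ the function is proportional to exp(+ikx) and for x < x′ to exp(−ikx), which is the outgoing-wave behavior once the factor e^{−iωt} is attached.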
Relation to Integral Equations


Consider now an eigenvalue equation of the form

$$\mathcal{L}y(x) = \lambda y(x), \tag{10.29}$$

where we assume L to be self-adjoint and subject to the boundary conditions y(a) =
y(b) = 0. We can proceed formally by treating Eq. (10.29) as an inhomogeneous equation
whose right-hand side is the particular function λy(x). To do so, we would first find
the Green’s function G(x, t) for the operator L and the given boundary conditions, after
which, as in Eq. (10.7), we could write

$$y(x) = \lambda\int_a^b G(x,t)\,y(t)\,dt. \tag{10.30}$$


Equation (10.30) is not a solution to our eigenvalue problem, since the unknown function 
y(x) appears on both sides and, moreover, it does not tell us the possible values of the 
eigenvalue λ. What we have accomplished, however, is to convert our eigenvalue ODE
and its boundary conditions into an integral equation which we can regard as an alternate 
starting point for solution of our eigenvalue problem. 

Our generation of Eq. (10.30) shows that it is implied by Eq. (10.29). If we can also 
show that we can connect these equations in the reverse order, namely that Eq. (10.30) 
implies Eq. (10.29), we can then conclude that they are equivalent formulations of the 
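same eigenvalue problem. We proceed by applying L to Eq. (10.30), labeling it L_x to
make clear that it is an operator on x, not t:

$$\mathcal{L}_x y(x) = \lambda\,\mathcal{L}_x\int_a^b G(x,t)\,y(t)\,dt
= \lambda\int_a^b \mathcal{L}_x G(x,t)\,y(t)\,dt
= \lambda\int_a^b \delta(x - t)\,y(t)\,dt
= \lambda y(x). \tag{10.31}$$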


The above analysis shows that under rather general circumstances we will be able to 
convert an eigenvalue equation based on an ODE into an entirely equivalent eigenvalue 
equation based on an integral equation. Note that to specify completely the ODE eigen- 
value equation we had to make an explicit identification of the accompanying boundary 
conditions, while the corresponding integral equation appears to be entirely self-contained. 
Of course, what has happened is that the effect of the boundary conditions has influenced 
the specification of the Green’s function that is the kernel of the integral equation. 

Conversion to an integral equation may be useful for two reasons, the more practical 
of which is that the integral equation may suggest different computational procedures for 
solution of our eigenvalue problem. There is also a fundamental mathematical reason why 
an integral-equation formulation may be preferred: It is that integral operators, such as that 
in Eq. (10.30), are bounded operators (meaning that their application to a function y of 
finite norm produces a result whose norm is also finite). On the other hand, differential
operators are unbounded; their application to a function of finite norm can produce a result 
of unbounded norm. Stronger theorems can be developed for operators that are bounded. 
We close by making the now obvious observation that Green’s functions provide the 
link between differential-operator and integral-operator formulations of the same problem. 


Example 10.1.4 DIFFERENTIAL VS. INTEGRAL FORMULATION


Here we return to an eigenvalue problem we have already treated several times in various
contexts, namely

$$-y''(x) = \lambda y(x),$$

subject to boundary conditions y(0) = y(1) = 0. In Example 10.1.1 we found the Green’s
function for this problem to be

$$G(x,t) = \begin{cases} x(1 - t), & 0 \le x < t,\\ t(1 - x), & t < x \le 1, \end{cases}$$

and, following Eq. (10.30), our eigenvalue problem can be rewritten as

$$y(x) = \lambda\int_0^1 G(x,t)\,y(t)\,dt. \tag{10.32}$$

Methods for solution of integral equations will not be discussed until Chapter 21, but we
can easily verify that the well-known solution set for this problem,

$$y = \sin n\pi x, \qquad \lambda_n = n^2\pi^2, \qquad n = 1, 2, \ldots,$$

also solves Eq. (10.32).
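The verification mentioned at the end of this example can be carried out numerically. The sketch below is illustrative only: the helper names `simpson` and `apply_kernel`, and the choice of splitting the quadrature at the kink t = x, are assumptions made for this example. It evaluates the right-hand side of Eq. (10.32) for y = sin nπx with λ = n²π² and compares it with y(x).

```python
import math

def green_dirichlet(x, t):
    """Green's function of -d^2/dx^2 on [0, 1] with y(0) = y(1) = 0."""
    return x * (1.0 - t) if x < t else t * (1.0 - x)

def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def apply_kernel(y, x):
    """Integral over t in [0, 1] of G(x, t) y(t), split at the kink t = x."""
    integrand = lambda t: green_dirichlet(x, t) * y(t)
    return simpson(integrand, 0.0, x) + simpson(integrand, x, 1.0)

x = 0.3
for n in (1, 2, 3):
    lam = (n * math.pi) ** 2
    y = lambda t, n=n: math.sin(n * math.pi * t)
    # Right-hand side of Eq. (10.32) next to the left-hand side y(x)
    print(n, lam * apply_kernel(y, x), y(x))
```

Splitting the integral at t = x keeps each piece smooth, so Simpson's rule retains its usual accuracy despite the discontinuous derivative of G.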
Exercises 
10.1.1 Show that

$$G(x,t) = \begin{cases} x, & 0 \le x < t,\\ t, & t < x \le 1, \end{cases}$$

is the Green’s function for the operator L = −d²/dx² and the boundary conditions
y(0) = 0, y′(1) = 0.
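10.1.2 Find the Green’s function for

(a) $$\mathcal{L}y(x) = \frac{d^2 y(x)}{dx^2} + y(x), \qquad y(0) = 0, \quad y'(1) = 0.$$

(b) $$\mathcal{L}y(x) = \frac{d^2 y(x)}{dx^2} - y(x), \qquad y(x) \text{ finite for } -\infty < x < \infty.$$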





10.1.3 Show that the function y(x) defined by Eq. (10.26) satisfies the initial-value problem
defined by Eq. (10.24) and its initial conditions y(0) = y′(0) = 0.

10.1.4 Find the Green’s function for the equation

$$-\frac{d^2 y}{dx^2} - \frac{y}{4} = f(x),$$

with boundary conditions y(0) = y(π) = 0.

ANS. $$G(x,t) = \begin{cases} 2\sin(x/2)\cos(t/2), & 0 \le x < t,\\ 2\cos(x/2)\sin(t/2), & t < x \le \pi. \end{cases}$$

10.1.5 Construct the Green’s function for

$$x^2\frac{d^2 y}{dx^2} + x\frac{dy}{dx} + (k^2x^2 - 1)\,y = 0,$$

subject to the boundary conditions y(0) = 0, y(1) = 0.

10.1.6 Given that

$$\mathcal{L} = (1 - x^2)\frac{d^2}{dx^2} - 2x\frac{d}{dx}$$

and that G(±1, t) remains finite, show that no Green’s function can be constructed by
the techniques of this section.

Note. The solutions to Ly = 0 needed for the regions x < t and x > t are linearly dependent.

10.1.7 Find the Green’s function for

$$\frac{d^2\psi}{dt^2} + k\frac{d\psi}{dt} = f(t),$$

subject to the initial conditions ψ(0) = ψ′(0) = 0, and solve this ODE for t > 0 given
f(t) = exp(−t).

10.1.8 Verify that the Green’s function

$$G(x,x') = -\frac{i}{2k}\exp\bigl(ik|x - x'|\bigr)$$

yields an outgoing wave solution to the ODE

$$\left(\frac{d^2}{dx^2} + k^2\right)\psi(x) = g(x).$$

Note. Compare with Example 10.1.3.

10.1.9 Construct the 1-D Green’s function for the modified Helmholtz equation,

$$\left(\frac{d^2}{dx^2} - k^2\right)\psi(x) = f(x).$$

The boundary conditions are that the Green’s function must vanish for x → ∞ and
x → −∞.

ANS. $$G(x_1,x_2) = -\frac{1}{2k}\exp\bigl(-k|x_1 - x_2|\bigr).$$

10.1.10 From the eigenfunction expansion of the Green’s function show that

(a) $$\frac{2}{\pi^2}\sum_{n=1}^{\infty}\frac{\sin n\pi x\,\sin n\pi t}{n^2} = \begin{cases} x(1 - t), & 0 \le x < t,\\ t(1 - x), & t < x \le 1. \end{cases}$$

(b) $$\frac{2}{\pi^2}\sum_{n=0}^{\infty}\frac{\sin\bigl(n + \tfrac12\bigr)\pi x\,\sin\bigl(n + \tfrac12\bigr)\pi t}{\bigl(n + \tfrac12\bigr)^2} = \begin{cases} x, & 0 \le x < t,\\ t, & t < x \le 1. \end{cases}$$

10.1.11 Derive an integral equation corresponding to

$$y''(x) - y(x) = 0, \qquad y(1) = 1, \quad y(-1) = 1,$$

(a) by integrating twice.

(b) by forming the Green’s function.

ANS. $$y(x) = 1 - \int_{-1}^{1} K(x,t)\,y(t)\,dt, \qquad K(x,t) = \begin{cases} \tfrac12(1 - x)(t + 1), & x > t,\\ \tfrac12(1 - t)(x + 1), & x < t. \end{cases}$$

10.1.12 The general second-order linear ODE with constant coefficients is

$$y''(x) + a_1 y'(x) + a_2 y(x) = 0.$$

Given the boundary conditions y(0) = y(1) = 0, integrate twice and develop the integral
equation

$$y(x) = \int_0^1 K(x,t)\,y(t)\,dt,$$

with

$$K(x,t) = \begin{cases} a_2 t(1 - x) + a_1(x - 1), & t < x,\\ a_2 x(1 - t) + a_1 x, & x < t. \end{cases}$$

Note that K(x, t) is symmetric and continuous if a₁ = 0. How is this related to
self-adjointness of the ODE?

10.1.13 Transform the ODE

$$\frac{d^2 y(r)}{dr^2} - k^2 y(r) + V_0\frac{e^{-r}}{r}\,y(r) = 0$$

and the boundary conditions y(0) = y(∞) = 0 into an integral equation of the form

$$y(r) = -V_0\int_0^{\infty} G(r,t)\,\frac{e^{-t}}{t}\,y(t)\,dt.$$

The quantities V₀ and k² are constants. The ODE is derived from the Schrödinger wave
equation with a mesonic potential:

$$G(r,t) = \begin{cases} -\dfrac{1}{k}\,e^{-kt}\sinh kr, & 0 \le r < t,\\[2mm] -\dfrac{1}{k}\,e^{-kr}\sinh kt, & t < r < \infty. \end{cases}$$


10.2 PROBLEMS IN TWO AND THREE DIMENSIONS


10.2 PROBLEMS IN TWO AND THREE DIMENSIONS 


Basic Features 


The principles, but unfortunately not all the details of our analysis of Green’s functions in 
one dimension, extend to problems of higher dimensionality. We summarize here proper- 
ties of general validity for the case where L is a linear second-order differential operator 
in two or three dimensions. 


1. A homogeneous PDE Lψ(r₁) = 0 and its boundary conditions define a Green’s
function G(r₁, r₂), which is the solution of the PDE

$$\mathcal{L}G(\mathbf{r}_1,\mathbf{r}_2) = \delta(\mathbf{r}_1 - \mathbf{r}_2)$$

subject to the relevant boundary conditions.

2. The inhomogeneous PDE Lψ(r) = f(r) has, subject to the boundary conditions of
Item 1, the solution

$$\psi(\mathbf{r}_1) = \int G(\mathbf{r}_1,\mathbf{r}_2)\,f(\mathbf{r}_2)\,d^3r_2,$$

where the integral is over the entire space relevant to the problem.

3. When L and its boundary conditions define the Hermitian eigenvalue problem
Lψ = λψ with eigenfunctions φₙ(r) and corresponding eigenvalues λₙ, then

• G(r₁, r₂) is symmetric, in the sense that
G(r₁, r₂) = G*(r₂, r₁), and

• G(r₁, r₂) has the eigenfunction expansion

$$G(\mathbf{r}_1,\mathbf{r}_2) = \sum_n \frac{\varphi_n^*(\mathbf{r}_2)\,\varphi_n(\mathbf{r}_1)}{\lambda_n}.$$

4. G(r₁, r₂) will be continuous and differentiable at all points such that r₁ ≠ r₂. We
cannot even require continuity in a strict sense at r₁ = r₂ (because our Green’s function
may become infinite there), but we can have the weaker condition that G remain
continuous in regions that surround, but do not include, r₁ = r₂. G must have more
serious singularities in its first derivatives, so that the second-order derivatives in L
will generate the delta-function singularity characteristic of G and specified in Item 1.


What does not carry over from the 1-D case are the explicit formulas we used to con- 
struct Green’s functions for a variety of problems. 


Self-Adjoint Problems 


In more than one dimension, a second-order differential equation is self-adjoint if it has 
the form 


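$$\mathcal{L}\psi(\mathbf{r}) = \nabla\cdot\bigl[p(\mathbf{r})\nabla\psi(\mathbf{r})\bigr] + q(\mathbf{r})\,\psi(\mathbf{r}) = f(\mathbf{r}), \tag{10.33}$$

with p(r) and q(r) real. This operator will define a Hermitian problem if its boundary
conditions are such that ⟨φ|Lψ⟩ = ⟨Lφ|ψ⟩. See Exercise 10.2.2.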
Assuming we have a Hermitian problem, consider the scalar product 


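$$\bigl\langle G(\mathbf{r},\mathbf{r}_1)\bigm|\mathcal{L}G(\mathbf{r},\mathbf{r}_2)\bigr\rangle = \bigl\langle \mathcal{L}G(\mathbf{r},\mathbf{r}_1)\bigm|G(\mathbf{r},\mathbf{r}_2)\bigr\rangle. \tag{10.34}$$

Here the scalar product and L both refer to the variable r, and the Hermitian property is
responsible for this equality. The points r₁ and r₂ are arbitrary. Noting that LG results in
a delta function, we have, from the left-hand side of Eq. (10.34),

$$\bigl\langle G(\mathbf{r},\mathbf{r}_1)\bigm|\mathcal{L}G(\mathbf{r},\mathbf{r}_2)\bigr\rangle = \bigl\langle G(\mathbf{r},\mathbf{r}_1)\bigm|\delta(\mathbf{r} - \mathbf{r}_2)\bigr\rangle = G^*(\mathbf{r}_2,\mathbf{r}_1). \tag{10.35}$$

But, from the right-hand side of Eq. (10.34),

$$\bigl\langle \mathcal{L}G(\mathbf{r},\mathbf{r}_1)\bigm|G(\mathbf{r},\mathbf{r}_2)\bigr\rangle = \bigl\langle \delta(\mathbf{r} - \mathbf{r}_1)\bigm|G(\mathbf{r},\mathbf{r}_2)\bigr\rangle = G(\mathbf{r}_1,\mathbf{r}_2). \tag{10.36}$$

Substituting Eqs. (10.35) and (10.36) into Eq. (10.34), we recover the symmetry condition
G(r₁, r₂) = G*(r₂, r₁).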


Eigenfunction Expansions 


We already saw, in 1-D Hermitian problems, that the Green’s function of a Hermitian 
problem can be written as an eigenfunction expansion. If L, with its boundary conditions,
has normalized eigenfunctions φₙ(r) and corresponding eigenvalues λₙ, our expansion
took the form

$$G(\mathbf{r}_1,\mathbf{r}_2) = \sum_n \frac{\varphi_n^*(\mathbf{r}_2)\,\varphi_n(\mathbf{r}_1)}{\lambda_n}. \tag{10.37}$$

It turns out to be useful to consider the somewhat more general equation

$$\mathcal{L}\psi(\mathbf{r}_1) - \lambda\psi(\mathbf{r}_1) = \delta(\mathbf{r}_2 - \mathbf{r}_1), \tag{10.38}$$

where λ is a parameter (not an eigenvalue of L). In this more general case, an expansion in
the φₙ yields for the Green’s function of the entire left-hand side of Eq. (10.38) the formula

$$G(\mathbf{r}_1,\mathbf{r}_2) = \sum_n \frac{\varphi_n^*(\mathbf{r}_2)\,\varphi_n(\mathbf{r}_1)}{\lambda_n - \lambda}. \tag{10.39}$$

Note that Eq. (10.39) will be well-defined only if the parameter λ is not equal to any of the
eigenvalues of L.
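The expansions in Eqs. (10.37) and (10.39) can be illustrated with the 1-D operator L = −d²/dx² on [0, 1] with y(0) = y(1) = 0, whose normalized eigenfunctions are √2 sin nπx with eigenvalues n²π². The sketch below is an illustration under stated assumptions: the function names, the truncation at 20000 terms, and the closed form used for comparison at λ = κ² > 0, namely sin(κx_<) sin(κ(1 − x_>))/(κ sin κ), are choices made for this example rather than results quoted from the text.

```python
import math

def green_series(x, t, lam=0.0, terms=20000):
    """Partial sum of Eq. (10.39) for L = -d^2/dx^2, y(0) = y(1) = 0.
    With lam = 0 this reduces to Eq. (10.37)."""
    total = 0.0
    for n in range(1, terms + 1):
        lam_n = (n * math.pi) ** 2  # eigenvalue of sqrt(2) sin(n pi x)
        total += 2.0 * math.sin(n * math.pi * x) * math.sin(n * math.pi * t) / (lam_n - lam)
    return total

def green_closed(x, t, lam=0.0):
    """Closed form for comparison (assumes lam >= 0 and lam not an eigenvalue)."""
    lo, hi = min(x, t), max(x, t)
    if lam == 0.0:
        return lo * (1.0 - hi)
    kappa = math.sqrt(lam)
    return math.sin(kappa * lo) * math.sin(kappa * (1.0 - hi)) / (kappa * math.sin(kappa))

print(green_series(0.3, 0.7), green_closed(0.3, 0.7))
print(green_series(0.3, 0.7, lam=2.0), green_closed(0.3, 0.7, lam=2.0))
```

The closed form contains sin κ in its denominator, so it diverges as λ approaches any eigenvalue n²π², in line with the remark that Eq. (10.39) is well-defined only when λ is not an eigenvalue.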


Form of Green’s Functions 


In spaces of more than one dimension, we cannot divide the region under consideration
into two intervals, one on each side of a point (here designated r₂), and then choose for
each interval a solution to the homogeneous equation appropriate to its outer boundary.
A more fruitful approach will often be to obtain a Green’s function for an operator L
subject to some particularly convenient boundary conditions, with a subsequent plan to
add to it whatever solution to the homogeneous equation Lψ(r) = 0 may be needed
to adapt to the boundary conditions actually under consideration. This approach is clearly
legitimate, as the addition of any solution to the homogeneous equation will not affect the
(dis)continuity properties of the Green’s function.

We consider first the Laplace operator in three dimensions, with the boundary condition
that G vanish at infinity. We therefore seek a solution to the inhomogeneous PDE

$$\nabla_1^2 G(\mathbf{r}_1,\mathbf{r}_2) = \delta(\mathbf{r}_1 - \mathbf{r}_2) \tag{10.40}$$

with G(r₁, r₂) → 0 as r₁ → ∞. We have added a subscript “1” to ∇ to remind the reader that
it operates on r₁ and not on r₂. Since our boundary conditions are spherically symmetric
and at an infinite distance from r₁ and r₂, we may make the simplifying assumption that
G(r₁, r₂) is a function only of r₁₂ = |r₁ − r₂|.

Our first step in processing Eq. (10.40) is to integrate it over a spherical volume of radius
a centered at r₂:

$$\int_{r_{12}<a} \nabla_1\cdot\nabla_1 G(\mathbf{r}_1,\mathbf{r}_2)\,d^3r_1 = 1, \tag{10.41}$$

where we have reduced the right-hand side using the properties of the delta function and
written the left-hand side in a form making it ready for the application of Gauss’ theorem.
We now apply that theorem to the left-hand side of Eq. (10.41), reaching

$$\oint_{r_{12}=a} \nabla_1 G(\mathbf{r}_1,\mathbf{r}_2)\cdot d\boldsymbol{\sigma}_1 = 4\pi a^2\left.\frac{dG}{dr_{12}}\right|_{r_{12}=a} = 1. \tag{10.42}$$

Since Eq. (10.42) must be satisfied for all values of a, it is necessary that

$$\frac{d}{dr_{12}}G(\mathbf{r}_1,\mathbf{r}_2) = \frac{1}{4\pi r_{12}^2},$$

which can be integrated to yield

$$G(\mathbf{r}_1,\mathbf{r}_2) = -\frac{1}{4\pi}\,\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}. \tag{10.43}$$

We do not need to add a constant of integration because this form for G vanishes at infinity.

At this point it may be useful to note that the sign of G(r₁, r₂) depends on the sign
associated with the differential operator of which it is a Green’s function. Some texts (including
previous editions of this book) have defined G as produced by a negative delta function
so that Eq. (10.43) when associated with +∇² would not need a minus sign. There is,
of course, no ambiguity in any physical results because a change in the sign of G must 
be accompanied by a change in the sign of the integral in which G is combined with the 
inhomogeneous term of a differential equation. 

The Green’s function of Eq. (10.43) is only going to be appropriate for an infinite system 
with G = 0 at infinity but, as mentioned already, it can be converted into the Green’s func- 
tions of another problem by addition of a suitable solution to the homogeneous equation 
(in this case, Laplace’s equation). Since that is a reasonable starting point for a variety 
of problems, the form given in Eq. (10.43) is sometimes called the fundamental Green’s 
function of Laplace’s equation (in three dimensions). 

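Let’s now repeat our analysis for the Laplace operator in two dimensions for a region
of infinite extent, using circular coordinates ρ = (ρ, φ). The integral in Eq. (10.41) is then
over a circular area, and the 2-D analog of Eq. (10.42) becomes

$$\oint_{\rho_{12}=a} \nabla_1 G(\boldsymbol{\rho}_1,\boldsymbol{\rho}_2)\cdot d\boldsymbol{\sigma}_1 = 2\pi a\left.\frac{dG}{d\rho_{12}}\right|_{\rho_{12}=a} = 1,$$

leading to

$$\frac{d}{d\rho_{12}}G(\boldsymbol{\rho}_1,\boldsymbol{\rho}_2) = \frac{1}{2\pi\rho_{12}},$$

which has the indefinite integral

$$G(\boldsymbol{\rho}_1,\boldsymbol{\rho}_2) = \frac{1}{2\pi}\ln|\boldsymbol{\rho}_1 - \boldsymbol{\rho}_2|. \tag{10.44}$$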
ia 


The form given in Eq. (10.44) becomes infinite at infinity, but it nevertheless can be 
regarded as a fundamental 2-D Green’s function. However, note that we will generally 
need to add to it a suitable solution to the 2-D Laplace equation to obtain the form needed 
for specific problems. 
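The normalization conditions behind Eqs. (10.43) and (10.44) state that the outward flux of ∇G through a sphere (3-D) or a circle (2-D) of any radius a around the source point equals 1. The sketch below checks this numerically; the function names and the relative finite-difference step are assumptions made for illustration.

```python
import math

def g3(r):
    """3-D fundamental Green's function of Eq. (10.43), as a function of r12."""
    return -1.0 / (4.0 * math.pi * r)

def g2(rho):
    """2-D fundamental Green's function of Eq. (10.44), as a function of rho12."""
    return math.log(rho) / (2.0 * math.pi)

def radial_derivative(g, r, rel=1e-5):
    """Central-difference estimate of dg/dr with a step proportional to r."""
    h = rel * r
    return (g(r + h) - g(r - h)) / (2.0 * h)

def flux_3d(a):
    """Left-hand side of Eq. (10.42): sphere area times dG/dr12 at r12 = a."""
    return 4.0 * math.pi * a**2 * radial_derivative(g3, a)

def flux_2d(a):
    """2-D analog: circle circumference times dG/drho12 at rho12 = a."""
    return 2.0 * math.pi * a * radial_derivative(g2, a)

for a in (0.1, 1.0, 10.0):
    print(a, flux_3d(a), flux_2d(a))
```

The flux is independent of a in both cases, even though the 3-D function decays at infinity while the 2-D function grows logarithmically.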

The above analysis indicates that the Green’s function for the Laplace equation in 2-D
space is rather different from the 3-D result. This observation illustrates the fact that there
is a real difference between flatland (2-D) physics and actual (3-D) physics, even when the
latter is applied to problems with translational symmetry in one direction.

This is also a good time to note that the symmetry in the Green’s function corresponds
to the notion that a source at r₂ produces a result (a potential) at r₁ that is the same as the
potential at r₂ from a similar source at r₁. This property will persist in more complicated
problems so long as their definition makes them Hermitian.
Table 10.1 Fundamental Green’s Functions^a

       Laplace               Helmholtz^b                          Modified Helmholtz^c
       ∇²                    ∇² + k²                              ∇² − k²

1-D    (1/2)|x₁ − x₂|        −(i/2k) exp(ik|x₁ − x₂|)             −(1/2k) exp(−k|x₁ − x₂|)

2-D    (1/2π) ln|ρ₁ − ρ₂|    −(i/4) H₀⁽¹⁾(k|ρ₁ − ρ₂|)             −(1/2π) K₀(k|ρ₁ − ρ₂|)

3-D    −1/(4π|r₁ − r₂|)      −exp(ik|r₁ − r₂|)/(4π|r₁ − r₂|)      −exp(−k|r₁ − r₂|)/(4π|r₁ − r₂|)

^a Boundary conditions: For the Helmholtz equation, outgoing wave;
for modified Helmholtz and 3-D Laplace equations, G → 0 at infinity;
for 1-D and 2-D Laplace equation, arbitrary.

^b H₀⁽¹⁾ is a Hankel function, Section 14.4.

^c K₀ is a modified Bessel function, Section 14.5.


Because they occur rather frequently, it is useful to have Green’s functions for the 
Helmholtz and modified Helmholtz equations in two and three dimensions (for one dimen- 
sion these Green’s functions were introduced in Example 10.1.3 and Exercise 10.1.9). For 
the Helmholtz equation, a convenient fundamental form results if we take boundary con- 
ditions corresponding to an outgoing wave, meaning that the asymptotic r dependence 
must be of the form exp(+ikr). For the modified Helmholtz equation, the most convenient 
boundary condition (for one, two, and three dimensions) is that G decay to zero in all direc- 
tions at large r. The one-, two-, and three-dimensional (3-D) fundamental Green’s functions 
for the Laplace, Helmholtz, and modified Helmholtz operators are listed in Table 10.1. 

We shall not derive here the forms of the Green’s functions for the Helmholtz equations; 
in fact, for two dimensions, they involve Bessel functions and are best treated in detail in a 
later chapter. However, for three dimensions, the Green’s functions are of relatively simple 
form, and the verification that they return correct results is the topic of Exercises 10.2.4 
and 10.2.6. The fundamental Green’s function for the 1-D Laplace equation may not be 
instantly recognizable in comparison to the formulas we derived in Section 10.1, but
consistency with our earlier analysis is the topic of Example 10.2.1.

Sometimes it is useful to represent Green’s functions as expansions that take advantage
of the specific properties of various coordinate systems. The so-called spherical Green’s
function is the radial part of such an expansion in spherical polar coordinates. For the
Laplace operator, it takes a form developed in Eqs. (16.65) and (16.66). We write it here
only to show that it exhibits the two-region character that provides a convenient
representation of the discontinuity in the derivative:

$$-\frac{1}{4\pi}\,\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} = \sum_{l=0}^{\infty}\frac{2l+1}{4\pi}\,g_l(r_1,r_2)\,P_l(\cos\chi),$$

where χ is the angle between r₁ and r₂, P_l is a Legendre polynomial, and the spherical
Green’s function g_l(r₁, r₂) is

$$g_l(r_1,r_2) = \begin{cases} -\dfrac{1}{2l+1}\,\dfrac{r_1^{\,l}}{r_2^{\,l+1}}, & r_1 < r_2,\\[3mm] -\dfrac{1}{2l+1}\,\dfrac{r_2^{\,l}}{r_1^{\,l+1}}, & r_1 > r_2. \end{cases}$$

An explicit derivation of the formula for g_l is given in Example 16.3.2.
In cylindrical coordinates (ρ, φ, z) one encounters an axial Green’s function g_m(ρ₁, ρ₂),
in terms of which the fundamental Green’s function for the Laplace operator takes the form
(also involving a continuous parameter k)

$$G(\mathbf{r}_1,\mathbf{r}_2) = -\frac{1}{4\pi}\,\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}
= \frac{1}{2\pi^2}\sum_{m=-\infty}^{\infty} e^{im(\varphi_1 - \varphi_2)}\int_0^{\infty} g_m(k\rho_1,k\rho_2)\,\cos k(z_1 - z_2)\,dk.$$

Here

$$g_m(k\rho_1,k\rho_2) = -I_m(k\rho_<)\,K_m(k\rho_>),$$

where ρ_< and ρ_> are, respectively, the smaller and larger of ρ₁ and ρ₂. The quantities I_m
and K_m are modified Bessel functions, defined in Chapter 14. This expansion is discussed
in more detail in Example 14.5.1. Again we note the two-region character.


Example 10.2.1 ACCOMMODATING BOUNDARY CONDITIONS

Let’s use the fundamental Green’s function of the 1-D Laplace equation,

$$\frac{d^2\psi(x)}{dx^2} = 0, \quad\text{namely}\quad G(x_1,x_2) = \frac{1}{2}|x_1 - x_2|,$$

to illustrate how we can modify it to accommodate specific boundary conditions. We return
to the oft-used example with Dirichlet conditions ψ = 0 at x = 0 and x = 1. The continuity
of G and the discontinuity in its derivative are unaffected if we add to the above G one
or more terms of the form f(x₁)g(x₂), where f and g are solutions of the 1-D Laplace
equation, i.e., any functions of the form ax + b.

For the boundary conditions we have specified, the Green’s function we require has the
form

$$G(x_1,x_2) = -\frac{1}{2}(x_1 + x_2) + x_1x_2 + \frac{1}{2}|x_1 - x_2|.$$

The continuous and differentiable terms we have added to the fundamental form bring us
to the result

$$G(x_1,x_2) = \begin{cases} -\tfrac12(x_1 + x_2) + x_1x_2 + \tfrac12(x_2 - x_1) = -x_1(1 - x_2), & x_1 < x_2,\\[1mm] -\tfrac12(x_1 + x_2) + x_1x_2 + \tfrac12(x_1 - x_2) = -x_2(1 - x_1), & x_2 < x_1. \end{cases}$$

This result is consistent with what we found in Example 10.1.1, apart from an overall sign
that arises because that example treated the operator −d²/dx² rather than +d²/dx².
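The construction in this example is easy to check numerically. The sketch below (function names are illustrative assumptions) adds the smooth terms −½(x₁ + x₂) + x₁x₂ to the fundamental form ½|x₁ − x₂| and confirms that the sum vanishes on both boundaries and agrees with the two-region expression −x_<(1 − x_>).

```python
def g_fundamental(x1, x2):
    """Fundamental 1-D Laplace Green's function, (1/2)|x1 - x2|."""
    return 0.5 * abs(x1 - x2)

def g_dirichlet(x1, x2):
    """Fundamental form plus terms linear in x1 and in x2 (solutions of the
    1-D Laplace equation), chosen so that G vanishes at x1 = 0 and x1 = 1."""
    return -0.5 * (x1 + x2) + x1 * x2 + g_fundamental(x1, x2)

def g_two_region(x1, x2):
    """Two-region form: -x_<(1 - x_>)."""
    lo, hi = min(x1, x2), max(x1, x2)
    return -lo * (1.0 - hi)

for x2 in (0.2, 0.5, 0.9):
    print(g_dirichlet(0.0, x2), g_dirichlet(1.0, x2),
          g_dirichlet(0.35, x2), g_two_region(0.35, x2))
```

Because the added terms are linear in each variable, they change neither the continuity of G nor the unit jump in its derivative at x₁ = x₂.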
Example 10.2.2 QUANTUM MECHANICAL SCATTERING: BORN APPROXIMATION

The quantum theory of scattering provides a nice illustration of Green’s function
techniques and the use of the Green’s function to obtain an integral equation. Our physical
picture of scattering is as follows. A beam of particles moves along the negative z-axis
toward the origin. A small fraction of the particles is scattered by the potential V(r) and
goes off as an outgoing spherical wave. Our wave function ψ(r) must satisfy the
time-independent Schrödinger equation

$$-\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\,\psi(\mathbf{r}) = E\,\psi(\mathbf{r}), \tag{10.45}$$

or

$$\nabla^2\psi(\mathbf{r}) + k^2\psi(\mathbf{r}) = \frac{2m}{\hbar^2}\,V(\mathbf{r})\,\psi(\mathbf{r}), \qquad k^2 = \frac{2mE}{\hbar^2}. \tag{10.46}$$

From the physical picture just presented we look for a solution having the asymptotic
form

$$\psi(\mathbf{r}) \sim e^{i\mathbf{k}_0\cdot\mathbf{r}} + f_k(\theta,\varphi)\,\frac{e^{ikr}}{r}, \tag{10.47}$$

where e^{ik₀·r} is an incident plane wave² with the propagation vector k₀ carrying the
subscript 0 to indicate that it is in the θ = 0 (z-axis) direction. The e^{ikr}/r term describes an
outgoing spherical wave with an angular and energy-dependent amplitude factor f_k(θ, φ),³
and its 1/r radial dependence causes its asymptotic total flux to be independent of r. This
is a consequence of the fact that the scattering potential V(r) becomes negligible at large r.

Equation (10.45) contains nothing describing the internal structure or possible motion of
the scattering center and therefore can only represent elastic scattering, so the propagation
vector of the incoming wave, k₀, must have the same magnitude, k, as the scattered wave.
In quantum mechanics texts it is shown that the differential probability of scattering, called
the scattering cross section, is given by |f_k(θ, φ)|².

We now need to solve Eq. (10.46) to obtain ψ(r) and the scattering cross section. Our
approach starts by writing the solution in terms of the Green’s function for the operator
on the left-hand side of Eq. (10.46), obtaining an integral equation because the
inhomogeneous term of that equation has the form (2m/ħ²)V(r)ψ(r):

$$\psi(\mathbf{r}_1) = \int \frac{2m}{\hbar^2}\,V(\mathbf{r}_2)\,\psi(\mathbf{r}_2)\,G(\mathbf{r}_1,\mathbf{r}_2)\,d^3r_2. \tag{10.48}$$


We intend to take the Green’s function to be the fundamental form given for the Helmholtz
equation in Table 10.1. We then recover the exp(ikr)/r part of the desired asymptotic
form, but the incident-wave term will be absent. We therefore modify our tentative
formula, Eq. (10.48), by adding to its right-hand side the term exp(ik₀ · r), which is
legitimate because this quantity is a solution to the homogeneous (Helmholtz) equation. That
approach leads us to

$$\psi(\mathbf{r}_1) = e^{i\mathbf{k}_0\cdot\mathbf{r}_1} - \int \frac{2m}{\hbar^2}\,V(\mathbf{r}_2)\,\psi(\mathbf{r}_2)\,\frac{e^{ik|\mathbf{r}_1 - \mathbf{r}_2|}}{4\pi|\mathbf{r}_1 - \mathbf{r}_2|}\,d^3r_2. \tag{10.49}$$

This integral equation analog of the original Schrödinger wave equation is exact. It is
called the Lippmann-Schwinger equation, and is an important starting point for studies
of quantum-mechanical scattering phenomena.

We will later study methods for solving integral equations such as that in Eq. (10.49).
However, in the special case that the unscattered amplitude

$$\psi_0(\mathbf{r}) = e^{i\mathbf{k}_0\cdot\mathbf{r}} \tag{10.50}$$

dominates the solution, it is a satisfactory approximation to replace ψ(r₂) by ψ₀(r₂) within
the integral, obtaining

$$\psi_1(\mathbf{r}_1) = e^{i\mathbf{k}_0\cdot\mathbf{r}_1} - \int \frac{2m}{\hbar^2}\,V(\mathbf{r}_2)\,\frac{e^{ik|\mathbf{r}_1 - \mathbf{r}_2|}}{4\pi|\mathbf{r}_1 - \mathbf{r}_2|}\,e^{i\mathbf{k}_0\cdot\mathbf{r}_2}\,d^3r_2. \tag{10.51}$$

This is the famous Born approximation. It is expected to be most accurate for weak
potentials and high incident energy.

² For simplicity we assume a continuous incident beam. In a more sophisticated and more realistic treatment, Eq. (10.47) would
be one component of a wave packet.
³ If V(r) represents a central force, f_k will be a function of θ only, independent of the azimuthal angle φ.
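To connect Eq. (10.51) with the asymptotic form of Eq. (10.47), one can take r₁ far outside the region where V is appreciable. The steps below are a sketch of that standard far-field reduction, added here for illustration and not taken from the text; it assumes a short-range potential, and the symbol **k** = k r̂₁ (the wave vector in the direction of observation) is introduced only for this purpose.

```latex
% Far-field expansion, assuming r_1 \gg r_2 throughout the range of V:
|\mathbf{r}_1 - \mathbf{r}_2| \approx r_1 - \hat{\mathbf{r}}_1\cdot\mathbf{r}_2,
\qquad
\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} \approx \frac{1}{r_1}.

% Inserting these into Eq. (10.51), with \mathbf{k} \equiv k\,\hat{\mathbf{r}}_1:
\psi_1(\mathbf{r}_1) \approx e^{i\mathbf{k}_0\cdot\mathbf{r}_1}
  - \frac{e^{ikr_1}}{r_1}\,\frac{m}{2\pi\hbar^2}
    \int V(\mathbf{r}_2)\,e^{i(\mathbf{k}_0 - \mathbf{k})\cdot\mathbf{r}_2}\,d^3r_2 .

% Comparison with Eq. (10.47) then identifies the Born scattering amplitude:
f_k(\theta,\varphi) \approx -\frac{m}{2\pi\hbar^2}
    \int V(\mathbf{r}_2)\,e^{i(\mathbf{k}_0 - \mathbf{k})\cdot\mathbf{r}_2}\,d^3r_2 .
```

In this approximation the amplitude is proportional to the Fourier transform of the potential evaluated at the momentum transfer **k**₀ − **k**.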
Exercises 


10.2.1 Show that the fundamental Green’s function for the 1-D Laplace equation, |x₁ − x₂|/2,
is consistent with the form found in Example 10.1.1.


10.2.2 Show that if

$$\mathcal{L}\psi(\mathbf{r}) = \nabla\cdot\bigl[p(\mathbf{r})\nabla\psi(\mathbf{r})\bigr] + q(\mathbf{r})\,\psi(\mathbf{r}),$$

then L is Hermitian for p(r) and q(r) real, assuming Dirichlet boundary conditions on
the boundary of a region and that the scalar product is an integral over that region with
unit weight.


10.2.3 Show that the terms +k² in the Helmholtz operator and −k² in the modified Helmholtz
operator do not affect the behavior of G(r₁, r₂) in the immediate vicinity of the singular
point r₁ = r₂. Specifically, show that


lim G(r}, ¥2)d°r2 = —1. 


|rj—1r2|+0 


10.2.4 Show that

$$-\frac{\exp\bigl(ik|\mathbf{r}_1 - \mathbf{r}_2|\bigr)}{4\pi|\mathbf{r}_1 - \mathbf{r}_2|}$$

satisfies the appropriate criteria and therefore is a Green’s function for the Helmholtz
equation.
10.2.5 Find the Green’s function for the 3-D Helmholtz equation, Exercise 10.2.4, when the
wave is a standing wave. 


10.2.6 Verify that the formula given for the 3-D Green’s function of the modified Helmholtz 
equation in Table 10.1 is correct when the boundary conditions of the problem are that 
G vanish at infinity. 


10.2.7 An electrostatic potential (mks units) is

$$\varphi(\mathbf{r}) = \frac{Z}{4\pi\varepsilon_0}\,\frac{e^{-ar}}{r}.$$

Reconstruct the electrical charge distribution that will produce this potential. Note that
φ(r) vanishes exponentially for large r, showing that the net charge is zero.

ANS. $$\rho(\mathbf{r}) = Z\,\delta(\mathbf{r}) - \frac{Za^2}{4\pi}\,\frac{e^{-ar}}{r}.$$
CHAPTER 11


COMPLEX VARIABLE THEORY


The imaginary numbers are a wonderful 
flight of God’s spirit; they are almost an 
amphibian between being and not being. 


GOTTFRIED WILHELM VON LEIBNIZ, 1702 


We turn now to a study of complex variable theory. In this area we develop some of the 
most powerful and widely useful tools in all of analysis. To indicate, at least partly, why 
complex variables are important, we mention briefly several areas of application. 


1. In two dimensions, the electric potential, viewed as a solution of Laplace’s equation,
can be written as the real (or the imaginary) part of a complex-valued function, and this
identification enables the use of various features of complex variable theory (specifically,
conformal mapping) to obtain formal solutions to a wide variety of electrostatics
problems.

2. The time-dependent Schrödinger equation of quantum mechanics contains the imaginary
unit i, and its solutions are complex.

3. In Chapter 9 we saw that the second-order differential equations of interest in physics
may be solved by power series. The same power series may be used in the complex
plane to replace x by the complex variable z. The dependence of the solution f(z) at
a given z₀ on the behavior of f(z) elsewhere gives us greater insight into the behavior
of our solution and a powerful tool (analytic continuation) for extending the region in
which the solution is valid.

4. The change of a parameter k from real to imaginary, k → ik, transforms the Helmholtz
equation into the time-independent diffusion equation. The same change connects the
spherical and hyperbolic trigonometric functions, transforms Bessel functions into
their modified counterparts, and provides similar connections between other
superficially dissimilar functions.

5. Integrals in the complex plane have a wide variety of useful applications:

• Evaluating definite integrals and infinite series,
• Inverting power series,
• Forming infinite products,
• Obtaining solutions of differential equations for large values of the variable
(asymptotic solutions),
• Investigating the stability of potentially oscillatory systems,
• Inverting integral transforms.

6. Many physical quantities that were originally real become complex as a simple physical
theory is made more general. The real index of refraction of light becomes a complex
quantity when absorption is included. The real energy associated with an energy
level becomes complex when the finite lifetime of the level is considered.


COMPLEX VARIABLES AND FUNCTIONS 


We have already seen (in Chapter 1) the definition of complex numbers z = x + iy as 
ordered pairs of two real numbers, x and y. We reviewed there the rules for their arithmetic 
operations, identified the complex conjugate z* of the complex number z, and discussed 
both the Cartesian and polar representations of complex numbers, introducing for that
purpose the Argand diagram (complex plane). In the polar representation z = re^{iθ}, we noted
that r (the magnitude of the complex number) is also called its modulus, and the angle
θ is known as its argument. We proved that e^{iθ} satisfies the important equation

$$e^{i\theta} = \cos\theta + i\sin\theta. \tag{11.1}$$

This equation shows that for real θ, e^{iθ} is of unit magnitude and is therefore situated on
the unit circle, at an angle θ from the real axis.

Our focus in the present chapter is on functions of a complex variable and on their 
analytical properties. We have already noted that by defining complex functions f(z) to 
have the same power-series expansion (in z) as the expansion (in x) of the correspond- 
ing real function f(x), the real and complex definitions coincide when z is real. We also 
showed that by use of the polar representation, z = re^{iθ}, it becomes clear how to compute
powers and roots of complex quantities. In particular, we noted that roots, viewed as
fractional powers, become multivalued functions in the complex domain, due to the fact
that exp(2nπi) = 1 for all positive and negative integers n. We thus found z^{1/2} to have
two values (not a surprise, since for positive real x, we have ±√x). But we also noted
that z^{1/m} will have m different complex values. We also noted that the logarithm becomes
multivalued when extended to complex values, with

$$\ln z = \ln\bigl(re^{i\theta}\bigr) = \ln r + i(\theta + 2n\pi), \tag{11.2}$$





with n any positive or negative integer (including zero). 
If necessary, the reader should review the topics mentioned above by rereading 
Section 1.8. 
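The multivaluedness of roots and logarithms described above can be made concrete with a few lines of code. The sketch below uses Python's standard `cmath` module; the helper names `all_roots` and `log_branch` are assumptions made for this illustration. It lists the m values of z^{1/m} and several branches of ln z from Eq. (11.2).

```python
import cmath
import math

def all_roots(z, m):
    """Return the m distinct values of z**(1/m), using z = r exp(i(theta + 2 n pi))."""
    r, theta = cmath.polar(z)
    return [cmath.rect(r ** (1.0 / m), (theta + 2.0 * math.pi * n) / m)
            for n in range(m)]

def log_branch(z, n):
    """Branch n of the logarithm, Eq. (11.2): ln r + i(theta + 2 n pi)."""
    r, theta = cmath.polar(z)
    return complex(math.log(r), theta + 2.0 * math.pi * n)

z = complex(-8.0, 0.0)
for w in all_roots(z, 3):
    print(w, w ** 3)                      # each cube reproduces z up to rounding

for n in (-1, 0, 1):
    w = log_branch(z, n)
    print(n, w, cmath.exp(w))             # every branch exponentiates back to z
```

All branches of the logarithm differ by integer multiples of 2πi, which is why each of them exponentiates to the same z.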
CAUCHY-RIEMANN CONDITIONS


Having established complex functions of a complex variable, we now proceed to
differentiate them. The derivative of f(z), like that of a real function, is defined by

$$\lim_{\delta z\to 0}\frac{f(z + \delta z) - f(z)}{(z + \delta z) - z} = \lim_{\delta z\to 0}\frac{\delta f(z)}{\delta z} = \frac{df}{dz} = f'(z), \tag{11.3}$$

provided that the limit is independent of the particular approach to the point z. For real
variables we require that the right-hand limit (x → x₀ from above) and the left-hand limit
(x → x₀ from below) be equal for the derivative df(x)/dx to exist at x = x₀. Now, with z
(or z₀) some point in a plane, our requirement that the limit be independent of the direction
of approach is very restrictive.

Consider increments δx and δy of the variables x and y, respectively. Then

$$\delta z = \delta x + i\,\delta y. \tag{11.4}$$

Also, writing f = u + iv,

$$\delta f = \delta u + i\,\delta v, \tag{11.5}$$

so that

$$\frac{\delta f}{\delta z} = \frac{\delta u + i\,\delta v}{\delta x + i\,\delta y}. \tag{11.6}$$

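Let us take the limit indicated by Eq. (11.3) by two different approaches, as shown in
Fig. 11.1. First, with $\delta y = 0$, we let $\delta x \to 0$. Equation (11.3) yields

$$\lim_{\delta z\to 0}\frac{\delta f}{\delta z} = \lim_{\delta x\to 0}\left(\frac{\delta u}{\delta x} + i\frac{\delta v}{\delta x}\right) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}, \tag{11.7}$$

assuming that the partial derivatives exist. For a second approach, we set $\delta x = 0$ and then
let $\delta y \to 0$. This leads to

$$\lim_{\delta z\to 0}\frac{\delta f}{\delta z} = \lim_{\delta y\to 0}\left(-i\frac{\delta u}{\delta y} + \frac{\delta v}{\delta y}\right) = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}. \tag{11.8}$$

If we are to have a derivative $df/dz$, Eqs. (11.7) and (11.8) must be identical. Equating
real parts to real parts and imaginary parts to imaginary parts (like components of vectors),
we obtain

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{11.9}$$

FIGURE 11.1 Alternate approaches to $z_0$.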
These are the famous Cauchy-Riemann conditions. They were discovered by Cauchy
and used extensively by Riemann in his development of complex variable theory. These
Cauchy-Riemann conditions are necessary for the existence of a derivative of $f(z)$. That
is, in order for $df/dz$ to exist, the Cauchy-Riemann conditions must hold.

Conversely, if the Cauchy-Riemann conditions are satisfied and the partial derivatives
of $u(x, y)$ and $v(x, y)$ are continuous, the derivative $df/dz$ exists. To show this, we start
by writing

$$\delta f = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\delta x + \left(\frac{\partial u}{\partial y} + i\frac{\partial v}{\partial y}\right)\delta y, \tag{11.10}$$

where the justification for this expression depends on the continuity of the partial derivatives
of $u$ and $v$. Using the Cauchy-Riemann equations, Eq. (11.9), we convert Eq. (11.10)
to the form

$$\delta f = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\delta x + \left(-\frac{\partial v}{\partial x} + i\frac{\partial u}{\partial x}\right)\delta y = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)(\delta x + i\,\delta y). \tag{11.11}$$

Replacing $\delta x + i\,\delta y$ by $\delta z$ and bringing it to the left-hand side of Eq. (11.11), we reach

$$\frac{\delta f}{\delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}, \tag{11.12}$$

an equation whose right-hand side is independent of the direction of $\delta z$ (i.e., the relative
values of $\delta x$ and $\delta y$). This independence of directionality meets the condition for the
existence of the derivative, $df/dz$.
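As a numerical illustration of Eqs. (11.9) and (11.12), the partial derivatives of $u$ and $v$ can be estimated by central differences. The sketch below is not part of the original development; the function names and the step size `h` are assumptions chosen for illustration.

```python
import cmath

def partials(f, x, y, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f = u + i v."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)   # df/dx at fixed y
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)   # df/dy at fixed x
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residuals(f, x, y):
    """Return (u_x - v_y, u_y + v_x); both vanish when Eq. (11.9) holds."""
    ux, uy, vx, vy = partials(f, x, y)
    return ux - vy, uy + vx

def derivative_from_partials(f, x, y):
    """Eq. (11.12): f'(z) = u_x + i v_x, meaningful only when Eq. (11.9) holds."""
    ux, _, vx, _ = partials(f, x, y)
    return complex(ux, vx)

print(cauchy_riemann_residuals(cmath.exp, 0.3, -0.7))           # both close to zero
print(derivative_from_partials(cmath.exp, 0.3, -0.7),
      cmath.exp(complex(0.3, -0.7)))                             # the two values agree
```

Small residuals at a handful of sample points are of course only evidence, not a proof, that a function satisfies the Cauchy-Riemann conditions.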


Analytic Functions 


If $f(z)$ is differentiable and single-valued in a region of the complex plane, it is said to
be an analytic function in that region.¹ Multivalued functions can also be analytic under
certain restrictions that make them single-valued in specific regions; this case, which is
of great importance, is taken up in detail in Section 11.6. If $f(z)$ is analytic everywhere
in the (finite) complex plane, we call it an entire function. Our theory of complex variables
here is one of analytic functions of a complex variable, which points up the crucial
importance of the Cauchy-Riemann conditions. The concept of analyticity also carries over
into advanced theories of modern physics, where it plays a crucial role in the dispersion
theory (of elementary particles). If $f'(z)$ does not exist at $z = z_0$, then $z_0$ is labeled a
singular point; singular points and their implications will be discussed shortly.
To illustrate the Cauchy-Riemann conditions, consider two very simple examples.


¹Some writers use the term holomorphic or regular.
Example 11.2.1 $z^2$ IS ANALYTIC


Let $f(z) = z^2$. Multiplying out $(x + iy)(x + iy) = x^2 - y^2 + 2ixy$, we identify the real part
of $z^2$ as $u(x, y) = x^2 - y^2$ and its imaginary part as $v(x, y) = 2xy$. Following Eq. (11.9),

$$\frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -2y = -\frac{\partial v}{\partial x}.$$

We see that $f(z) = z^2$ satisfies the Cauchy-Riemann conditions throughout the complex
plane. Since the partial derivatives are clearly continuous, we conclude that $f(z) = z^2$ is
analytic, and is an entire function. ■


Example 11.2.2 $z^*$ IS NOT ANALYTIC


Let $f(z) = z^*$, the complex conjugate of $z$. Now $u = x$ and $v = -y$. Applying the
Cauchy-Riemann conditions, we obtain

$$\frac{\partial u}{\partial x} = 1 \neq \frac{\partial v}{\partial y} = -1.$$

The Cauchy-Riemann conditions are not satisfied for any values of $x$ or $y$ and $f(z) = z^*$
is nowhere an analytic function of $z$. It is interesting to note that $f(z) = z^*$ is continuous,
thus providing an example of a function that is everywhere continuous but nowhere
differentiable in the complex plane. ■
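The contrast between the two examples can also be seen by evaluating the difference quotient of Eq. (11.3) for increments $\delta z$ taken in several directions. The following is a minimal sketch under our own naming (`difference_quotient` is hypothetical, and the step size is an arbitrary small value):

```python
import cmath

def difference_quotient(f, z, angle, step=1e-6):
    """Return [f(z + dz) - f(z)] / dz for dz = step * exp(i * angle)."""
    dz = step * cmath.exp(1j * angle)
    return (f(z + dz) - f(z)) / dz

z0 = 1.0 + 2.0j
for angle in (0.0, 0.7, 1.2):
    square = difference_quotient(lambda z: z * z, z0, angle)             # close to 2*z0 for every angle
    conjugate = difference_quotient(lambda z: z.conjugate(), z0, angle)   # equals exp(-2i*angle)
    print(angle, square, conjugate)
```

For $z^2$ the quotient is essentially the same for every direction, whereas for $z^*$ it equals $\delta z^*/\delta z = e^{-2i\phi}$, which depends on the direction angle $\phi$ and therefore has no limit.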


The derivative of a real function of a real variable is essentially a local characteristic, in
that it provides information about the function only in a local neighborhood, for instance,
as a truncated Taylor expansion. The existence of a derivative of a function of a complex
variable has much more far-reaching implications, one of which is that the real and
imaginary parts of our analytic function must separately satisfy Laplace's equation in two
dimensions, namely

$$\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} = 0.$$

To verify the above statement, we differentiate the first Cauchy-Riemann equation in
Eq. (11.9) with respect to $x$ and the second with respect to $y$, obtaining

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 v}{\partial x\,\partial y}, \qquad \frac{\partial^2 u}{\partial y^2} = -\frac{\partial^2 v}{\partial y\,\partial x}.$$

Combining these two equations, we easily reach

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \tag{11.13}$$

confirming that $u(x, y)$, the real part of a differentiable complex function, satisfies the
Laplace equation. Either by recognizing that if $f(z)$ is differentiable, so is $-if(z) =
v(x, y) - iu(x, y)$, or by steps similar to those leading to Eq. (11.13), we can confirm
that $v(x, y)$ also satisfies the two-dimensional (2-D) Laplace equation. Sometimes $u$ and
$v$ are referred to as harmonic functions (not to be confused with spherical harmonics,
which we will later encounter as the angular solutions to central force problems).
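A quick numerical check of Eq. (11.13) is possible with the standard five-point finite-difference Laplacian. The sketch below is illustrative only; the helper names and the grid spacing `h` are assumptions, and $e^z$ is used simply as a convenient entire function.

```python
import cmath

def laplacian(g, x, y, h=1e-3):
    """Five-point finite-difference estimate of g_xx + g_yy at (x, y)."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h)
            - 4.0 * g(x, y)) / (h * h)

def real_part(f):
    """u(x, y) for f = u + i v."""
    return lambda x, y: f(complex(x, y)).real

def imag_part(f):
    """v(x, y) for f = u + i v."""
    return lambda x, y: f(complex(x, y)).imag

print(laplacian(real_part(cmath.exp), 0.4, 1.1))           # close to zero
print(laplacian(imag_part(cmath.exp), 0.4, 1.1))           # close to zero
print(laplacian(lambda x, y: x * x + y * y, 0.4, 1.1))     # close to 4: not harmonic
```

The last line shows that the test is not vacuous: $x^2 + y^2$ is not the real part of any analytic function, and its Laplacian is 4 rather than 0.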
The solutions $u(x, y)$ and $v(x, y)$ are complementary in that the curves of constant
$u(x, y)$ make orthogonal intersections with the curves of constant $v(x, y)$. To confirm this,
note that if $(x_0, y_0)$ is on the curve $u(x, y) = c$, then $x_0 + dx$, $y_0 + dy$ is also on that curve if

$$\frac{\partial u}{\partial x}dx + \frac{\partial u}{\partial y}dy = 0,$$

meaning that the slope of the curve of constant $u$ at $(x_0, y_0)$ is

$$\left(\frac{dy}{dx}\right)_u = \frac{-\partial u/\partial x}{\partial u/\partial y}, \tag{11.14}$$

where the derivatives are to be evaluated at $(x_0, y_0)$. Similarly, we can find that the slope
of the curve of constant $v$ at $(x_0, y_0)$ is

$$\left(\frac{dy}{dx}\right)_v = \frac{-\partial v/\partial x}{\partial v/\partial y} = \frac{\partial u/\partial y}{\partial u/\partial x}, \tag{11.15}$$

where the last member of Eq. (11.15) was reached using the Cauchy-Riemann equations.
Comparing Eqs. (11.14) and (11.15), we note that at the same point, the slopes they
describe are orthogonal: their product is $-1$ (to check, verify that $dx_u\,dx_v + dy_u\,dy_v = 0$).

The properties we have just examined are important for the solution of 2-D electrostatics 
problems (governed by the Laplace equation). If we have identified (by methods outside 
the scope of the present text) an appropriate analytic function, its lines of constant u will 
describe electrostatic equipotentials, while those of constant v will be the stream lines of 
the electric field. 

Finally, the global nature of our analytic function is also illustrated by the fact that it 
has not only a first derivative, but in addition, derivatives of all higher orders, a property 
which is not shared by functions of a real variable. This property will be demonstrated in 
Section 11.4. 


Derivatives of Analytic Functions 


Working with the real and imaginary parts of an analytic function $f(z)$ is one way to take
its derivative; an example of that approach is to use Eq. (11.12). However, it is usually
easier to use the fact that complex differentiation follows the same rules as those for real
variables. As a first step in establishing this correspondence, note that, if $f(z)$ is analytic,
then, from Eq. (11.12),

$$f'(z) = \frac{\partial f}{\partial x},$$

and that

$$\left[f(z)g(z)\right]' = \frac{d}{dz}\left[f(z)g(z)\right] = \frac{\partial}{\partial x}\left[f(z)g(z)\right] = \left(\frac{\partial f}{\partial x}\right)g(z) + f(z)\left(\frac{\partial g}{\partial x}\right) = f'(z)g(z) + f(z)g'(z),$$

the familiar rule for differentiating a product. Given also that

$$\frac{dz}{dz} = \frac{\partial z}{\partial x} = 1,$$

we can easily establish that

$$\frac{dz^2}{dz} = 2z, \quad \text{and, by induction,} \quad \frac{dz^n}{dz} = nz^{n-1}.$$

Functions defined by power series will then have differentiation rules identical to those
for the real domain. Functions not ordinarily defined by power series also have the same
differentiation rules as for the real domain, but that will need to be demonstrated case by
case. Here is an example that illustrates the establishment of a derivative formula.


Example 11.2.3 DERIVATIVE OF LOGARITHM


We want to verify that $d\ln z/dz = 1/z$. Writing, as in Eq. (1.138),

$$\ln z = \ln r + i\theta + 2n\pi i,$$

we note that if we write $\ln z = u + iv$, we have $u = \ln r$, $v = \theta + 2n\pi$. To check whether
$\ln z$ satisfies the Cauchy-Riemann equations, we evaluate

$$\frac{\partial u}{\partial x} = \frac{1}{r}\frac{\partial r}{\partial x} = \frac{x}{r^2}, \qquad \frac{\partial u}{\partial y} = \frac{1}{r}\frac{\partial r}{\partial y} = \frac{y}{r^2},$$

$$\frac{\partial v}{\partial x} = \frac{\partial\theta}{\partial x} = \frac{-y}{r^2}, \qquad \frac{\partial v}{\partial y} = \frac{\partial\theta}{\partial y} = \frac{x}{r^2}.$$

The derivatives of $r$ and $\theta$ with respect to $x$ and $y$ are obtained from the equations connecting
Cartesian and polar coordinates. Except at $r = 0$, where the derivatives are undefined,
the Cauchy-Riemann equations can be confirmed.
Then, to obtain the derivative, we can simply apply Eq. (11.12),

$$\frac{d\ln z}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{x - iy}{r^2} = \frac{1}{x + iy} = \frac{1}{z}.$$

Because $\ln z$ is multivalued, it will not be analytic except under conditions restricting it to
single-valuedness in a specific region. This topic will be taken up in Section 11.6. ■
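Both statements of this example can be probed numerically. Python's `cmath.log` returns the principal value (the $n = 0$ choice with $-\pi < \theta \le \pi$), so it is single-valued but discontinuous across the negative real axis. The sketch below assumes that convention; the helper name `numerical_derivative` and the sample points are our own.

```python
import cmath

def numerical_derivative(f, z, h=1e-6):
    """Central difference along the real direction; adequate where f is analytic."""
    return (f(z + h) - f(z - h)) / (2 * h)

z0 = 1.5 - 0.5j
print(numerical_derivative(cmath.log, z0), 1 / z0)    # the two values agree

# Values of the principal logarithm just above and just below the negative real axis
above = cmath.log(complex(-2.0, 1e-12))
below = cmath.log(complex(-2.0, -1e-12))
print(above - below)                                   # approximately 2*pi*i
```

The jump of $2\pi i$ is exactly the difference between adjacent values of $n$ in the formula for $\ln z$ above.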


Point at Infinity 


In complex variable theory, infinity is regarded as a single point, and behavior in its
neighborhood is discussed after making a change of variable from $z$ to $w = 1/z$. This
transformation has the effect that, for example, $z = -R$, with $R$ large, lies in the $w$ plane close
to $z = +R$, thereby among other things influencing the values computed for derivatives.
An elementary consequence is that entire functions, such as $z$ or $e^z$, have singular points
at $z = \infty$. As a trivial example, note that at infinity the behavior of $z$ is identified as that of
$1/w$ as $w \to 0$, leading to the conclusion that $z$ is singular there.
Exercises


11.2.1 Show whether or not the function $f(z) = \Re(z) = x$ is analytic.

11.2.2 Having shown that the real part $u(x, y)$ and the imaginary part $v(x, y)$ of an analytic
function $w(z)$ each satisfy Laplace's equation, show that neither $u(x, y)$ nor $v(x, y)$ can
have either a maximum or a minimum in the interior of any region in which $w(z)$ is
analytic. (They can have saddle points only.)

11.2.3 Find the analytic function

$$w(z) = u(x, y) + iv(x, y)$$

(a) if $u(x, y) = x^3 - 3xy^2$, (b) if $v(x, y) = e^{-y}\sin x$.

11.2.4 If there is some common region in which $w_1 = u(x, y) + iv(x, y)$ and $w_2 = w_1^* =
u(x, y) - iv(x, y)$ are both analytic, prove that $u(x, y)$ and $v(x, y)$ are constants.

11.2.5 Starting from $f(z) = 1/(x + iy)$, show that $1/z$ is analytic in the entire finite $z$ plane
except at the point $z = 0$. This extends our discussion of the analyticity of $z^n$ to negative
integer powers $n$.

11.2.6 Show that given the Cauchy-Riemann equations, the derivative $f'(z)$ has the same value
for $dz = a\,dx + ib\,dy$ (with neither $a$ nor $b$ zero) as it has for $dz = dx$.

11.2.7 Using $f(re^{i\theta}) = R(r, \theta)e^{i\Theta(r,\theta)}$, in which $R(r, \theta)$ and $\Theta(r, \theta)$ are differentiable real
functions of $r$ and $\theta$, show that the Cauchy-Riemann conditions in polar coordinates
become

$$\text{(a)}\quad \frac{\partial R}{\partial r} = \frac{R}{r}\frac{\partial\Theta}{\partial\theta}, \qquad \text{(b)}\quad \frac{1}{r}\frac{\partial R}{\partial\theta} = -R\frac{\partial\Theta}{\partial r}.$$

Hint. Set up the derivative first with $\delta z$ radial and then with $\delta z$ tangential.

11.2.8 As an extension of Exercise 11.2.7 show that $\Theta(r, \theta)$ satisfies the 2-D Laplace equation
in polar coordinates,

$$\frac{\partial^2\Theta}{\partial r^2} + \frac{1}{r}\frac{\partial\Theta}{\partial r} + \frac{1}{r^2}\frac{\partial^2\Theta}{\partial\theta^2} = 0.$$

11.2.9 For each of the following functions $f(z)$, find $f'(z)$ and identify the maximal region
within which $f(z)$ is analytic.

(a) $f(z) = \dfrac{\sin z}{z}$,
(b) $f(z) = \dfrac{1}{z^2 + 1}$,
(c) $f(z) = \dfrac{1}{z(z + 1)}$,
(d) $f(z) = e^{-1/z}$,
(e) $f(z) = z^2 - 3z + 2$,
(f) $f(z) = \tan(z)$,
(g) $f(z) = \tanh(z)$.
11.2.10 For what complex values do each of the following functions $f(z)$ have a derivative?

(a) $f(z) = z^{3/2}$,
(b) $f(z) = z^{-3/2}$,
(c) $f(z) = \tan^{-1}(z)$,
(d) $f(z) = \tanh^{-1}(z)$.

11.2.11 Two-dimensional irrotational fluid flow is conveniently described by a complex potential
$f(z) = u(x, y) + iv(x, y)$. We label the real part, $u(x, y)$, the velocity potential,
and the imaginary part, $v(x, y)$, the stream function. The fluid velocity $\mathbf{V}$ is given by
$\mathbf{V} = \nabla u$. If $f(z)$ is analytic:

(a) Show that $df/dz = V_x - iV_y$.

(b) Show that $\nabla\cdot\mathbf{V} = 0$ (no sources or sinks).

(c) Show that $\nabla\times\mathbf{V} = 0$ (irrotational, nonturbulent flow).

11.2.12 The function $f(z)$ is analytic. Show that the derivative of $f(z)$ with respect to $z^*$ does
not exist unless $f(z)$ is a constant.
Hint. Use the chain rule and take $x = (z + z^*)/2$, $y = (z - z^*)/2i$.

Note. This result emphasizes that our analytic function $f(z)$ is not just a complex function
of two real variables $x$ and $y$. It is a function of the complex variable $x + iy$.


11.3 CAUCHY'S INTEGRAL THEOREM


Contour Integrals 


With differentiation under control, we turn to integration. The integral of a function of a
complex variable over a path in the complex plane (known as a contour) may be defined in close
analogy to the (Riemann) integral of a real function integrated along the real $x$-axis.

We divide the contour, from $z_0$ to $z_0'$, designated $C$, into $n$ intervals by picking $n - 1$
intermediate points $z_1, z_2, \ldots$ on the contour (Fig. 11.2). Consider the sum

$$S_n = \sum_{j=1}^{n} f(\zeta_j)(z_j - z_{j-1}),$$

where $\zeta_j$ is a point on the curve between $z_j$ and $z_{j-1}$, and $z_n \equiv z_0'$. Now let $n \to \infty$ with

$$|z_j - z_{j-1}| \to 0$$

for all $j$. If $\lim_{n\to\infty} S_n$ exists, then

$$\lim_{n\to\infty}\sum_{j=1}^{n} f(\zeta_j)(z_j - z_{j-1}) = \int_{z_0}^{z_0'} f(z)\,dz = \int_C f(z)\,dz. \tag{11.16}$$

The right-hand side of Eq. (11.16) is called the contour integral of $f(z)$ (along the specified
contour $C$ from $z = z_0$ to $z = z_0'$).
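The sum $S_n$ translates almost directly into code. The following sketch assumes the contour is supplied as a parameterization `path(t)` with $0 \le t \le 1$, and it takes $\zeta_j$ at the midpoint of each parameter interval; both choices, and the name `contour_sum`, are ours rather than the text's.

```python
import cmath

def contour_sum(f, path, n):
    """Approximate the integral of f along z = path(t), 0 <= t <= 1, by
    S_n = sum_j f(zeta_j) * (z_j - z_{j-1}), with zeta_j = path((j - 1/2) / n)."""
    total = 0j
    z_prev = path(0.0)
    for j in range(1, n + 1):
        z_j = path(j / n)
        zeta_j = path((j - 0.5) / n)      # a point on the curve between z_{j-1} and z_j
        total += f(zeta_j) * (z_j - z_prev)
        z_prev = z_j
    return total

# Integral of z from 0 to 1 + i along a straight line; the exact value is (1 + i)^2 / 2 = i.
print(contour_sum(lambda z: z, lambda t: t * (1 + 1j), 10))

# Integral of z^2 along the quarter circle from 1 to i; the exact value is (i^3 - 1) / 3.
quarter_circle = lambda t: cmath.exp(0.5j * cmath.pi * t)
print(contour_sum(lambda z: z * z, quarter_circle, 2000), (-1 - 1j) / 3)
```

Increasing `n` plays the role of the limit $n \to \infty$; for smooth integrands and paths the sum settles down quickly.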
FIGURE 11.2 Integration path.


As an alternative to the above, the contour integral may be defined by

$$\int_{z_1}^{z_2} f(z)\,dz = \int_{x_1,y_1}^{x_2,y_2}\left[u(x, y) + iv(x, y)\right]\left[dx + i\,dy\right]$$
$$= \int_{x_1,y_1}^{x_2,y_2}\left[u(x, y)\,dx - v(x, y)\,dy\right] + i\int_{x_1,y_1}^{x_2,y_2}\left[v(x, y)\,dx + u(x, y)\,dy\right], \tag{11.17}$$

with the path joining $(x_1, y_1)$ and $(x_2, y_2)$ specified. This reduces the complex integral to
the complex sum of real integrals. It is somewhat analogous to the replacement of a vector
integral by the vector sum of scalar integrals.

Often we are interested in contours that are closed, meaning that the start and end of the 
contour are at the same point, so that the contour forms a closed loop. We normally define 
the region enclosed by a contour as that which lies to the left when the contour is traversed 
in the indicated direction; thus a contour intended to surround a finite area will normally be 
deemed to be traversed in the counterclockwise direction. If the origin of a polar coordinate 
system is within the contour, this convention will cause the normal direction of travel on 
the contour to be that in which the polar angle $\theta$ increases.


Statement of Theorem 


Cauchy’s integral theorem states that: 


If $f(z)$ is an analytic function at all points of a simply connected region in the complex
plane and if $C$ is a closed contour within that region, then

$$\oint_C f(z)\,dz = 0. \tag{11.18}$$
To clarify the above, we need the following definition:


• A region is simply connected if every closed curve within it can be shrunk continuously
to a point that is within the region.


In everyday language, a simply connected region is one that has no holes. We also need to
explain that the symbol $\oint$ will be used from now on to indicate an integral over a closed
contour; a subscript (such as $C$) is attached when further specification of the contour is
desired. Note also that for the theorem to apply, the contour must be "within" the region of
analyticity. That means it cannot be on the boundary of the region.

Before proving Cauchy’s integral theorem, we look at some examples that do (and do 
not) meet its conditions. 


Example 11.3.1 $z^n$ ON CIRCULAR CONTOUR


Let's examine the contour integral $\oint_C z^n\,dz$, where $C$ is a circle of radius $r > 0$ around the
origin $z = 0$ in the positive mathematical sense (counterclockwise). In polar coordinates,
cf. Eq. (1.125), we parameterize the circle as $z = re^{i\theta}$ and $dz = ire^{i\theta}d\theta$. For $n \neq -1$, $n$ an
integer, we then obtain

$$\oint_C z^n\,dz = i\,r^{n+1}\int_0^{2\pi}\exp[i(n+1)\theta]\,d\theta = i\,r^{n+1}\left[\frac{e^{i(n+1)\theta}}{i(n+1)}\right]_0^{2\pi} = 0 \tag{11.19}$$

because $2\pi$ is a period of $e^{i(n+1)\theta}$. However, for $n = -1$

$$\oint_C \frac{dz}{z} = i\int_0^{2\pi} d\theta = 2\pi i, \tag{11.20}$$

independent of $r$ but nonzero.

The fact that Eq. (11.19) is satisfied for all integers $n \ge 0$ is required by Cauchy's theorem,
because for these $n$ values $z^n$ is analytic for all finite $z$, and certainly for all points
within a circle of radius $r$. Cauchy's theorem does not apply for any negative integer $n$
because, for these $n$, $z^n$ is singular at $z = 0$. The theorem therefore does not prescribe any
particular values for the integrals of negative $n$. We see that one such integral (that for
$n = -1$) has a nonzero value, and that others (for integral $n \neq -1$) do vanish. ■
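Equations (11.19) and (11.20) are easy to confirm numerically by discretizing the angle $\theta$. The sketch below uses a simple midpoint rule; the function name `circle_integral` and the number of steps are illustrative assumptions.

```python
import cmath
import math

def circle_integral(n, radius=1.0, steps=2000):
    """Approximate the counterclockwise integral of z**n over |z| = radius,
    using z = r e^{i theta}, dz = i z d(theta), with a midpoint rule in theta."""
    dtheta = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        theta = (k + 0.5) * dtheta
        z = radius * cmath.exp(1j * theta)
        total += z ** n * 1j * z * dtheta
    return total

for n in (-3, -2, -1, 0, 1, 2):
    print(n, circle_integral(n, radius=2.0))
```

Only the $n = -1$ line differs appreciably from zero, and its value, approximately $2\pi i$, does not change when `radius` is varied.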


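Example 11.3.2 $z^n$ ON SQUARE CONTOUR


We next examine the integration of $z^n$ for a different contour, a square with vertices at
$\pm\frac{1}{2} \pm \frac{1}{2}i$. It is somewhat tedious to perform this integration for general integer $n$, so we
illustrate only with $n = 2$ and $n = -1$.

FIGURE 11.3 Square integration contour.

For $n = 2$, we have $z^2 = x^2 - y^2 + 2ixy$. Referring to Fig. 11.3, we identify the contour
as consisting of four line segments. On Segment 1, $dz = dx$ ($y = -\frac{1}{2}$ and $dy = 0$);
on Segment 2, $dz = i\,dy$, $x = \frac{1}{2}$, $dx = 0$; on Segment 3, $dz = dx$, $y = \frac{1}{2}$, $dy = 0$; and on
Segment 4, $dz = i\,dy$, $x = -\frac{1}{2}$, $dx = 0$. Note that for Segments 3 and 4 the integration is
in the direction of decreasing value of the integration variable. These segments therefore
contribute as follows to the integral:

$$\text{Segment 1:}\quad \int_{-1/2}^{1/2} dx\left(x^2 - \tfrac{1}{4} - ix\right) = \tfrac{1}{3}\left[\tfrac{1}{8} - \left(-\tfrac{1}{8}\right)\right] - \tfrac{1}{4} - \tfrac{i}{2}(0) = -\tfrac{1}{6},$$

$$\text{Segment 2:}\quad \int_{-1/2}^{1/2} (i\,dy)\left(\tfrac{1}{4} - y^2 + iy\right) = \tfrac{i}{4} - \tfrac{i}{3}\left[\tfrac{1}{8} - \left(-\tfrac{1}{8}\right)\right] - \tfrac{1}{2}(0) = \tfrac{i}{6},$$

$$\text{Segment 3:}\quad \int_{1/2}^{-1/2} dx\left(x^2 - \tfrac{1}{4} + ix\right) = -\tfrac{1}{3}\left[\tfrac{1}{8} - \left(-\tfrac{1}{8}\right)\right] + \tfrac{1}{4} + \tfrac{i}{2}(0) = \tfrac{1}{6},$$

$$\text{Segment 4:}\quad \int_{1/2}^{-1/2} (i\,dy)\left(\tfrac{1}{4} - y^2 - iy\right) = -\tfrac{i}{4} + \tfrac{i}{3}\left[\tfrac{1}{8} - \left(-\tfrac{1}{8}\right)\right] + \tfrac{1}{2}(0) = -\tfrac{i}{6}.$$

We find that the integral of $z^2$ over the square vanishes, just as it did over the circle. This
is required by Cauchy's theorem.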
For $n = -1$, we have, in Cartesian coordinates,

$$z^{-1} = \frac{1}{x + iy} = \frac{x - iy}{x^2 + y^2},$$

and the integral over the four segments of the square contour takes the form

$$\int_{-1/2}^{1/2}\frac{x + i/2}{x^2 + \frac{1}{4}}\,dx + \int_{-1/2}^{1/2}\frac{\frac{1}{2} - iy}{y^2 + \frac{1}{4}}\,(i\,dy) + \int_{1/2}^{-1/2}\frac{x - i/2}{x^2 + \frac{1}{4}}\,dx + \int_{1/2}^{-1/2}\frac{-\frac{1}{2} - iy}{y^2 + \frac{1}{4}}\,(i\,dy).$$

Several of the terms vanish because they involve the integration of an odd integrand over
an even interval, and others simply cancel. All that remains is

$$\oint z^{-1}\,dz = 2i\int_{-1/2}^{1/2}\frac{dx}{x^2 + \frac{1}{4}} = 4i\int_{-1}^{1}\frac{du}{u^2 + 1} = 2i\left[\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right] = 2\pi i,$$

the same result as was obtained for the integration of $z^{-1}$ around a circle of any radius.
Cauchy's theorem does not apply here, so the nonzero result is not problematic. ■

Cauchy’s Theorem: Proof 


We now proceed to a proof of Cauchy’s integral theorem. The proof we offer is subject to 
a restriction originally accepted by Cauchy but later shown unnecessary by Goursat. What 
we need to show is that 


$$\oint_C f(z)\,dz = 0,$$


subject to the requirement that C is a closed contour within a simply connected region R 
where f(z) is analytic. See Fig. 11.4. The restriction needed for Cauchy’s (and the present) 
proof is that if we write f(z) = u(x, y) + iv(x, y), the partial derivatives of u and v are 
continuous. 








FIGURE 11.4 A closed contour $C$ within a simply connected region $R$.
We intend to prove the theorem by direct application of Stokes' theorem (Section 3.8).
Writing $dz = dx + i\,dy$,

$$\oint_C f(z)\,dz = \oint_C (u + iv)(dx + i\,dy) = \oint_C (u\,dx - v\,dy) + i\oint_C (v\,dx + u\,dy). \tag{11.21}$$

These two line integrals may be converted to surface integrals by Stokes' theorem, a procedure
that is justified because we have assumed the partial derivatives to be continuous
within the area enclosed by $C$. In applying Stokes' theorem, note that the final two integrals
of Eq. (11.21) are real.

To proceed further, we note that all the integrals involved here can be identified as
having integrands of the form $(V_x\hat{\mathbf{e}}_x + V_y\hat{\mathbf{e}}_y)\cdot d\mathbf{r}$, the integration is around a loop in the
$xy$ plane, and the value of the integral will be the surface integral, over the enclosed area,
of the $z$ component of $\nabla\times(V_x\hat{\mathbf{e}}_x + V_y\hat{\mathbf{e}}_y)$. Thus, Stokes' theorem says that

$$\oint_C (V_x\,dx + V_y\,dy) = \int_A\left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)dx\,dy, \tag{11.22}$$

with $A$ being the 2-D region enclosed by $C$.
For the first integral in the second line of Eq. (11.21), let $u = V_x$ and $v = -V_y$.² Then

$$\oint_C (u\,dx - v\,dy) = \oint_C (V_x\,dx + V_y\,dy) = \int_A\left(\frac{\partial V_y}{\partial x} - \frac{\partial V_x}{\partial y}\right)dx\,dy = -\int_A\left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right)dx\,dy. \tag{11.23}$$

For the second integral on the right side of Eq. (11.21) we let $u = V_y$ and $v = V_x$. Using
Stokes' theorem again, we obtain

$$\oint_C (v\,dx + u\,dy) = \int_A\left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right)dx\,dy. \tag{11.24}$$

Inserting Eqs. (11.23) and (11.24) into Eq. (11.21), we now have

$$\oint_C f(z)\,dz = -\int_A\left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right)dx\,dy + i\int_A\left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right)dx\,dy = 0. \tag{11.25}$$

Remembering that $f(z)$ has been assumed analytic, we find that both the surface integrals
in Eq. (11.25) are zero because application of the Cauchy-Riemann equations causes their
integrands to vanish. This establishes the theorem.


²For Stokes' theorem, $V_x$ and $V_y$ are any two functions with continuous partial derivatives, and they need not be connected by
any relations stemming from complex variable theory.
Multiply Connected Regions


The original statement of Cauchy’s integral theorem demanded a simply connected region 
of analyticity. This restriction may be relaxed by the creation of a barrier, a narrow region 
we choose to exclude from the region identified as analytic. The purpose of the barrier 
construction is to permit, within a multiply connected region, the identification of curves 
that can be shrunk to a point within the region, that is, the construction of a subregion that 
is simply connected. 

Consider the multiply connected region of Fig. 11.5, in which f(z) is only analytic in 
the unshaded area labeled R. Cauchy’s integral theorem is not valid for the contour C, 
as shown, but we can construct a contour C’ for which the theorem holds. We draw a 
barrier from the interior forbidden region, R’, to the forbidden region exterior to R and 
then run a new contour, C’, as shown in Fig. 11.6. 

The new contour, C’, through ABDEFGA, never crosses the barrier that converts R into 
a simply connected region. Incidentally, the three-dimensional analog of this technique 
was used in Section 3.9 to prove Gauss’ law. Because f(z) is in fact continuous across the 
barrier dividing DE from GA and the line segments DE and GA can be arbitrarily close 
together, we have 


$$\int_G^A f(z)\,dz = -\int_D^E f(z)\,dz. \tag{11.26}$$


FIGURE 11.5 A closed contour $C$ in a multiply connected region.


FIGURE 11.6 Conversion of a multiply connected region into a simply connected region.
Then, invoking Cauchy's integral theorem, because the contour is now within a simply
connected region, and using Eq. (11.26) to cancel the contributions of the segments along
the barrier,

$$\oint_{C'} f(z)\,dz = \int_{ABD} f(z)\,dz + \int_{EFG} f(z)\,dz = 0. \tag{11.27}$$

Now that we have established Eq. (11.27), we note that $A$ and $D$ are only infinitesimally
separated and that $f(z)$ is actually continuous across the barrier. Hence, integration on the
path $ABD$ will yield the same result as a truly closed contour $ABDA$. Similar remarks apply
to the path $EFG$, which can be replaced by $EFGE$. Renaming $ABDA$ as $C_1'$ and $EFGE$ as
$-C_2'$, we have the simple result

$$\oint_{C_1'} f(z)\,dz = \oint_{C_2'} f(z)\,dz, \tag{11.28}$$

in which $C_1'$ and $C_2'$ are both traversed in the same (counterclockwise, that is, positive)
direction.

This result calls for some interpretation. What we have shown is that the integral of an 
analytic function over a closed contour surrounding an “island” of nonanalyticity can be 
subjected to any continuous deformation within the region of analyticity without changing 
the value of the integral. The notion of continuous deformation means that the change 
in contour must be able to be carried out via a series of small steps, which precludes 
processes whereby we “jump over” a point or region of nonanalyticity. Since we already 
know that the integral of an analytic function over a contour in a simply connected region 
of analyticity has the value zero, we can make the more general statement 


The integral of an analytic function over a closed path has a value that remains 
unchanged over all possible continuous deformations of the contour within the region 
of analyticity. 


Looking back at the two examples of this section, we see that the integrals of $z^2$ vanished
for both the circular and square contours, as prescribed by Cauchy's integral theorem for
an analytic function. The integrals of $z^{-1}$ did not vanish, and vanishing was not required
because there was a point of nonanalyticity within the contours. However, the integrals of
$z^{-1}$ for the two contours had the same value, as either contour can be reached by continuous
deformation of the other.

We close this section with an extremely important observation. By a trivial extension to
Example 11.3.1 plus the fact that closed contours in a region of analyticity can be deformed
continuously without altering the value of the integral, we have the valuable and useful
result:


The integral of $(z - z_0)^n$ around any counterclockwise closed path $C$ that encloses $z_0$
has, for any integer $n$, the values

$$\oint_C (z - z_0)^n\,dz = \begin{cases} 0, & n \neq -1, \\ 2\pi i, & n = -1. \end{cases} \tag{11.29}$$
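Equation (11.29) asserts independence of the shape of the contour, which can be tested on a contour that is neither a circle nor centered on $z_0$. The sketch below uses an ellipse; the parameterization, the function name `ellipse_integral`, and the sample values are our own illustrative choices.

```python
import math

def ellipse_integral(n, z0, center, a, b, steps=2000):
    """Counterclockwise integral of (z - z0)**n over the ellipse
    z(t) = center + a cos t + i b sin t, using a midpoint rule in t."""
    dt = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * dt
        z = center + a * math.cos(t) + 1j * b * math.sin(t)
        dz = (-a * math.sin(t) + 1j * b * math.cos(t)) * dt
        total += (z - z0) ** n * dz
    return total

z0 = 0.3 + 0.2j                      # lies inside the ellipse with a = 2, b = 1
for n in (-3, -2, -1, 0, 2):
    print(n, ellipse_integral(n, z0, 0j, 2.0, 1.0))

# With the point outside the contour the integrand is analytic inside, so the result is 0.
print(ellipse_integral(-1, 3.0, 0j, 2.0, 1.0))
```

Only $n = -1$ with $z_0$ enclosed gives a nonzero value, approximately $2\pi i$, in agreement with Eq. (11.29).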





Exercises


11.3.1 Show that

$$\int_{z_1}^{z_2} f(z)\,dz = -\int_{z_2}^{z_1} f(z)\,dz.$$

11.3.2 Prove that

$$\left|\int_C f(z)\,dz\right| \le |f|_{\max}\cdot L,$$

where $|f|_{\max}$ is the maximum value of $|f(z)|$ along the contour $C$ and $L$ is the length
of the contour.

11.3.3 Show that the integral

$$\int_{3+4i}^{4-3i} (4z^2 - 3iz)\,dz$$

has the same value on the two paths: (a) the straight line connecting the integration
limits, and (b) an arc on the circle $|z| = 5$.

11.3.4 Let

$$F(z) = \int_{\pi(1+i)}^{z} \cos 2\zeta\,d\zeta.$$

Show that $F(z)$ is independent of the path connecting the limits of integration, and
evaluate $F(\pi i)$.

11.3.5 Evaluate $\oint_C (x^2 - iy^2)\,dz$, where the integration is (a) clockwise around the unit circle,
(b) on a square with vertices at $\pm 1 \pm i$. Explain why the results of parts (a) and (b) are
or are not identical.

11.3.6 Verify that

$$\int_0^{1+i} z^*\,dz$$

depends on the path by evaluating the integral for the two paths shown in Fig. 11.7.
Recall that $f(z) = z^*$ is not an analytic function of $z$ and that Cauchy's integral theorem
therefore does not apply.

11.3.7 Show that

$$\oint_C \frac{dz}{z^2 + z} = 0,$$

in which the contour $C$ is a circle defined by $|z| = R > 1$.

Hint. Direct use of the Cauchy integral theorem is illegal. The integral may be evaluated
by expanding into partial fractions and then treating the two terms individually. This
yields 0 for $R > 1$ and $2\pi i$ for $R < 1$.


FIGURE 11.7 Contours for Exercise 11.3.6.


11.4 CAUCHY'S INTEGRAL FORMULA


As in the preceding section, we consider a function $f(z)$ that is analytic on a closed contour
$C$ and within the interior region bounded by $C$. This means that the contour $C$ is to be
traversed in the counterclockwise direction. We seek to prove the following result, known
as Cauchy's integral formula:

$$\frac{1}{2\pi i}\oint_C \frac{f(z)}{z - z_0}\,dz = f(z_0), \tag{11.30}$$

in which $z_0$ is any point in the interior region bounded by $C$. Note that since $z$ is on the
contour $C$ while $z_0$ is in the interior, $z - z_0 \neq 0$ and the integral Eq. (11.30) is well defined.
Although $f(z)$ is assumed analytic, the integrand is $f(z)/(z - z_0)$ and is not analytic at
$z = z_0$ unless $f(z_0) = 0$. We now deform the contour, to make it a circle of small radius
$r$ about $z = z_0$, traversed, like the original contour, in the counterclockwise direction. As
shown in the preceding section, this does not change the value of the integral. We therefore
write $z = z_0 + re^{i\theta}$, so $dz = ire^{i\theta}d\theta$, the integration is from $\theta = 0$ to $\theta = 2\pi$, and

$$\oint_C \frac{f(z)}{z - z_0}\,dz = \int_0^{2\pi}\frac{f(z_0 + re^{i\theta})}{re^{i\theta}}\,ire^{i\theta}\,d\theta.$$

Taking the limit $r \to 0$, we obtain

$$\oint_C \frac{f(z)}{z - z_0}\,dz = if(z_0)\int_0^{2\pi} d\theta = 2\pi i f(z_0), \tag{11.31}$$

where we have replaced $f(z)$ by its limit $f(z_0)$ because it is analytic and therefore continuous
at $z = z_0$. This proves the Cauchy integral formula.

Here is a remarkable result. The value of an analytic function $f(z)$ is given at an arbitrary
interior point $z = z_0$ once the values on the boundary $C$ are specified.
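The statement that boundary values determine interior values can be tried out directly. The sketch below evaluates the left-hand side of Eq. (11.30) on a circular contour with a midpoint rule; the name `cauchy_formula`, the default contour, and the test function $e^z$ are assumptions made for illustration.

```python
import cmath
import math

def cauchy_formula(f, z0, center=0j, radius=1.0, steps=400):
    """Estimate (1 / 2 pi i) * closed integral of f(z) / (z - z0) dz over the
    circle |z - center| = radius, traversed counterclockwise (Eq. 11.30)."""
    dtheta = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        z = center + radius * cmath.exp(1j * (k + 0.5) * dtheta)
        dz = 1j * (z - center) * dtheta
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)

z0 = 0.3 - 0.4j
print(cauchy_formula(cmath.exp, z0), cmath.exp(z0))   # interior point: the two values agree
print(cauchy_formula(cmath.exp, 2.0 + 1.0j))          # point outside the contour: close to 0
```

Only values of $f$ on the circle enter the computation, yet the first line reproduces $f(z_0)$ at a point well inside it.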
It has been emphasized that $z_0$ is an interior point. What happens if $z_0$ is exterior to $C$?
In this case the entire integrand is analytic on and within $C$. Cauchy's integral theorem,
Section 11.3, applies and the integral vanishes. Summarizing, we have

$$\frac{1}{2\pi i}\oint_C \frac{f(z)\,dz}{z - z_0} = \begin{cases} f(z_0), & z_0 \text{ within the contour}, \\ 0, & z_0 \text{ exterior to the contour}. \end{cases}$$


Example 11.4.1 AN INTEGRAL


Consider

$$I = \oint_C \frac{dz}{z(z + 2)},$$

where the integration is counterclockwise over the unit circle. The factor $1/(z + 2)$ is
analytic within the region enclosed by the contour, so this is a case of Cauchy's integral
formula, Eq. (11.30), with $f(z) = 1/(z + 2)$ and $z_0 = 0$. The result is immediate:

$$I = 2\pi i\left[\frac{1}{z + 2}\right]_{z=0} = \pi i.$$

■




Example 11.4.2 INTEGRAL WITH TWO SINGULAR FACTORS


Consider now

$$I = \oint_C \frac{dz}{4z^2 - 1},$$

also integrated counterclockwise over the unit circle. The denominator factors into
$4\left(z - \frac{1}{2}\right)\left(z + \frac{1}{2}\right)$, and it is apparent that the region of integration contains two singular
factors. However, we may still use Cauchy's integral formula if we make the partial fraction
expansion

$$\frac{1}{4z^2 - 1} = \frac{1}{4}\left(\frac{1}{z - \frac{1}{2}} - \frac{1}{z + \frac{1}{2}}\right),$$

after which we integrate the two terms individually. We have

$$I = \frac{1}{4}\left[\oint_C \frac{dz}{z - \frac{1}{2}} - \oint_C \frac{dz}{z + \frac{1}{2}}\right].$$

Each integral is a case of Cauchy's formula with $f(z) = 1$, and for both integrals the
point $z_0 = \pm\frac{1}{2}$ is within the contour, so each evaluates to $2\pi i$. Because the two integrals
enter with opposite signs, their contributions cancel. So
$I = 0$. ■
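The results of Examples 11.4.1 and 11.4.2 can be confirmed by direct numerical integration around the unit circle, without any use of Cauchy's formula. The sketch below is a check only; the function names are hypothetical and the midpoint rule is one of several quadratures that would serve.

```python
import cmath
import math

def unit_circle_integral(g, steps=2000):
    """Counterclockwise integral of g(z) dz over |z| = 1 (midpoint rule in the angle)."""
    dtheta = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        z = cmath.exp(1j * (k + 0.5) * dtheta)
        total += g(z) * 1j * z * dtheta
    return total

def partial_fractions(z):
    """Right-hand side of the partial fraction expansion used in Example 11.4.2."""
    return 0.25 * (1 / (z - 0.5) - 1 / (z + 0.5))

print(unit_circle_integral(lambda z: 1 / (z * (z + 2))))     # close to pi*i
print(unit_circle_integral(lambda z: 1 / (4 * z * z - 1)))   # close to 0
print(partial_fractions(0.2 + 0.9j), 1 / (4 * (0.2 + 0.9j) ** 2 - 1))
```

The last line compares the two sides of the partial fraction expansion at an arbitrary sample point.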
Derivatives

Cauchy's integral formula may be used to obtain an expression for the derivative of $f(z)$.
Differentiating Eq. (11.30) with respect to $z_0$, and interchanging the differentiation and the
$z$ integration,³

$$f'(z_0) = \frac{1}{2\pi i}\oint \frac{f(z)}{(z - z_0)^2}\,dz. \tag{11.32}$$

Differentiating again,

$$f''(z_0) = \frac{2}{2\pi i}\oint \frac{f(z)\,dz}{(z - z_0)^3}.$$

Continuing, we get⁴

$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint \frac{f(z)\,dz}{(z - z_0)^{n+1}}; \tag{11.33}$$

that is, the requirement that $f(z)$ be analytic guarantees not only a first derivative but
derivatives of all orders as well! The derivatives of $f(z)$ are automatically analytic. As
indicated in a footnote, this statement assumes the Goursat version of the Cauchy integral
theorem. This is a reason why Goursat's contribution is so significant in the development
of the theory of complex variables.


Example 11.4.3 USE OF DERIVATIVE FORMULA


Consider

$$I = \oint_C \frac{\sin^2 z\,dz}{(z - a)^4},$$

where the integral is counterclockwise on a contour that encircles the point $z = a$. This is
a case of Eq. (11.33) with $n = 3$ and $f(z) = \sin^2 z$. Therefore,

$$I = \frac{2\pi i}{3!}\left[\frac{d^3}{dz^3}\sin^2 z\right]_{z=a} = \frac{\pi i}{3}\Big[-8\sin z\cos z\Big]_{z=a} = -\frac{8\pi i}{3}\sin a\cos a.$$

■


³The interchange can be proved legitimate, but the proof requires that Cauchy's integral theorem not be subject to the continuous
derivative restriction in Cauchy's original proof. We are therefore now depending on Goursat's proof of the integral theorem.
⁴This expression is a starting point for defining derivatives of fractional order. See A. Erdelyi, ed., Tables of Integral Transforms,
Vol. 2. New York: McGraw-Hill (1954). For more recent applications to mathematical analysis, see T. J. Osler, An integral
analogue of Taylor's series and its use in computing Fourier transforms, Math. Comput. 26: 449 (1972), and references
therein.
Morera’s Theorem

A further application of Cauchy’s integral formula is in the proof of Morera’s theorem,
which is the converse of Cauchy’s integral theorem. The theorem states the following:


If a function $f(z)$ is continuous in a simply connected region $R$ and $\oint_C f(z)\,dz = 0$ for
every closed contour $C$ within $R$, then $f(z)$ is analytic throughout $R$.


To prove the theorem, let us integrate $f(z)$ from $z_1$ to $z_2$. Since every closed-path integral
of $f(z)$ vanishes, this integral is independent of path and depends only on its endpoints.
We may therefore write

$$F(z_2) - F(z_1) = \int_{z_1}^{z_2} f(z)\,dz, \tag{11.34}$$

where $F(z)$, presently unknown, can be called the indefinite integral of $f(z)$. We then
construct the identity

$$\frac{F(z_2) - F(z_1)}{z_2 - z_1} - f(z_1) = \frac{1}{z_2 - z_1}\int_{z_1}^{z_2}\big[f(t) - f(z_1)\big]\,dt, \tag{11.35}$$

where we have introduced another complex variable, $t$. Next, using the fact that $f(t)$ is
continuous, we write, keeping only terms to first order in $t - z_1$,

$$f(t) - f(z_1) = f'(z_1)(t - z_1) + \cdots,$$

which implies that

$$\int_{z_1}^{z_2}\big[f(t) - f(z_1)\big]\,dt = \int_{z_1}^{z_2}\big[f'(z_1)(t - z_1) + \cdots\big]\,dt = \frac{f'(z_1)}{2}(z_2 - z_1)^2 + \cdots.$$

It is thus apparent that the right-hand side of Eq. (11.35) approaches zero in the limit
$z_2 \to z_1$, so

$$f(z_1) = \lim_{z_2 \to z_1}\frac{F(z_2) - F(z_1)}{z_2 - z_1} = F'(z_1). \tag{11.36}$$


Equation (11.36) shows that F(z), which by construction is single-valued, has a derivative 
at all points within R and is therefore analytic in that region. Since F(z) is analytic, then 
so also must be its derivative, f(z), thereby proving Morera’s theorem. 

At this point, one comment might be in order. Morera’s theorem, which establishes 
the analyticity of F(z) in a simply connected region, cannot be extended to prove that 
F(z), as well as f(z), is analytic throughout a multiply connected region via the device of 
introducing a barrier. It is not possible to show that F(z) will have the same value on both 
sides of the barrier, and in fact it does not always have that property. Thus, if extended 
to a multiply connected region, F(z) may fail to have the single-valuedness that is one 
of the requirements for analyticity. Put another way, a function which is analytic in a 
multiply connected region will have analytic derivatives of all orders in that region, but its
integral is not guaranteed to be analytic in the entire multiply connected region. This issue 
is elaborated in Section 11.6. 

The proof of Morera’s theorem has given us something additional, namely that the 
indefinite integral of f(z) is its antiderivative, showing that: 


The rules for integration of complex functions are the same as those for real functions. 


Further Applications 


An important application of Cauchy’s integral formula is the following Cauchy inequality.
If $f(z) = \sum a_n z^n$ is analytic and bounded, $|f(z)| \le M$ on a circle of radius $r$ about
the origin, then

$$|a_n|\,r^n \le M \qquad \text{(Cauchy’s inequality)} \tag{11.37}$$

gives upper bounds for the coefficients of its Taylor expansion. To prove Eq. (11.37) let us
define $M(r) = \max_{|z|=r}|f(z)|$ and use the Cauchy integral for $a_n = f^{(n)}(0)/n!$,

$$|a_n| = \frac{1}{2\pi}\left|\oint_{|z|=r}\frac{f(z)}{z^{n+1}}\,dz\right| \le M(r)\,\frac{2\pi r}{2\pi r^{n+1}}.$$
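
As a concrete instance of Eq. (11.37), consider the entire function $e^z$, for which $a_n = 1/n!$ and the maximum of $|e^z|$ on $|z| = r$ is $e^r$. The short check below is a sketch under those stated facts; the function name `cauchy_inequality_holds` and the tested radii are choices made here for illustration only.

```python
import math

def cauchy_inequality_holds(r, n_max=30):
    """Check |a_n| r^n <= M(r) for f(z) = exp(z), where a_n = 1/n!
    and M(r) = max over |z| = r of |exp(z)| = exp(r)."""
    bound = math.exp(r)
    return all(r ** n / math.factorial(n) <= bound for n in range(n_max + 1))

for r in (0.1, 1.0, 5.0, 20.0):
    print(r, cauchy_inequality_holds(r))
```

The inequality holds for every radius because each quantity $r^n/n!$ is a single term of the series for $e^r$, all of whose terms are positive.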


An immediate consequence of the inequality, Eq. (11.37), is Liouville’s theorem: If
$f(z)$ is analytic and bounded in the entire complex plane it is a constant. In fact, if
$|f(z)| \le M$ for all $z$, then Cauchy’s inequality, Eq. (11.37), applied for $|z| = r$, gives
$|a_n| \le M r^{-n}$. If now we choose to let $r$ approach $\infty$, we may conclude that for all $n > 0$,
$|a_n| = 0$. Hence $f(z) = a_0$.

Conversely, the slightest deviation of an analytic function from a constant value implies 
that there must be at least one singularity somewhere in the infinite complex plane. Apart 
from the trivial constant functions then, singularities are a fact of life, and we must learn to 
live with them. As pointed out when introducing the concept of the point at infinity, even 
innocuous functions such as f(z) = z have singularities at infinity; we now know that this 
is a property of every entire function that is not simply a constant. But we shall do more 
than just tolerate the existence of singularities. In the next section, we show how to expand 
a function in a Laurent series at a singularity, and we go on to use singularities to develop 
the powerful and useful calculus of residues in a later section of this chapter. 

A famous application of Liouville’s theorem yields the fundamental theorem of algebra
(due to C. F. Gauss), which says that any polynomial $P(z) = \sum_{\nu=0}^{n} a_\nu z^\nu$ with $n > 0$
and $a_n \ne 0$ has $n$ roots. To prove this, suppose $P(z)$ has no zero. Then $1/P(z)$ is analytic
and bounded as $|z| \to \infty$, and, because of Liouville’s theorem, $P(z)$ would have to be a
constant. To resolve this contradiction, it must be the case that $P(z)$ has at least one root $\lambda$
that we can divide out, forming $P(z)/(z - \lambda)$, a polynomial of degree $n - 1$. We can repeat
this process until the polynomial has been reduced to degree zero, thereby finding exactly
$n$ roots.





Exercises

Unless explicitly stated otherwise, closed contours occurring in these exercises are to
be understood as traversed in the mathematically positive (counterclockwise) direction.


11.4.1 Show that

$$\frac{1}{2\pi i}\oint z^{m-n-1}\,dz, \qquad m \text{ and } n \text{ integers}$$

(with the contour encircling the origin once), is a representation of the Kronecker $\delta_{mn}$.


11.4.2 Evaluate

$$\oint_C \frac{dz}{z^2 - 1},$$

where $C$ is the circle $|z - 1| = 1$.


11.4.3 Assuming that $f(z)$ is analytic on and within a closed contour $C$ and that the point $z_0$
is within $C$, show that

$$\oint_C \frac{f'(z)}{z - z_0}\,dz = \oint_C \frac{f(z)}{(z - z_0)^2}\,dz.$$


11.4.4 You know that $f(z)$ is analytic on and within a closed contour $C$. You suspect that the
$n$th derivative $f^{(n)}(z_0)$ is given by

$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)^{n+1}}\,dz.$$

Using mathematical induction (Section 1.4), prove that this expression is correct.


11.4.5 (a) A function $f(z)$ is analytic within a closed contour $C$ (and continuous on $C$). If
$f(z) \ne 0$ within $C$ and $|f(z)| \le M$ on $C$, show that

$$|f(z)| \le M$$

for all points within $C$.
Hint. Consider $w(z) = 1/f(z)$.

(b) If $f(z) = 0$ within the contour $C$, show that the foregoing result does not hold
and that it is possible to have $|f(z)| = 0$ at one or more points in the interior with
$|f(z)| > 0$ over the entire bounding contour. Cite a specific example of an analytic
function that behaves this way.


11.4.6 Evaluate

$$\oint_C \frac{e^{iz}}{z^3}\,dz,$$

for the contour a square with sides of length $a > 1$, centered at $z = 0$.







11.4.7 Evaluate

$$\oint_C \frac{\sin^2 z - z^2}{(z - a)^3}\,dz,$$

where the contour encircles the point $z = a$.


11.4.8 Evaluate

$$\oint_C \frac{dz}{z(2z + 1)},$$

for the contour the unit circle.


11.4.9 Evaluate

$$\oint_C \frac{f(z)}{z(2z + 1)^2}\,dz,$$

for the contour the unit circle.

Hint. Make a partial fraction expansion.


11.5 LAURENT EXPANSION 


Taylor Expansion 


The Cauchy integral formula of the preceding section opens up the way for another derivation
of Taylor’s series (Section 1.2), but this time for functions of a complex variable.
Suppose we are trying to expand $f(z)$ about $z = z_0$ and we have $z = z_1$ as the nearest
point on the Argand diagram for which $f(z)$ is not analytic. We construct a circle $C$ centered
at $z = z_0$ with radius less than $|z_1 - z_0|$ (Fig. 11.8). Since $z_1$ was assumed to be the
nearest point at which $f(z)$ was not analytic, $f(z)$ is necessarily analytic on and within $C$.
From the Cauchy integral formula, Eq. (11.30),

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(z')\,dz'}{z' - z}
= \frac{1}{2\pi i}\oint_C \frac{f(z')\,dz'}{(z' - z_0) - (z - z_0)}
= \frac{1}{2\pi i}\oint_C \frac{f(z')\,dz'}{(z' - z_0)\left[1 - (z - z_0)/(z' - z_0)\right]}. \tag{11.38}$$


Here $z'$ is a point on the contour $C$ and $z$ is any point interior to $C$. It is not legal yet
to expand the denominator of the integrand in Eq. (11.38) by the binomial theorem, for
we have not yet proved the binomial theorem for complex variables. Instead, we note the
identity

$$\frac{1}{1 - t} = 1 + t + t^2 + t^3 + \cdots = \sum_{n=0}^{\infty} t^n, \tag{11.39}$$

which may easily be verified by multiplying both sides by $1 - t$. The infinite series,
following the methods of Section 1.2, is convergent for $|t| < 1$.

FIGURE 11.8 Circular domains for Taylor expansion.

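Now, for a point $z$ interior to $C$, $|z - z_0| < |z' - z_0|$, and, using Eq. (11.39), Eq. (11.38)
becomes

$$f(z) = \frac{1}{2\pi i}\oint_C \sum_{n=0}^{\infty}\frac{(z - z_0)^n f(z')\,dz'}{(z' - z_0)^{n+1}}. \tag{11.40}$$

Interchanging the order of integration and summation, which is valid because Eq. (11.39)
is uniformly convergent for $|t| < 1 - \varepsilon$, with $0 < \varepsilon < 1$, we obtain

$$f(z) = \frac{1}{2\pi i}\sum_{n=0}^{\infty}(z - z_0)^n\oint_C \frac{f(z')\,dz'}{(z' - z_0)^{n+1}}. \tag{11.41}$$

Referring to Eq. (11.33), we get

$$f(z) = \sum_{n=0}^{\infty}(z - z_0)^n\,\frac{f^{(n)}(z_0)}{n!}, \tag{11.42}$$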


which is our desired Taylor expansion. 

It is important to note that our derivation not only produces the expansion given in
Eq. (11.41); it also shows that this expansion converges when $|z - z_0| < |z_1 - z_0|$. For this
reason the circle defined by $|z - z_0| = |z_1 - z_0|$ is called the circle of convergence of our
Taylor series. Alternatively, the distance $|z_1 - z_0|$ is sometimes referred to as the radius of
convergence of the Taylor series. In view of the earlier definition of $z_1$, we can say that:

FIGURE 11.9 Annular region for Laurent series.
$|z' - z_0|_{C_1} > |z - z_0|$; $|z' - z_0|_{C_2} < |z - z_0|$.


The Taylor series of a function $f(z)$ about any interior point $z_0$ of a region in which
$f(z)$ is analytic is a unique expansion that will have a radius of convergence equal to
the distance from $z_0$ to the singularity of $f(z)$ closest to $z_0$, meaning that the Taylor
series will converge within this circle of convergence. The Taylor series may or may
not converge at individual points on the circle of convergence.


From the Taylor expansion for f(z) a binomial theorem may be derived. That task is 
left to Exercise 11.5.2. 
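
The statement about the radius of convergence can be illustrated numerically. In the sketch below (an illustration under stated assumptions, not the text's method), the Taylor coefficients of $f(z) = 1/(1 - z)$ about the arbitrarily chosen point $z_0 = i$ are computed from the contour integral in Eq. (11.41), and the ratio $|a_n/a_{n+1}|$ is used as an estimate of the radius of convergence. The helper name `taylor_coefficient`, the contour radius, and the index $n = 20$ are choices made here; the ratio estimate is appropriate only because this $f(z)$ has a single nearest singularity.

```python
import cmath
import math

def taylor_coefficient(f, n, z0, radius, samples=2048):
    """a_n = (1 / 2 pi i) * closed integral of f(z') dz' / (z' - z0)^(n+1),
    evaluated by the trapezoidal rule on the circle |z' - z0| = radius."""
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * math.pi * k / samples)   # w = z' - z0
        total += f(z0 + w) / w ** n    # dz' / w^(n+1) = i d(theta) / w^n
    return total / samples

def f(z):
    return 1.0 / (1.0 - z)             # only singularity: z = 1

z0 = 1j                                # expansion point (arbitrary choice)
a20 = taylor_coefficient(f, 20, z0, radius=1.0)
a21 = taylor_coefficient(f, 21, z0, radius=1.0)
ratio_estimate = abs(a20 / a21)        # ratio-test estimate of the radius
print(ratio_estimate, abs(1 - z0))     # both should be close to sqrt(2)
```

The estimate agrees with the distance $|1 - z_0| = \sqrt{2}$ from the expansion point to the singularity at $z = 1$.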


Laurent Series 


We frequently encounter functions that are analytic in an annular region, say, between
circles of inner radius $r$ and outer radius $R$ about a point $z_0$, as shown in Fig. 11.9. We
assume $f(z)$ to be such a function, with $z$ a typical point in the annular region. Drawing
an imaginary barrier to convert our region into a simply connected region, we apply
Cauchy’s integral formula to evaluate $f(z)$, using the contour shown in the figure. Note
that the contour consists of the two circles centered at $z_0$, labeled $C_1$ and $C_2$ (which can be
considered closed since the barrier is fictitious), plus segments on either side of the barrier
whose contributions will cancel. We assign $C_2$ and $C_1$ the radii $r_2$ and $r_1$, respectively,
where $r < r_2 < r_1 < R$. Then, from Cauchy’s integral formula,

$$f(z) = \frac{1}{2\pi i}\oint_{C_1}\frac{f(z')\,dz'}{z' - z} - \frac{1}{2\pi i}\oint_{C_2}\frac{f(z')\,dz'}{z' - z}. \tag{11.43}$$


Note that in Eq. (11.43) an explicit minus sign has been introduced so that the contour
$C_2$ (like $C_1$) is to be traversed in the positive (counterclockwise) sense. The treatment of
Eq. (11.43) now proceeds exactly like that of Eq. (11.38) in the development of the Taylor
series. Each denominator is written as $(z' - z_0) - (z - z_0)$ and expanded by the binomial
theorem, which is now regarded as proven (see Exercise 11.5.2).

Noting that for $C_1$, $|z' - z_0| > |z - z_0|$, while for $C_2$, $|z' - z_0| < |z - z_0|$, we find

$$f(z) = \frac{1}{2\pi i}\sum_{n=0}^{\infty}(z - z_0)^n\oint_{C_1}\frac{f(z')\,dz'}{(z' - z_0)^{n+1}}
+ \frac{1}{2\pi i}\sum_{n=1}^{\infty}(z - z_0)^{-n}\oint_{C_2}(z' - z_0)^{n-1} f(z')\,dz'. \tag{11.44}$$


The minus sign of Eq. (11.43) has been absorbed by the binomial expansion. Labeling the
first series $S_1$ and the second $S_2$ we have

$$S_1 = \frac{1}{2\pi i}\sum_{n=0}^{\infty}(z - z_0)^n\oint_{C_1}\frac{f(z')\,dz'}{(z' - z_0)^{n+1}}, \tag{11.45}$$

which has the same form as the regular Taylor expansion, convergent for $|z - z_0| < |z' - z_0| = r_1$,
that is, for all $z$ interior to the larger circle, $C_1$. For the second series in Eq. (11.44)
we have

$$S_2 = \frac{1}{2\pi i}\sum_{n=1}^{\infty}(z - z_0)^{-n}\oint_{C_2}(z' - z_0)^{n-1} f(z')\,dz', \tag{11.46}$$

convergent for $|z - z_0| > |z' - z_0| = r_2$, that is, for all $z$ exterior to the smaller circle, $C_2$.
Remember, $C_2$ now goes counterclockwise.
These two series are combined into one series,$^5$ known as a Laurent series, of the form

$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n, \tag{11.47}$$

where

$$a_n = \frac{1}{2\pi i}\oint_C \frac{f(z')\,dz'}{(z' - z_0)^{n+1}}. \tag{11.48}$$

Since convergence of a binomial expansion is not relevant to the evaluation of Eq. (11.48),
$C$ in that equation may be any contour within the annular region $r < |z - z_0| < R$ that
encircles $z_0$ once in a counterclockwise sense. If such an annular region of analyticity does
exist, then Eq. (11.47) is the Laurent series, or Laurent expansion, of $f(z)$.

The Laurent series differs from the Taylor series by the obvious feature of negative 
powers of (z — zo). For this reason the Laurent series will always diverge at least at z = zo 
and perhaps as far out as some distance r. In addition, note that Laurent series coefficients 
need not come from evaluation of contour integrals (which may be very intractable). Other 
techniques, such as ordinary series expansions, may provide the coefficients. 

Numerous examples of Laurent series appear later in this book. We limit ourselves here 
to one simple example to illustrate the application of Eq. (11.47). 


$^5$Replace $n$ by $-n$ in $S_2$ and add.
Example 11.5.1 LAURENT EXPANSION


Let $f(z) = [z(z - 1)]^{-1}$. If we choose to make the Laurent expansion about $z_0 = 0$, then
$r > 0$ and $R < 1$. These limitations arise because $f(z)$ diverges both at $z = 0$ and $z = 1$.
A partial fraction expansion, followed by the binomial expansion of $(1 - z)^{-1}$, yields the
Laurent series

$$\frac{1}{z(z - 1)} = -\frac{1}{1 - z} - \frac{1}{z} = -\frac{1}{z} - 1 - z - z^2 - z^3 - \cdots = -\sum_{n=-1}^{\infty} z^n. \tag{11.49}$$

From Eqs. (11.49), (11.47), and (11.48), we then have

$$a_n = \frac{1}{2\pi i}\oint \frac{dz'}{(z')^{n+2}(z' - 1)} =
\begin{cases} -1 & \text{for } n \ge -1, \\ \phantom{-}0 & \text{for } n < -1, \end{cases} \tag{11.50}$$

where the contour for Eq. (11.50) is counterclockwise in the annular region between $z' = 0$
and $|z'| = 1$.

The integrals in Eq. (11.50) can also be directly evaluated by insertion of the geometric-series
expansion of $(1 - z')^{-1}$:

$$a_n = -\frac{1}{2\pi i}\oint \sum_{m=0}^{\infty}(z')^m\,\frac{dz'}{(z')^{n+2}}. \tag{11.51}$$

Upon interchanging the order of summation and integration (permitted because the series
is uniformly convergent), we have

$$a_n = -\frac{1}{2\pi i}\sum_{m=0}^{\infty}\oint (z')^{m-n-2}\,dz'. \tag{11.52}$$

The integral in Eq. (11.52) (including the initial factor $1/2\pi i$, but not the minus sign) was
shown in Exercise 11.4.1 to be an integral representation of the Kronecker delta, and is
therefore equal to $\delta_{m,n+1}$. The expression for $a_n$ then reduces to

$$a_n = -\sum_{m=0}^{\infty}\delta_{m,n+1} =
\begin{cases} -1, & n \ge -1, \\ \phantom{-}0, & n < -1, \end{cases}$$

in agreement with Eq. (11.50). ■
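
The coefficients in Eq. (11.50) can also be confirmed by evaluating Eq. (11.48) numerically. The sketch below is illustrative only: the helper name `laurent_coefficient`, the contour radius 0.5 (any value between 0 and 1 lies in the annulus of this example), and the sample count are assumptions made for this check.

```python
import cmath
import math

def laurent_coefficient(f, n, z0=0j, radius=0.5, samples=4096):
    """Evaluate a_n = (1 / 2 pi i) * closed integral of
    f(z') dz' / (z' - z0)^(n+1) on the circle |z' - z0| = radius
    by the trapezoidal rule in the polar angle."""
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * math.pi * k / samples)   # w = z' - z0
        total += f(z0 + w) / w ** n    # dz' / w^(n+1) = i d(theta) / w^n
    return total / samples

def f(z):
    return 1.0 / (z * (z - 1.0))

for n in range(-4, 5):
    a_n = laurent_coefficient(f, n)
    print(n, round(a_n.real, 10), round(a_n.imag, 10))
```

The printed real parts should be close to $-1$ for $n \ge -1$ and close to $0$ for $n < -1$, with negligible imaginary parts.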


Exercises


11.5.1 Develop the Taylor expansion of $\ln(1 + z)$.

ANS. $\displaystyle\sum_{n=1}^{\infty}(-1)^{n-1}\frac{z^n}{n}$.


11.5.2 Derive the binomial expansion

$$(1 + z)^m = 1 + mz + \frac{m(m - 1)}{1\cdot 2}z^2 + \cdots = \sum_{n=0}^{\infty}\binom{m}{n}z^n$$

for $m$, any real number. The expansion is convergent for $|z| < 1$. Why?


11.5.3 A function $f(z)$ is analytic on and within the unit circle. Also, $|f(z)| < 1$ for $|z| < 1$
and $f(0) = 0$. Show that $|f(z)| < |z|$ for $|z| < 1$.

Hint. One approach is to show that $f(z)/z$ is analytic and then to express $[f(z_0)/z_0]^n$
by the Cauchy integral formula. Finally, consider absolute magnitudes and take the $n$th
root. This exercise is sometimes called Schwarz’s theorem.


11.5.4 If $f(z)$ is a real function of the complex variable $z = x + iy$, that is, $f(x) = f^*(x)$, and
the Laurent expansion about the origin, $f(z) = \sum a_n z^n$, has $a_n = 0$ for $n < -N$, show
that all of the coefficients $a_n$ are real.

Hint. Show that $z^N f(z)$ is analytic (via Morera’s theorem, Section 11.4).


11.5.5 Prove that the Laurent expansion of a given function about a given point is unique;
that is, if

$$f(z) = \sum_{n=-N}^{\infty} a_n (z - z_0)^n = \sum_{n=-N}^{\infty} b_n (z - z_0)^n,$$

show that $a_n = b_n$ for all $n$.

Hint. Use the Cauchy integral formula.


11.5.6 Obtain the Laurent expansion of $e^z/z^2$ about $z = 0$.


11.5.7 Obtain the Laurent expansion of $ze^z/(z - 1)$ about $z = 1$.


11.5.8 Obtain the Laurent expansion of $(z - 1)\,e^{1/z}$ about $z = 0$.


11.6 SINGULARITIES


Poles 


We define a point $z_0$ as an isolated singular point of the function $f(z)$ if $f(z)$ is not
analytic at $z = z_0$ but is analytic at all neighboring points. There will therefore be a Laurent
expansion about an isolated singular point, and one of the following statements will be true:


1. The most negative power of $z - z_0$ in the Laurent expansion of $f(z)$ about $z = z_0$ will
be some finite power, $(z - z_0)^{-n}$, where $n$ is an integer, or

2. The Laurent expansion of $f(z)$ about $z = z_0$ will continue to negatively infinite powers
of $z - z_0$.
In the first case, the singularity is called a pole, and is more specifically identified as
a pole of order $n$. A pole of order 1 is also called a simple pole. The second case is not
referred to as a “pole of infinite order,” but is called an essential singularity.

One way to identify a pole of $f(z)$ without having available its Laurent expansion is to
examine

$$\lim_{z \to z_0}(z - z_0)^n f(z)$$

for various integers $n$. The smallest integer $n$ for which this limit exists (i.e., is finite) gives
the order of the pole at $z = z_0$. This rule follows directly from the form of the Laurent
expansion.

Essential singularities are often identified directly from their Laurent expansions. For
example,

$$e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2!}\left(\frac{1}{z}\right)^2 + \cdots = \sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{1}{z}\right)^n$$

clearly has an essential singularity at $z = 0$. Essential singularities have many pathological
features. For instance, we can show that in any small neighborhood of an essential
singularity of $f(z)$ the function $f(z)$ comes arbitrarily close to any (and therefore every)
preselected complex quantity $w_0$.$^6$ Here, the entire $w$-plane is mapped by $f$ into the
neighborhood of the point $z_0$.

The behavior of $f(z)$ as $z \to \infty$ is defined in terms of the behavior of $f(1/t)$ as $t \to 0$.
Consider the function

$$\sin z = \sum_{n=0}^{\infty}\frac{(-1)^n z^{2n+1}}{(2n + 1)!}. \tag{11.53}$$

As $z \to \infty$, we replace the $z$ by $1/t$ to obtain

$$\sin\left(\frac{1}{t}\right) = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n + 1)!\,t^{2n+1}}. \tag{11.54}$$

It is clear that $\sin(1/t)$ has an essential singularity at $t = 0$, from which we conclude that
$\sin z$ has an essential singularity at $z = \infty$. Note that although the absolute value of $\sin x$
for all real $x$ is equal to or less than unity, the absolute value of $\sin iy = i\sinh y$ increases
exponentially without limit as $y$ increases.

A function that is analytic throughout the finite complex plane except for isolated poles 
is called meromorphic. Examples are ratios of two polynomials, also $\tan z$ and $\cot z$. As
previously mentioned, functions that have no singularities in the finite complex plane are
called entire functions. Examples are $\exp z$, $\sin z$, and $\cos z$.


$^6$This theorem is due to Picard. A proof is given by E. C. Titchmarsh, The Theory of Functions, 2nd ed. New York: Oxford
University Press (1939).
Branch Points


In addition to the isolated singularities identified as poles or essential singularities, there 
are singularities uniquely associated with multivalued functions. It is useful to work with 
these functions in ways that to the maximum possible extent remove ambiguity as to the 
function values. Thus, if at a point $z_0$ (at which $f(z)$ has a derivative) we have chosen a
specific value of the multivalued function $f(z)$, then we can assign to $f(z)$ values at nearby
points in a way that causes continuity in $f(z)$. If we think of a succession of closely spaced
points as in the limit of zero spacing defining a path, our current observation is that a given
value of $f(z_0)$ then leads to a unique definition of the value of $f(z)$ to be assigned to
each point on the path. This scheme creates no ambiguity so long as the path is entirely
open, meaning that the path does not return to any point previously passed. But if the path
returns to $z_0$, thereby forming a closed loop, our prescription might lead, upon the return,
to a different one of the multiple values of $f(z_0)$.


Example 11.6.1 VALUE OF $z^{1/2}$ ON A CLOSED LOOP


We consider $f(z) = z^{1/2}$ on the path consisting of counterclockwise passage around the
unit circle, starting and ending at $z = +1$. At the start point, where $z^{1/2}$ has the multiple
values $+1$ and $-1$, let us choose $f(z) = +1$. See Fig. 11.10. Writing $f(z) = e^{i\theta/2}$, we
note that this form (with $\theta = 0$) is consistent with the desired starting value of $f(z)$, $+1$.
In the figure, the start point is labeled $A$. Next, we note that passage counterclockwise on
the unit circle corresponds to an increase in $\theta$, so that at the points marked $B$, $C$, and $D$ in
the figure, the respective values of $\theta$ are $\pi/2$, $\pi$, and $3\pi/2$. Note that because of the path
we have decided to take, we cannot assign to point $C$ the $\theta$ value $-\pi$ or to point $D$ the $\theta$
value $-\pi/2$. Continuing further along the path, when we return to point $A$ the value of $\theta$
has become $2\pi$ (not zero).

Now that we have identified the behavior of $\theta$, let’s examine what happens to $f(z)$. At
the points $B$, $C$, and $D$, we have

$$f(z_B) = e^{i\theta_B/2} = e^{i\pi/4} = \frac{1 + i}{\sqrt{2}},$$

$$f(z_C) = e^{i\pi/2} = i,$$

$$f(z_D) = e^{3i\pi/4} = \frac{-1 + i}{\sqrt{2}}.$$

When we return to point $A$, we have $f(+1) = e^{i\pi} = -1$, which is the other value of the
multivalued function $z^{1/2}$.

If we continue for a second counterclockwise circuit of the unit circle, the value of $\theta$
would continue to increase, from $2\pi$ to $4\pi$ (reached when we arrive at point $A$ after the
second loop). We now have $f(+1) = e^{(4\pi i)/2} = e^{2\pi i} = 1$, so a second circuit has brought
us back to the original value. It should now be clear that we are only going to be able to
obtain two different values of $z^{1/2}$ for the same point $z$. ■

FIGURE 11.10 Path encircling $z = 0$ for evaluation of $z^{1/2}$.

FIGURE 11.11 Path not encircling $z = 0$ for evaluation of $z^{1/2}$.


Example 11.6.2 ANOTHER CLOSED LOOP


Let’s now see what happens to the function $z^{1/2}$ as we pass counterclockwise around a
circle of unit radius centered at $z = +2$, starting and ending at $z = +3$. See Fig. 11.11.
At $z = 3$, the values of $f(z)$ are $+\sqrt{3}$ and $-\sqrt{3}$; let’s start with $f(z_A) = +\sqrt{3}$. As we
move from point $A$ through point $B$ to point $C$, note from the figure that the value of $\theta$
first increases (actually, to $30^\circ$) and then decreases again to zero; further passage from $C$
to $D$ and back to $A$ causes $\theta$ first to decrease (to $-30^\circ$) and then to return to zero at $A$. So
in this example the closed loop does not bring us to a different value of the multivalued
function $z^{1/2}$. ■
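
The behavior found in the two examples above can be reproduced by following $z^{1/2}$ numerically along each path. The sketch below is one possible illustration, not a method taken from the text: at every small step it keeps whichever of the two square roots lies closer to the previous value, which enforces the continuity used in the examples. The names `circle_path` and `continue_sqrt` and the step count are hypothetical choices.

```python
import cmath
import math

def circle_path(center, turns=1, steps=400):
    """Points on the circle of unit radius about `center`, traversed
    counterclockwise `turns` times, starting at center + 1."""
    return [center + cmath.exp(2j * math.pi * k / steps)
            for k in range(turns * steps + 1)]

def continue_sqrt(path, start_value):
    """Follow z**(1/2) continuously along `path`: at each step choose
    whichever of the two square roots lies closer to the previous value."""
    value = start_value
    for z in path[1:]:
        root = cmath.sqrt(z)
        value = root if abs(root - value) <= abs(root + value) else -root
    return value

print(continue_sqrt(circle_path(0, turns=1), 1.0))           # close to -1
print(continue_sqrt(circle_path(0, turns=2), 1.0))           # close to +1
print(continue_sqrt(circle_path(2, turns=1), math.sqrt(3)))  # close to +sqrt(3)
```

One circuit about the origin changes the sign, a second circuit restores it, and a loop that does not encircle $z = 0$ leaves the value unchanged.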


The essential difference between these two examples is that in the first, the path encircled
$z = 0$; in the second it did not. What is special about $z = 0$ is that (from a complex-variable
viewpoint) it is singular; the function $z^{1/2}$ does not have a derivative there. The lack of a
well-defined derivative means that ambiguity in the function value will result from paths
that circle such a singular point, which we call a branch point. The order of a branch
point is defined as the number of paths around it that must be taken before the function
involved returns to its original value; in the case of $z^{1/2}$, we saw that the branch point at
$z = 0$ is of order 2.

We are now ready to see what must be done to cause a multivalued function to be 
restricted to single-valuedness on a portion of the complex plane. We simply need to pre- 
vent its evaluation on paths that encircle a branch point. We do so by drawing a line (known 
as a branch line, or more commonly, a branch cut) that the evaluation path cannot cross; 
the branch cut must start from our branch point and continue to infinity (or, if consistent
with maintaining single-valuedness, to another finite branch point). The precise path of a
branch cut can be chosen freely; what must be chosen appropriately are its endpoints. 

Once appropriate branch cut(s) have been drawn, the originally multivalued function has 
been restricted to being single-valued in the region bounded by the branch cut(s); we call 
the function as made single-valued in this way a branch of our original function. Since we 
could construct such a branch starting from any one of the values of the original function
at a single arbitrary point in our region, we identify our multivalued function as having
multiple branches. In the case of $z^{1/2}$, which is double-valued, the number of branches
is two.

Note that a function with a branch point and a corresponding branch cut will not be 
continuous across the cut line. Hence line integrals in opposite directions on the two sides 
of the branch cut will not generally cancel each other. Branch cuts, therefore, are real 
boundaries to a region of analyticity, in contrast to the artificial barriers we introduced in 
extending Cauchy’s integral theorem to multiply connected regions. 

While from a fundamental viewpoint all branches of a multivalued function f(z) are 
equally legitimate, it is often convenient to agree on the branch to be used, and such a 
branch is sometimes called the principal branch, with the value of f(z) on that branch 
called its principal value. It is common to take the branch of $z^{1/2}$ which is positive for
real, positive $z$ as its principal branch.

An observation that is important for complex analysis is that by drawing appropriate 
branch cut(s), we have restricted a multivalued function to single-valuedness, so that it 
can be an analytic function within the region bounded by the branch cut(s), and we can 
therefore apply Cauchy’s two theorems to contour integrals within the region of analyticity. 


Example 11.6.3 $\ln z$ HAS AN INFINITE NUMBER OF BRANCHES


Here we examine the singularity structure of $\ln z$. As we already saw in Eq. (1.138), the
logarithm is multivalued, with the polar representation

$$\ln z = \ln\left(re^{i(\theta + 2n\pi)}\right) = \ln r + i(\theta + 2n\pi), \tag{11.55}$$

where $n$ can have any positive or negative integer value.

Noting that $\ln z$ is singular at $z = 0$ (it has no derivative there), we now identify $z = 0$
as a branch point. Let’s consider what happens if we encircle it by a counterclockwise
path on a circle of radius $r$, starting from the initial value $\ln r$, at $z = r = re^{i\theta}$ with $\theta = 0$.
Every passage around the circle will add $2\pi$ to $\theta$, and after $n$ complete circuits the value
we have for $\ln z$ will be $\ln r + 2n\pi i$. The branch point of $\ln z$ at $z = 0$ is of infinite order,
corresponding to the infinite number of its multiple values. (By encircling $z = 0$ repeatedly
in the clockwise direction, we can also reach all negative integer values of $n$.)

We can make $\ln z$ single-valued by drawing a branch cut from $z = 0$ to $z = \infty$ in any
way (though there is ordinarily no reason to use cuts that are not straight lines). It is typical
to identify the branch with $n = 0$ as the principal branch of the logarithm. Incidentally, we
note that the inverse trigonometric functions, which can be written in terms of logarithms,
as in Eq. (1.137), will also be infinitely multivalued, with principal values that are usually
chosen on a branch that will yield real values for real $z$. Compare with the usual choices of
the values assigned the real-variable forms of $\sin^{-1} x = \arcsin x$, etc. ■


Using the logarithm, we are now in a position to look at the singularity structures of
expressions of the form $z^p$, where both $z$ and $p$ may be complex. To do so, we write

$$z = e^{\ln z}, \quad \text{so} \quad z^p = e^{p\ln z}, \tag{11.56}$$

which is single-valued if $p$ is an integer, $t$-valued if $p$ is a real rational fraction (in lowest
terms) of the form $s/t$, and infinitely multivalued otherwise.
Example 11.6.4 MULTIPLE BRANCH POINTS


Consider the function

$$f(z) = (z^2 - 1)^{1/2} = (z + 1)^{1/2}(z - 1)^{1/2}.$$

The first factor on the right-hand side, $(z + 1)^{1/2}$, has a branch point at $z = -1$. The second
factor has a branch point at $z = +1$. At infinity $f(z)$ has a simple pole. This is best seen
by substituting $z = 1/t$ and making a binomial expansion at $t = 0$:

$$(z^2 - 1)^{1/2} = \frac{1}{t}(1 - t^2)^{1/2} = \frac{1}{t}\sum_{n=0}^{\infty}\binom{1/2}{n}(-1)^n t^{2n} = \frac{1}{t} - \frac{1}{2}t - \frac{1}{8}t^3 - \cdots.$$

We want to make $f(z)$ single-valued by making appropriate branch cut(s). There are many
ways to accomplish this, but one we wish to investigate is the possibility of making a
branch cut from $z = -1$ to $z = +1$, as shown in Fig. 11.12.

To determine whether this branch cut makes our $f(z)$ single-valued, we need to see what
happens to each of the multivalent factors in $f(z)$ as we move around on its Argand diagram.
Figure 11.12 also identifies the quantities that are relevant for this purpose, namely
those that relate a point $P$ to the branch points. In particular, we have written the position
relative to the branch point at $z = 1$ as $z - 1 = \rho e^{i\varphi}$, with the position relative to $z = -1$
denoted $z + 1 = re^{i\theta}$. With these definitions, we have

$$f(z) = r^{1/2}\rho^{1/2}e^{i(\theta + \varphi)/2}.$$

Our mission is to note how $\varphi$ and $\theta$ change as we move along the path, so that we can use
the correct value of each for evaluating $f(z)$.

We consider a closed path starting at point $A$ in Fig. 11.13, proceeding via points $B$
through $F$, then back to $A$. At the start point, we choose $\theta = \varphi = 0$, thereby causing the
multivalued $f(z_A)$ to have the specific value $+\sqrt{3}$. As we pass above $z = +1$ on the way
to point $B$, $\theta$ remains essentially zero, but $\varphi$ increases from zero to $\pi$. These angles do not
change as we pass from $B$ to $C$, but on going to point $D$, $\theta$ increases to $\pi$, and then, passing
below $z = -1$ on the way to point $E$, it further increases to $2\pi$ (not zero!). Meanwhile, $\varphi$
remains essentially at $\pi$. Finally, returning to point $A$ below $z = +1$, $\varphi$ increases to $2\pi$,
so that upon the return to point $A$ both $\varphi$ and $\theta$ have become $2\pi$. The behavior of these
angles and the values of $(\theta + \varphi)/2$ (the argument of $f(z)$) are tabulated in Table 11.1.








FIGURE 11.12 Possible branch cut for Example 11.6.4 and the quantities relating
a point $P$ to the branch points.

FIGURE 11.13 Path around the branch cut in Example 11.6.4.


Table 11.1 Phase Angles, Path in Fig. 11.13

Point    $\theta$    $\varphi$    $(\theta + \varphi)/2$
A        $0$         $0$          $0$
B        $0$         $\pi$        $\pi/2$
C        $0$         $\pi$        $\pi/2$
D        $\pi$       $\pi$        $\pi$
E        $2\pi$      $\pi$        $3\pi/2$
F        $2\pi$      $\pi$        $3\pi/2$
A        $2\pi$      $2\pi$       $2\pi$





Two features emerge from this analysis: 


1. The phase of $f(z)$ at points $B$ and $C$ is not the same as that at points $E$ and $F$. This
behavior can be expected at a branch cut.

2. The phase of $f(z)$ at point $A'$ (the return to $A$) exceeds that at point $A$ by $2\pi$, meaning
that the function $f(z) = (z^2 - 1)^{1/2}$ is single-valued for the contour shown, encircling
both branch points.


What actually happened is that each of the two multivalued factors contributed a sign
change upon passage around the closed loop, so the two factors together restored the original
sign of $f(z)$.

Another way we could have made $f(z)$ single-valued would have been to make a separate
branch cut from each branch point to infinity; a reasonable way to do this would be to
make cuts on the real axis for all $x > 1$ and for all $x < -1$. This alternative is explored in
Exercises 11.6.2 and 11.6.4. ■
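
The bookkeeping of the two phase angles in Example 11.6.4 can be mimicked numerically. The sketch below is an illustration under assumptions of its own: instead of the path of Fig. 11.13 it uses circular paths starting at $z = 2$, and it accumulates the small changes in each angle from step to step. The names `unwrapped_phase`, `circle_path`, and `phase_change_of_f` are hypothetical helpers introduced here.

```python
import cmath
import math

def unwrapped_phase(path, origin):
    """Continuously tracked argument of (z - origin) along `path`,
    starting from the principal value at the first point."""
    phase = cmath.phase(path[0] - origin)
    for prev, cur in zip(path, path[1:]):
        # Each step adds the small angle between successive points.
        phase += cmath.phase((cur - origin) / (prev - origin))
    return phase

def circle_path(center, radius, steps=400):
    """Counterclockwise circle, starting at center + radius."""
    return [center + radius * cmath.exp(2j * math.pi * k / steps)
            for k in range(steps + 1)]

def phase_change_of_f(path):
    """Final value of (theta + phi)/2, the argument of (z^2 - 1)^(1/2),
    for a path that starts on the real axis to the right of z = +1."""
    theta = unwrapped_phase(path, -1.0)   # angle of z + 1
    phi = unwrapped_phase(path, +1.0)     # angle of z - 1
    return (theta + phi) / 2.0

both = phase_change_of_f(circle_path(0.0, 2.0))  # encircles z = -1 and z = +1
one = phase_change_of_f(circle_path(1.0, 1.0))   # encircles z = +1 only
print(both / math.pi, one / math.pi)             # close to 2 and close to 1
```

A net phase of $2\pi$ returns $f(z)$ to its starting value, whereas a net phase of $\pi$ (a loop about one branch point only) reverses its sign.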


Analytic Continuation 


We saw in Section 11.5 that a function $f(z)$ which is analytic within a region can be
uniquely expanded in a Taylor series about any interior point $z_0$ of the region of analyticity,
and that the resulting expansion will be convergent within a circle of convergence
extending to the singularity of $f(z)$ closest to $z_0$. Since

• The coefficients in the Taylor series are proportional to the derivatives of $f(z)$,


• An analytic function has derivatives of all orders that are independent of direction, and
therefore


• The values of $f(z)$ on a single finite line segment with $z_0$ as an interior point will suffice
to determine all derivatives of $f(z)$ at $z = z_0$,


we conclude that if two apparently different analytic functions (e.g., a closed expression 
vs. an integral representation or a power series) have values that coincide on a range as 
restricted as a single finite line segment, then they are actually the same function within 
the region where both functional forms are defined. 

The above conclusion will provide us with a technique for extending the definition of an 
analytic function beyond the range of any particular functional form initially used to define 
it. All we will need to do is to find another functional form whose range of definition is not 
entirely included in that of the initial form and which yields the same function values on at 
least a finite line segment within the area where both functional forms are defined. 

To make the approach more concrete, consider the situation illustrated in Fig. 11.14,
where a function $f(z)$ is defined by its Taylor expansion about a point $z_0$ with a circle
of convergence $C_0$ defined by the singularity nearest to $z_0$, labeled $z_s$. If we now make
a Taylor expansion about some point $z_1$ within $C_0$ (which we can do because $f(z)$ has
known values in the neighborhood of $z_1$), this new expansion may have a circle of convergence
$C_1$ that is not entirely within $C_0$, thereby defining a function that is analytic in
the region that is the union of $C_0$ and $C_1$. Note that if we need to obtain actual values of
$f(z)$ for $z$ within the intersection of $C_0$ and $C_1$ we may use either Taylor expansion, but
in the region within only one circle we must use the expansion that is valid there (the other
expansion will not converge). A generalization of the above analysis leads to the beautiful
and valuable result that if two analytic functions coincide in any region, or even on any
finite line segment, they are the same function, and therefore defined over the entire range
of both function definitions.

After Weierstrass this process of enlarging the region in which we have the specification 
of an analytic function is called analytic continuation, and the process may be carried out 
repeatedly to maximize the region in which the function is defined. Consider the situation
pictured in Fig. 11.15, where the only singularity of $f(z)$ is at $z_s$ and $f(z)$ is originally
defined by its Taylor expansion about $z_0$, with circle of convergence $C_0$. By making analytic
continuations as shown by the series of circles $C_1, \ldots$, we can cover the entire annular
region of analyticity shown in the figure, and can use the original Taylor series to generate
new expansions that apply to regions within the other circles.





FIGURE 11.14 Analytic continuation. One step.

FIGURE 11.15 Analytic continuation. Many steps.

FIGURE 11.16 Radii of convergence of power-series expansions for Example 11.6.5.


Example 11.6.5. — ANALYTIC CONTINUATION 


Consider these two power-series expansions:

$$f_1(z) = \sum_{n=0}^{\infty} (-1)^n (z - 1)^n, \tag{11.57}$$

$$f_2(z) = \sum_{n=0}^{\infty} i^{\,n-1} (z - i)^n. \tag{11.58}$$


Each has a unit radius of convergence; the circles of convergence overlap, as can be seen 
from Fig. 11.16. 

To determine whether these expansions represent the same analytic function in overlapping
domains, we can check to see if $f_1(z) = f_2(z)$ for at least a line segment in the
region of overlap. A suitable line is the diagonal that connects the origin with $1 + i$, passing
through the intermediate point $(1 + i)/2$. Setting $z = (\alpha + \tfrac{1}{2})(1 + i)$ (chosen to make
$\alpha = 0$ an interior point of the overlap region), we expand $f_1$ and $f_2$ about $\alpha = 0$ to find out
whether their power series coincide. Initially we have (as functions of $\alpha$)

$$f_1 = \sum_{n=0}^{\infty} (-1)^n \left[ (1 + i)\alpha - \frac{1 - i}{2} \right]^n,$$

$$f_2 = \sum_{n=0}^{\infty} i^{\,n-1} \left[ (1 + i)\alpha + \frac{1 - i}{2} \right]^n.$$


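Applying the binomial theorem to obtain power series in $\alpha$, and interchanging the order of
the two sums,

$$f_1 = \sum_{j=0}^{\infty} (-1)^j (1 + i)^j \alpha^j \sum_{n=j}^{\infty} \binom{n}{j} \left( \frac{1 - i}{2} \right)^{n-j},$$

$$f_2 = \sum_{j=0}^{\infty} i^{\,j-1} (1 + i)^j \alpha^j \sum_{n=j}^{\infty} \binom{n}{j} i^{\,n-j} \left( \frac{1 - i}{2} \right)^{n-j}
     = \sum_{j=0}^{\infty} i^{\,j-1} (1 + i)^j \alpha^j \sum_{n=j}^{\infty} \binom{n}{j} \left( \frac{1 + i}{2} \right)^{n-j}.$$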


To proceed further we need to evaluate the summations over $n$. Referring to Exercise 1.3.5,
where it was shown that

$$\sum_{n=j}^{\infty} \binom{n}{j} x^{n-j} = \frac{1}{(1 - x)^{j+1}},$$


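we get

$$f_1 = \sum_{j=0}^{\infty} (-1)^j (1 + i)^j \alpha^j \left( \frac{2}{1 + i} \right)^{j+1}
     = \sum_{j=0}^{\infty} \frac{(-1)^j\, 2^{\,j+1} \alpha^j}{1 + i},$$

$$f_2 = \sum_{j=0}^{\infty} i^{\,j-1} (1 + i)^j \alpha^j \left( \frac{2}{1 - i} \right)^{j+1}
     = \sum_{j=0}^{\infty} \frac{(-1)^j\, 2^{\,j+1} \alpha^j}{1 + i},$$

confirming that $f_1$ and $f_2$ are the same analytic function, now defined over the union of
the two circles in Fig. 11.16.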

Incidentally, both $f_1$ and $f_2$ are expansions of $1/z$ (about the respective points $1$ and $i$),
so $1/z$ could also be regarded as an analytic continuation of $f_1$, $f_2$, or both to the entire
complex plane except the singular point at $z = 0$. The expansion in powers of $\alpha$ is also
a representation of $1/z$, but its range of validity is only a circle of radius $1/\sqrt{2}$ about
$(1 + i)/2$ and it does not analytically continue $f(z)$ outside the union of $C_1$ and $C_2$.
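The agreement claimed in this example can be spot-checked numerically. The following is only an illustrative sketch: the helper name `geometric_partial_sum`, the sample points, and the number of terms are arbitrary choices, not part of the text. It sums Eqs. (11.57) and (11.58) as geometric series and compares them with $1/z$ at a point inside both circles, and at a point inside the circle about $z = 1$ only.

```python
def geometric_partial_sum(first, ratio, terms=400):
    """Return sum of first * ratio**n for n = 0 .. terms-1."""
    total, term = 0j, complex(first)
    for _ in range(terms):
        total += term
        term *= ratio
    return total

def f1(z, terms=400):
    # Eq. (11.57): sum_n (-1)^n (z - 1)^n, geometric with ratio -(z - 1)
    return geometric_partial_sum(1.0, -(z - 1), terms)

def f2(z, terms=400):
    # Eq. (11.58): sum_n i^(n-1) (z - i)^n, first term 1/i, ratio i (z - i)
    return geometric_partial_sum(1 / 1j, 1j * (z - 1j), terms)

z_overlap = 0.6 + 0.5j   # assumed sample point inside both circles
z_only_c1 = 1.5 + 0.0j   # inside the circle about z = 1, outside the one about z = i

err1 = abs(f1(z_overlap) - 1 / z_overlap)
err2 = abs(f2(z_overlap) - 1 / z_overlap)
err3 = abs(f1(z_only_c1) - 1 / z_only_c1)
print(err1, err2, err3)
```

At `z_only_c1` the series for $f_2$ would diverge, which is why only $f_1$ is evaluated there.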


The use of power series is not the only mechanism for carrying out analytic continuations;
an alternative and powerful method is the use of functional relations, which are
formulas that relate values of the same analytic function $f(z)$ at different $z$. As an example
of a functional relation, the integral representation of the gamma function, given in
Table 1.2, can be manipulated (see Chapter 13) to show that $\Gamma(z + 1) = z\Gamma(z)$, consistent
with the elementary result that $n! = n(n - 1)!$. This functional relation can be used
to analytically continue $\Gamma(z)$ to values of $z$ for which the integral representation does not
converge.
Exercises 
11.6.1 As an example of an essential singularity consider $e^{1/z}$ as $z$ approaches zero. For any
complex number $z_0$, $z_0 \neq 0$, show that
$$e^{1/z} = z_0$$
has an infinite number of solutions.
11.6.2 Show that the function
$$w(z) = (z^2 - 1)^{1/2}$$
is single-valued if we make branch cuts on the real axis for $x > 1$ and for $x < -1$.
11.6.3 A function $f(z)$ can be represented by
$$f(z) = \frac{f_1(z)}{f_2(z)},$$
in which $f_1(z)$ and $f_2(z)$ are analytic. The denominator, $f_2(z)$, vanishes at $z = z_0$,
showing that $f(z)$ has a pole at $z = z_0$. However, $f_1(z_0) \neq 0$, $f_2'(z_0) \neq 0$. Show that
$a_{-1}$, the coefficient of $(z - z_0)^{-1}$ in a Laurent expansion of $f(z)$ at $z = z_0$, is given by
$$a_{-1} = \frac{f_1(z_0)}{f_2'(z_0)}.$$
11.6.4 Determine a unique branch for the function of Exercise 11.6.2 that will cause the value 
it yields for f(i) to be the same as that found for f(i) in Example 11.6.4. Although 
Exercise 11.6.2 and Example 11.6.4 describe the same multivalued function, the specific 
values assigned for various z will not agree everywhere, due to the difference in the 
location of the branch cuts. Identify the portions of the complex plane where both these 
descriptions do and do not agree, and characterize the differences. 
11.6.5 Find all singularities of 
-1/3 1/2 
z + +(Z-2)'", 
@—3)3 ( ) 
and identify their types (e.g., second-order branch point, fifth-order pole, ...). Include 
any singularities at the point at infinity. 
Note. A branch point is of nth order if it requires n, but no fewer, circuits around the 
point to restore the original value. 
11.6.6 The function $F(z) = \ln(z^2 + 1)$ is made single-valued by straight-line branch cuts
from $(x, y) = (0, -1)$ to $(-\infty, -1)$ and from $(0, +1)$ to $(0, +\infty)$. See Fig. 11.17. If
$F(0) = -2\pi i$, find the value of $F(i - 2)$.

FIGURE 11.17 Branch cuts for Exercise 11.6.6.

11.6.7 Show that negative numbers have logarithms in the complex plane. In particular, find
$\ln(-1)$.

ANS. $\ln(-1) = i\pi$.

11.6.8 For noninteger $m$, show that the binomial expansion of Exercise 11.5.2 holds only for a
suitably defined branch of the function $(1 + z)^m$. Show how the $z$-plane is cut. Explain
why $|z| < 1$ may be taken as the circle of convergence for the expansion of this branch,
in light of the cut you have chosen.

11.6.9 The Taylor expansion of Exercises 11.5.2 and 11.6.8 is not suitable for branches other
than the one suitably defined branch of the function $(1 + z)^m$ for noninteger $m$. (Note
that other branches cannot have the same Taylor expansion since they must be distinguishable.)
Using the same branch cut of the earlier exercises for all other branches,
find the corresponding Taylor expansions, detailing the phase assignments and Taylor
coefficients.

11.6.10 (a) Develop a Laurent expansion of $f(z) = [z(z - 1)]^{-1}$ about the point $z = 1$ valid
for small values of $|z - 1|$. Specify the exact range over which your expansion
holds. This is an analytic continuation of the infinite series in Eq. (11.49).

(b) Determine the Laurent expansion of $f(z)$ about $z = 1$ but for $|z - 1|$ large.

Hint. Make a partial fraction decomposition of this function and use the geometric
series.

11.6.11 (a) Given $f_1(z) = \int_0^{\infty} e^{-zt}\,dt$ (with $t$ real), show that the domain in which $f_1(z)$ exists
(and is analytic) is $\mathrm{Re}(z) > 0$.

(b) Show that $f_2(z) = 1/z$ equals $f_1(z)$ over $\mathrm{Re}(z) > 0$ and is therefore an analytic
continuation of $f_1(z)$ over the entire $z$-plane except for $z = 0$.

(c) Expand $1/z$ about the point $z = -i$. You will have
$$f_3(z) = \sum_{n=0}^{\infty} a_n (z + i)^n.$$
What is the domain of this formula for $f_3(z)$?

ANS. $\dfrac{1}{z} = i \sum_{n=0}^{\infty} i^{-n} (z + i)^n, \quad |z + i| < 1.$

CALCULUS OF RESIDUES


Residue Theorem 


If the Laurent expansion of a function,

$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n,$$

is integrated term by term by using a closed contour that encircles one isolated singular
point $z_0$ once in a counterclockwise sense, we obtain, applying Eq. (11.29),

$$a_n \oint (z - z_0)^n\,dz = 0, \qquad n \neq -1. \tag{11.59}$$

However, for $n = -1$, Eq. (11.29) yields

$$a_{-1} \oint (z - z_0)^{-1}\,dz = 2\pi i\,a_{-1}. \tag{11.60}$$

Summarizing Eqs. (11.59) and (11.60), we have

$$\oint f(z)\,dz = 2\pi i\,a_{-1}. \tag{11.61}$$

The constant $a_{-1}$, the coefficient of $(z - z_0)^{-1}$ in the Laurent expansion, is called the
residue of $f(z)$ at $z = z_0$.

Now consider the evaluation of the integral, over a closed contour $C$, of a function that
has isolated singularities at points $z_1, z_2, \ldots$. We can handle this integral by deforming our
contour as shown in Fig. 11.18. Cauchy’s integral theorem (Section 11.3) then leads to

$$\oint_C f(z)\,dz + \oint_{C_1} f(z)\,dz + \oint_{C_2} f(z)\,dz + \cdots = 0, \tag{11.62}$$

where $C$ is in the positive, counterclockwise direction, but the contours $C_1, C_2, \ldots$, that,
respectively, encircle $z_1, z_2, \ldots$ are all clockwise. Thus, referring to Eq. (11.61), the integrals
over the contours $C_i$ about the individual isolated singularities have the values

$$\oint_{C_i} f(z)\,dz = -2\pi i\,a_{-1,i}, \tag{11.63}$$

where $a_{-1,i}$ is the residue obtained from the Laurent expansion about the singular point
$z = z_i$. The negative sign comes from the clockwise integration. Combining Eqs. (11.62)
and (11.63), we have

$$\oint_C f(z)\,dz = 2\pi i\,(a_{-1,1} + a_{-1,2} + \cdots)
  = 2\pi i\,(\text{sum of the enclosed residues}). \tag{11.64}$$

FIGURE 11.18 Excluding isolated singularities.

This is the residue theorem. The problem of evaluating a set of contour integrals is
replaced by the algebraic problem of computing residues at the enclosed singular points.
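As an informal numerical illustration of Eq. (11.64), the sketch below integrates a sample function around two circles using a uniform quadrature rule on the angle. The function $1/[z(z-2)]$, the radii, and the helper name `contour_integral` are assumptions chosen for this illustration; its residues are $-1/2$ at $z = 0$ and $+1/2$ at $z = 2$.

```python
import numpy as np

def contour_integral(f, center, radius, n=2000):
    """Counterclockwise integral of f over a circle, by a uniform rule in the angle."""
    theta = 2.0 * np.pi * np.arange(n) / n
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / n)
    return np.sum(f(z) * dz)

def f(z):
    # Sample integrand (an assumption): simple poles at z = 0 and z = 2
    return 1.0 / (z * (z - 2.0))

I_small = contour_integral(f, 0.0, 1.0)  # encloses z = 0 only: 2*pi*i*(-1/2)
I_large = contour_integral(f, 0.0, 3.0)  # encloses both poles: residues cancel

print(I_small, I_large)
```

The first value should be close to $-\pi i$ and the second close to zero, in line with the sum of the enclosed residues.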


Computing Residues 


It is, of course, not necessary to obtain an entire Laurent expansion of $f(z)$ about $z = z_0$
to identify $a_{-1}$, the coefficient of $(z - z_0)^{-1}$ in the expansion. If $f(z)$ has a simple pole at
$z = z_0$, then, with $a_n$ the coefficients in the expansion of $f(z)$,

$$(z - z_0) f(z) = a_{-1} + a_0 (z - z_0) + a_1 (z - z_0)^2 + \cdots, \tag{11.65}$$

and, recognizing that $(z - z_0) f(z)$ may not have a form permitting an obvious cancellation
of the factor $z - z_0$, we take the limit of Eq. (11.65) as $z \to z_0$:

$$a_{-1} = \lim_{z \to z_0} \bigl[ (z - z_0) f(z) \bigr]. \tag{11.66}$$

If there is a pole of order $n > 1$ at $z = z_0$, then $(z - z_0)^n f(z)$ must have the expansion

$$(z - z_0)^n f(z) = a_{-n} + \cdots + a_{-1} (z - z_0)^{n-1} + a_0 (z - z_0)^n + \cdots. \tag{11.67}$$

We see that $a_{-1}$ is the coefficient of $(z - z_0)^{n-1}$ in the Taylor expansion of $(z - z_0)^n f(z)$,
and therefore we can identify it as satisfying

$$a_{-1} = \frac{1}{(n - 1)!} \lim_{z \to z_0} \left[ \frac{d^{\,n-1}}{dz^{\,n-1}} \Bigl( (z - z_0)^n f(z) \Bigr) \right], \tag{11.68}$$

where a limit is indicated to take account of the fact that the expression involved may
be indeterminate. Sometimes the general formula, Eq. (11.68), is found to be more complicated
than the judicious use of power-series expansions. See items 4 and 5 in Example 11.7.1
below.

Essential singularities will also have well-defined residues, but finding them may be
more difficult. In principle, one can use Eq. (11.48) with $n = -1$, but the integral involved
may seem intractable. Sometimes the easiest route to the residue is by first finding the
Laurent expansion.

Example 11.7.1 COMPUTING RESIDUES


Here are some examples:

1. Residue of $\dfrac{1}{4z + 1}$ at $z = -\dfrac{1}{4}$ is $\displaystyle\lim_{z \to -1/4} \left( \frac{z + \frac{1}{4}}{4z + 1} \right) = \frac{1}{4}$.

2. Residue of $\dfrac{1}{\sin z}$ at $z = 0$ is $\displaystyle\lim_{z \to 0} \left( \frac{z}{\sin z} \right) = 1$.

3. Residue of $\dfrac{\ln z}{z^2 + 4}$ at $z = 2e^{\pi i/2}$ is

$$\lim_{z \to 2e^{\pi i/2}} \frac{(z - 2e^{\pi i/2}) \ln z}{z^2 + 4} = \frac{\ln 2 + \pi i/2}{4i} = \frac{\pi}{8} - \frac{i \ln 2}{4}.$$

4. Residue of $\dfrac{z}{\sin^2 z}$ at $z = \pi$; the pole is second order, and the residue is given by

$$\frac{1}{1!} \lim_{z \to \pi} \frac{d}{dz} \left[ \frac{z (z - \pi)^2}{\sin^2 z} \right].$$

However, it may be easier to make the substitution $w = z - \pi$, to note that $\sin^2 z = \sin^2 w$,
and to identify the residue as the coefficient of $1/w$ in the expansion of
$(w + \pi)/\sin^2 w$ about $w = 0$. This expansion can be written

$$\frac{w + \pi}{\left( w - \dfrac{w^3}{3!} + \cdots \right)^2} = \frac{w + \pi}{w^2 - \dfrac{w^4}{3} + \cdots}.$$

The denominator expands entirely into even powers of $w$, so the $\pi$ in the numerator
cannot contribute to the residue. Then, from the $w$ in the numerator and the leading
term of the denominator, we find the residue to be 1.

5. Residue of $f(z) = \dfrac{\cot \pi z}{z(z + 2)}$ at $z = 0$.

The pole at $z = 0$ is second-order, and direct application of Eq. (11.68) leads to a
complicated indeterminate expression requiring multiple applications of l’Hôpital’s
rule. Perhaps easier is to introduce the initial terms of the expansions about $z = 0$:
$\cot \pi z = (\pi z)^{-1} + O(z)$, $1/(2 + z) = \frac{1}{2}[1 - (z/2) + O(z^2)]$, reaching

$$f(z) = \frac{1}{z} \left[ \frac{1}{\pi z} + O(z) \right] \left( \frac{1}{2} \right) \left[ 1 - \frac{z}{2} + O(z^2) \right],$$

from which we can read out the residue as the coefficient of $z^{-1}$, namely $-1/4\pi$.

6. Residue of $e^{-1/z}$ at $z = 0$. This is at an essential singularity; from the Taylor series of
$e^w$ with $w = -1/z$, we have

$$e^{-1/z} = 1 - \frac{1}{z} + \frac{1}{2!} \left( -\frac{1}{z} \right)^2 + \cdots,$$

from which we read out the value of the residue, $-1$.
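The six residues above can be cross-checked numerically by evaluating $(1/2\pi i)\oint f(z)\,dz$ on a small circle around each singular point. The sketch below is an illustration only; the helper name `numerical_residue`, the circle radii, and the number of sample points are assumptions, and each radius is chosen so that the circle encloses no other singularity and does not cross the branch cut of the principal logarithm.

```python
import numpy as np

def numerical_residue(f, z0, radius, n=4000):
    """Estimate the residue of f at z0 as (1/(2*pi*i)) times a circular contour integral."""
    theta = 2.0 * np.pi * np.arange(n) / n
    z = z0 + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / n)
    return np.sum(f(z) * dz) / (2j * np.pi)

# (integrand, singular point, circle radius, value quoted in the example)
cases = [
    (lambda z: 1.0 / (4.0 * z + 1.0), -0.25, 0.1, 0.25),
    (lambda z: 1.0 / np.sin(z), 0.0, 1.0, 1.0),
    (lambda z: np.log(z) / (z**2 + 4.0), 2j, 0.5, np.pi / 8 - 1j * np.log(2.0) / 4),
    (lambda z: z / np.sin(z) ** 2, np.pi, 1.0, 1.0),
    (lambda z: np.cos(np.pi * z) / np.sin(np.pi * z) / (z * (z + 2.0)), 0.0, 0.5,
     -1.0 / (4.0 * np.pi)),
    (lambda z: np.exp(-1.0 / z), 0.0, 1.0, -1.0),
]

errors = [abs(numerical_residue(f, z0, r) - expected) for f, z0, r, expected in cases]
print(errors)
```

Each entry of `errors` is the magnitude of the difference between the contour estimate and the value obtained analytically in the corresponding item.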
Cauchy Principal Value


Occasionally an isolated pole will be directly on the contour of an integration, causing the
integral to diverge. A simple example is provided by an attempt to evaluate the real integral

$$\int_{-a}^{b} \frac{dx}{x}, \tag{11.69}$$

which is divergent because of the logarithmic singularity at $x = 0$; note that the indefinite
integral of $x^{-1}$ is $\ln x$. However, the integral in Eq. (11.69) can be given a meaning if we
obtain a convergent form when replaced by a limit of the form

$$\lim_{\delta \to 0^+} \left[ \int_{-a}^{-\delta} \frac{dx}{x} + \int_{\delta}^{b} \frac{dx}{x} \right]. \tag{11.70}$$

To avoid issues with the logarithm of negative values of $x$, we change the variable in
the first integral to $y = -x$, and the two integrals are then seen to have the respective
values $\ln \delta - \ln a$ and $\ln b - \ln \delta$, with sum $\ln b - \ln a$. What has happened is that the
increase toward $+\infty$ as $1/x$ approaches zero from positive values of $x$ is compensated by
a decrease toward $-\infty$ as $1/x$ approaches zero from negative $x$. This situation is illustrated
graphically in Fig. 11.19.

Note that the procedure we have described does not make the original integral of
Eq. (11.69) convergent. In order for that integral to be convergent, it would be necessary
that

$$\lim_{\delta_1, \delta_2 \to 0^+} \left[ \int_{-a}^{-\delta_1} \frac{dx}{x} + \int_{\delta_2}^{b} \frac{dx}{x} \right]$$

exist (meaning that the limit has a unique value) when $\delta_1$ and $\delta_2$ approach zero independently.
However, different rates of approach to zero by $\delta_1$ and $\delta_2$ will cause a change in
value of the integral. For example, if $\delta_2 = 2\delta_1$, then an evaluation like that of Eq. (11.70)
would yield the result $(\ln \delta_1 - \ln a) + (\ln b - \ln \delta_2) = \ln b - \ln a - \ln 2$. The limit then has
no definite value, confirming our original statement that the integral diverges.

FIGURE 11.19 Cauchy principal value cancellation, integral of 1/z.
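The dependence on how the two cutoffs approach zero can be made concrete with a few numbers. The sketch below is illustrative only; the limits $a = 2$, $b = 5$ and the function name `split_integral` are arbitrary assumptions. It uses the closed-form values of the two pieces quoted above.

```python
import math

def split_integral(a, b, delta1, delta2):
    """Sum of the integrals of 1/x over (-a, -delta1) and (delta2, b), in closed form."""
    left = math.log(delta1) - math.log(a)    # integral from -a to -delta1
    right = math.log(b) - math.log(delta2)   # integral from delta2 to b
    return left + right

a, b = 2.0, 5.0  # assumed sample limits
symmetric = [split_integral(a, b, d, d) for d in (1e-2, 1e-4, 1e-8)]
skewed = [split_integral(a, b, d, 2.0 * d) for d in (1e-2, 1e-4, 1e-8)]

print(symmetric)  # each entry equals ln(b/a): the principal value
print(skewed)     # each entry equals ln(b/a) - ln 2
```

Both sequences are independent of the cutoff size, but they settle on different values, which is the sense in which the unrestricted limit does not exist.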

Generalizing from the above example, we define the Cauchy principal value of the real
integral of a function $f(x)$ with an isolated singularity on the integration path at the point
$x_0$ as the limit

$$\lim_{\delta \to 0^+} \left[ \int^{x_0 - \delta} f(x)\,dx + \int_{x_0 + \delta} f(x)\,dx \right]. \tag{11.71}$$

The Cauchy principal value is sometimes indicated by preceding the integral sign by $P$ or
by drawing a horizontal line through the integration sign, as in

$$P\!\int f(x)\,dx \qquad \text{or} \qquad \fint f(x)\,dx.$$

This notation, of course, presumes that the location of the singularity is known.


Example 11.7.2 A CAUCHY PRINCIPAL VALUE

Consider the integral

$$I = \int_0^{\infty} \frac{\sin x}{x}\,dx. \tag{11.72}$$

Using the identity

$$\sin x = \frac{e^{ix} - e^{-ix}}{2i},$$

we then have

$$I = \int_0^{\infty} \frac{e^{ix} - e^{-ix}}{2ix}\,dx. \tag{11.73}$$

We would like to separate this expression for $I$ into two terms, but if we do so, each will
become a logarithmically divergent integral. However, if we change the integration range
in Eq. (11.72), originally $(0, \infty)$, to $(\delta, \infty)$, that integral remains unchanged in the limit
of small $\delta$, and the integrals in Eq. (11.73) remain convergent so long as $\delta$ is not precisely
zero. Then, rewriting the second of the two integrals in Eq. (11.73), to reach

$$-\int_{\delta}^{\infty} \frac{e^{-ix}}{2ix}\,dx = \int_{-\infty}^{-\delta} \frac{e^{ix}}{2ix}\,dx,$$

we see that the two integrals which together form $I$ can be written (in the limit $\delta \to 0^+$) as
the Cauchy principal value integral

$$I = \fint_{-\infty}^{\infty} \frac{e^{ix}}{2ix}\,dx. \tag{11.74}$$


The Cauchy principal value has implications for complex variable theory. Suppose now
that, instead of having a break in the integration path from $x_0 - \delta$ to $x_0 + \delta$, we connect
the two parts of the path by a circular arc passing, in the complex plane, either above or
below the singularity at $x_0$. Let’s continue the discussion in conventional complex-variable
notation, denoting the singular point as $z_0$, so our arc will be a half circle (of radius $\delta$)
passing either counterclockwise below the singularity at $z_0$ or clockwise above $z_0$. We
restrict further analysis to singularities no stronger than $1/(z - z_0)$, so we are dealing with
a simple pole. Looking at the Laurent expansion of the function $f(z)$ to be integrated,
it will have initial terms

$$\frac{a_{-1}}{z - z_0} + a_0 + \cdots,$$

and the integration over a semicircle of radius $\delta$ will take (in the limit $\delta \to 0^+$) one of the
two forms (in the polar representation $z - z_0 = re^{i\theta}$, with $dz = ire^{i\theta}d\theta$ and $r = \delta$):

$$I_{\text{over}} = \int_{\pi}^{0} d\theta\, i\delta e^{i\theta} \left[ \frac{a_{-1}}{\delta e^{i\theta}} + a_0 + \cdots \right]
  = \int_{\pi}^{0} \left( i a_{-1} + i\delta e^{i\theta} a_0 + \cdots \right) d\theta \to -i\pi a_{-1}, \tag{11.75}$$

$$I_{\text{under}} = \int_{\pi}^{2\pi} d\theta\, i\delta e^{i\theta} \left[ \frac{a_{-1}}{\delta e^{i\theta}} + a_0 + \cdots \right]
  = \int_{\pi}^{2\pi} \left( i a_{-1} + i\delta e^{i\theta} a_0 + \cdots \right) d\theta \to i\pi a_{-1}. \tag{11.76}$$

Note that all but the first term of each of Eqs. (11.75) and (11.76) vanish in the limit
$\delta \to 0^+$, and that each of these equations yields a result that is in magnitude half the value
that would have been obtained by a full circuit around the pole. The signs associated with
the semicircles correspond as expected to the direction of travel, and the two semicircular
integrals average to zero.
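A quick numerical check of Eqs. (11.75) and (11.76) is sketched below. Everything specific here is an assumption made for illustration: the sample function $3/(z-1) + z^2$ (simple pole at $z_0 = 1$ with residue $a_{-1} = 3$), the radius $\delta = 10^{-6}$, and the helper name `arc_integral`, which applies a trapezoidal rule in the angle.

```python
import numpy as np

def arc_integral(f, z0, delta, theta_start, theta_end, n=20001):
    """Trapezoidal estimate of the integral of f along an arc of radius delta about z0."""
    theta = np.linspace(theta_start, theta_end, n)
    z = z0 + delta * np.exp(1j * theta)
    integrand = f(z) * 1j * delta * np.exp(1j * theta)   # f(z) dz/dtheta
    h = (theta_end - theta_start) / (n - 1)
    return h * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))

a_m1, z0, delta = 3.0, 1.0, 1e-6   # assumed residue, pole position, arc radius

def f(z):
    # Sample integrand: simple pole at z0 plus a regular part
    return a_m1 / (z - z0) + z**2

I_over = arc_integral(f, z0, delta, np.pi, 0.0)           # clockwise, above the pole
I_under = arc_integral(f, z0, delta, np.pi, 2.0 * np.pi)  # counterclockwise, below

print(I_over, I_under)
```

For small `delta` the two results should approach $-i\pi a_{-1}$ and $+i\pi a_{-1}$, with the regular part of the integrand contributing only a term of order `delta`.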

We occasionally will want to evaluate a contour integral of a function $f(z)$ on a closed
path that includes the two pieces of a Cauchy principal value integral $\fint f(z)\,dz$ with a
simple pole at $z_0$, a semicircular arc connecting them at the singularity, and whatever other
curve $C_2$ is needed to close the contour (see Fig. 11.20).

FIGURE 11.20 A contour including a Cauchy principal value integral.

These contributions combine as follows, noting that in the figure the contour passes over
the point $z_0$:

$$\fint f(z)\,dz + I_{\text{over}} + \int_{C_2} f(z)\,dz = 2\pi i \sum \text{residues (other than at } z_0),$$

which rearranges to give

$$\fint f(z)\,dz = -I_{\text{over}} - \int_{C_2} f(z)\,dz + 2\pi i \sum \text{residues (other than at } z_0). \tag{11.77}$$

On the other hand, we could have chosen the contour to pass under $z_0$, in which case,
instead of Eq. (11.77) we would get

$$\fint f(z)\,dz = -I_{\text{under}} - \int_{C_2} f(z)\,dz + 2\pi i \sum \text{residues (other than at } z_0) + 2\pi i\,a_{-1}, \tag{11.78}$$

where the residue denoted $a_{-1}$ is from the pole at $z_0$. Equations (11.77) and (11.78) are in
agreement because $2\pi i\,a_{-1} - I_{\text{under}} = -I_{\text{over}}$, so for the purpose of evaluating the Cauchy
principal value integral, it makes no difference whether we go below or above the singularity
on the original integration path.


Pole Expansion of Meromorphic Functions 


Analytic functions $f(z)$ that have only isolated poles as singularities are called meromorphic.
Mittag-Leffler showed that, instead of making an expansion about a single regular
point (a Taylor expansion) or about an isolated singular point (a Laurent expansion),
it was also possible to make an expansion each of whose terms arises from a different
pole of $f(z)$. Mittag-Leffler’s theorem assumes that $f(z)$ is analytic at $z = 0$ and at all
other points (excluding infinity) with the exception of discrete simple poles at points $z_1,
z_2, \ldots$, with respective residues $b_1, b_2, \ldots$. We choose to order the poles in a way such
that $0 < |z_1| \le |z_2| \le \cdots$, and we assume that in the limit of large $z$, $|f(z)/z| \to 0$. Then,
Mittag-Leffler’s theorem states that

$$f(z) = f(0) + \sum_{n=1}^{\infty} b_n \left( \frac{1}{z - z_n} + \frac{1}{z_n} \right). \tag{11.79}$$

To prove the theorem, we make the preliminary observation that the quantity being
summed in Eq. (11.79) can be written

$$\frac{z\,b_n}{z_n (z - z_n)},$$

suggesting that it might be useful to consider a contour integral of the form

$$I_N = \oint_{C_N} \frac{f(w)\,dw}{w(w - z)},$$

where $w$ is another complex variable and $C_N$ is a circle enclosing the first $N$ poles of $f(z)$.
Since $C_N$, which has a radius we denote $R_N$, has total arc length $2\pi R_N$, and the absolute
value of the integrand asymptotically approaches $|f(R_N)|/R_N^2$, the large-$z$ behavior of
$f(z)$ guarantees that $\lim_{R_N \to \infty} I_N = 0$.

We now obtain an alternate expression for $I_N$ using the residue theorem. Recognizing
that $C_N$ encircles simple poles at $w = 0$, $w = z$, and $w = z_n$, $n = 1 \ldots N$, that $f(w)$ is
nonsingular at $w = 0$ and $w = z$, and that the residue of $f(w)/w(w - z)$ at $z_n$ is just
$b_n / z_n (z_n - z)$, we have

$$I_N = 2\pi i\,\frac{f(0)}{-z} + 2\pi i\,\frac{f(z)}{z} + \sum_{n=1}^{N} \frac{2\pi i\,b_n}{z_n (z_n - z)}.$$

Taking the large-$N$ limit, in which $I_N = 0$, and solving for $f(z)$, we recover Mittag-Leffler’s
theorem, Eq. (11.79). The pole expansion converges when the condition
$\lim_{z \to \infty} |f(z)/z| = 0$ is satisfied.
Mittag-Leffler’s theorem leads to a number of interesting pole expansions. Consider the
following examples.


Example 11.7.3 POLE EXPANSION OF tan z

Writing

$$\tan z = \frac{e^{iz} - e^{-iz}}{i (e^{iz} + e^{-iz})},$$

we easily see that the only singularities of $\tan z$ are for real values of $z$, and they occur
at the zeros of $\cos x$, namely at $\pm\pi/2, \pm 3\pi/2, \ldots$, or in general at $z_n = \pm(2n + 1)\pi/2$.
To obtain the residues at these points, we take the limit (using l’Hôpital’s rule)

$$b_n = \lim_{z \to (2n+1)\pi/2} \frac{\bigl( z - (2n + 1)\pi/2 \bigr) \sin z}{\cos z}
     = \left. \frac{\sin z + \bigl( z - (2n + 1)\pi/2 \bigr) \cos z}{-\sin z} \right|_{z = (2n+1)\pi/2} = -1,$$

the same value for every pole.

Noting that $\tan(0) = 0$, and that the poles within a circle of radius $(N + 1)\pi$ will be
those (of both signs) referred to here by $n$ values 0 through $N$, Eq. (11.79) for the current
case (but only through $N$) yields

$$\tan z = \sum_{n=0}^{N} (-1) \left( \frac{1}{z - (2n + 1)\pi/2} + \frac{1}{(2n + 1)\pi/2} \right)
        + \sum_{n=0}^{N} (-1) \left( \frac{1}{z + (2n + 1)\pi/2} + \frac{1}{-(2n + 1)\pi/2} \right)$$

$$\phantom{\tan z} = -\sum_{n=0}^{N} \left( \frac{1}{z - (2n + 1)\pi/2} + \frac{1}{z + (2n + 1)\pi/2} \right).$$

Combining terms over a common denominator, and taking the limit $N \to \infty$, we reach the
usual form of the expansion:

$$\tan z = 2z \left( \frac{1}{(\pi/2)^2 - z^2} + \frac{1}{(3\pi/2)^2 - z^2} + \frac{1}{(5\pi/2)^2 - z^2} + \cdots \right). \tag{11.80}$$
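The convergence of Eq. (11.80) is slow (the neglected tail falls off roughly like $1/N$), which a direct numerical comparison makes visible. The following sketch is illustrative; the sample point and the function name `tan_pole_sum` are assumptions.

```python
import numpy as np

def tan_pole_sum(z, N):
    """Partial sum of Eq. (11.80) over the pole pairs n = 0 .. N."""
    n = np.arange(N + 1)
    a = (2 * n + 1) * np.pi / 2.0          # pole positions +/- a
    return 2.0 * z * np.sum(1.0 / (a**2 - z**2))

z = 0.7 + 0.3j   # assumed sample point, away from the poles
errors = {N: abs(tan_pole_sum(z, N) - np.tan(z)) for N in (10, 1000, 10**6)}
print(errors)
```

Increasing `N` by a factor of 100 should reduce the error by roughly the same factor.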





Example 11.7.4 POLE EXPANSION OF cot z

This example proceeds much as the preceding one, except that $\cot z$ has a simple pole at
$z = 0$, with residue $+1$. We therefore consider instead $\cot z - 1/z$, thereby removing the
singularity. The singular points are now simple poles at $n\pi$ ($n \neq 0$), with residues (again
obtained via l’Hôpital’s rule)

$$b_n = \lim_{z \to n\pi} (z - n\pi) \left( \cot z - \frac{1}{z} \right)
     = \lim_{z \to n\pi} \frac{(z - n\pi)(z \cos z - \sin z)}{z \sin z}$$

$$\phantom{b_n} = \left. \frac{z \cos z - \sin z + (z - n\pi)(-z \sin z)}{\sin z + z \cos z} \right|_{z = n\pi} = +1.$$

Noting that $\cot z - 1/z$ is zero at $z = 0$ (the second term in the expansion of $\cot z$ is $-z/3$),
we have

$$\cot z - \frac{1}{z} = \sum_{n=1}^{N} \left( \frac{1}{z - n\pi} + \frac{1}{n\pi} + \frac{1}{z + n\pi} + \frac{1}{-n\pi} \right),$$

which, in the limit $N \to \infty$, rearranges to

$$\cot z = \frac{1}{z} + 2z \left( \frac{1}{z^2 - \pi^2} + \frac{1}{z^2 - (2\pi)^2} + \frac{1}{z^2 - (3\pi)^2} + \cdots \right). \tag{11.81}$$


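In addition to Eqs. (11.80) and (11.81), two other pole expansions of importance are

$$\sec z = \pi \left( \frac{1}{(\pi/2)^2 - z^2} - \frac{3}{(3\pi/2)^2 - z^2} + \frac{5}{(5\pi/2)^2 - z^2} - \cdots \right), \tag{11.82}$$

$$\csc z = \frac{1}{z} - 2z \left( \frac{1}{z^2 - \pi^2} - \frac{1}{z^2 - (2\pi)^2} + \frac{1}{z^2 - (3\pi)^2} - \cdots \right). \tag{11.83}$$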








Counting Poles and Zeros 


It is possible to obtain information about the numbers of poles and zeros of a function
$f(z)$ that is otherwise analytic within a closed region by consideration of its logarithmic
derivative, namely $f'(z)/f(z)$. The starting point for this analysis is to write an expression
for $f(z)$ relative to a point $z_0$ where there is either a zero or a pole in the form

$$f(z) = (z - z_0)^{\mu} g(z),$$

with $g(z)$ finite and nonzero at $z = z_0$. That requirement identifies the limiting behavior of
$f(z)$ near $z_0$ as proportional to $(z - z_0)^{\mu}$, and also causes $f'/f$ to assume near $z = z_0$ the
form

$$\frac{f'(z)}{f(z)} = \frac{\mu (z - z_0)^{\mu - 1} g(z) + (z - z_0)^{\mu} g'(z)}{(z - z_0)^{\mu} g(z)}
  = \frac{\mu}{z - z_0} + \frac{g'(z)}{g(z)}. \tag{11.84}$$

Equation (11.84) shows that, for all nonzero $\mu$ (i.e., if $z_0$ is either a zero or a pole), $f'/f$
has a simple pole at $z = z_0$ with residue $\mu$. Note that because $g(z)$ is required to be nonzero
and finite, the second term of Eq. (11.84) cannot be singular.

Applying now the residue theorem to Eq. (11.84) for a closed region within which $f(z)$
is analytic except possibly at poles, we see that the integral of $f'/f$ around a closed contour
yields the result

$$\oint_C \frac{f'(z)}{f(z)}\,dz = 2\pi i\,(N_f - P_f), \tag{11.85}$$

where $P_f$ is the number of poles of $f(z)$ within the region enclosed by $C$, each multiplied
by its order, and $N_f$ is the number of zeros of $f(z)$ enclosed by $C$, each multiplied by its
multiplicity.
The counting of zeros is often facilitated by using Rouché’s theorem, which states

    If $f(z)$ and $g(z)$ are analytic in the region bounded by a curve $C$ and $|f(z)| > |g(z)|$
    on $C$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros in the region bounded
    by $C$.

To prove Rouché’s theorem, we first write, from Eq. (11.85),

$$\oint_C \frac{f'(z)}{f(z)}\,dz = 2\pi i\,N_f \qquad \text{and} \qquad
  \oint_C \frac{f'(z) + g'(z)}{f(z) + g(z)}\,dz = 2\pi i\,N_{f+g},$$

where $N_f$ designates the number of zeros of $f$ within $C$. Then we observe that because
the indefinite integral of $f'/f$ is $\ln f$, $N_f$ is the number of times the argument of $f$ cycles
through $2\pi$ when $C$ is traversed once in the counterclockwise direction. Similarly, we note
that $N_{f+g}$ is the number of times the argument of $f + g$ cycles through $2\pi$ on traversal of
the contour $C$.

We next write

$$f + g = f \left( 1 + \frac{g}{f} \right) \qquad \text{and} \qquad
  \arg(f + g) = \arg(f) + \arg\left( 1 + \frac{g}{f} \right), \tag{11.86}$$

using the fact that the argument of a product is the sum of the arguments of its factors. It
is then clear that the number of cycles through $2\pi$ of $\arg(f + g)$ is equal to the number
of cycles of $\arg(f)$ plus the number of cycles of $\arg(1 + g/f)$. But because $|g/f| < 1$,
the real part of $1 + g/f$ never becomes negative, and its argument is therefore restricted to
the range $-\pi/2 < \arg(1 + g/f) < \pi/2$. Therefore $\arg(1 + g/f)$ cannot cycle through $2\pi$,
the number of cycles of $\arg(f + g)$ must be equal to the number of cycles of $\arg f$, and
$f + g$ and $f$ must have the same number of zeros within $C$. This completes the proof of
Rouché’s theorem.


Example 11.7.5 COUNTING ZEROS

Our problem is to determine the number of zeros of $F(z) = z^3 - 2z + 11$ with moduli
between 1 and 3. Since $F(z)$ is analytic for all finite $z$, we could in principle simply apply
Eq. (11.85) for the contour consisting of the circles $|z| = 1$ (clockwise) and $|z| = 3$ (counterclockwise),
setting $P_F = 0$ and solving for $N_F$. However, that approach will in practice
prove difficult. Instead, we simplify the problem by using Rouché’s theorem.

We first compute the number of zeros within $|z| = 1$, writing $F(z) = f(z) + g(z)$, with
$f(z) = 11$ and $g(z) = z^3 - 2z$. It is clear that $|f(z)| > |g(z)|$ when $|z| = 1$, so, by Rouché’s
theorem, $f$ and $f + g$ have the same number of zeros within this circle. Since $f(z) = 11$
has no zeros, we conclude that all the zeros of $F(z)$ are outside $|z| = 1$.

Next we compute the number of zeros within $|z| = 3$, taking for this purpose $f(z) = z^3$,
$g(z) = 11 - 2z$. When $|z| = 3$, we have $|f(z)| = 27 > |g(z)|$, so $F$ and $f$ have the same
number of zeros, namely three (the three-fold zero of $f$ at $z = 0$). Thus, the answer to our
problem is that $F$ has three zeros, all with moduli between 1 and 3.
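The conclusion of this example can be checked in two independent ways: by computing the roots directly, and by evaluating the integral of Eq. (11.85) numerically on each circle. The sketch below is an illustration under stated assumptions; the helper name `count_zeros` and the number of quadrature points are arbitrary choices, and `numpy.roots` is used only as a cross-check, not as part of the Rouché argument.

```python
import numpy as np

coeffs = [1.0, 0.0, -2.0, 11.0]   # F(z) = z^3 - 2z + 11
roots = np.roots(coeffs)
moduli = np.abs(roots)

def count_zeros(radius, n=4000):
    """Evaluate (1/(2*pi*i)) * contour integral of F'/F on |z| = radius, as in Eq. (11.85)."""
    theta = 2.0 * np.pi * np.arange(n) / n
    z = radius * np.exp(1j * theta)
    dz = 1j * z * (2.0 * np.pi / n)
    F = np.polyval(coeffs, z)
    dF = np.polyval(np.polyder(coeffs), z)
    return np.sum(dF / F * dz) / (2j * np.pi)

zeros_inside_1 = int(round(count_zeros(1.0).real))
zeros_inside_3 = int(round(count_zeros(3.0).real))
print(moduli, zeros_inside_1, zeros_inside_3)
```

Because $F$ is a polynomial it has no poles, so the integral counts zeros only; the two counts should be 0 and 3.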


Product Expansion of Entire Functions 
We remind the reader that a function $f(z)$ that is analytic for all finite $z$ is called an entire
function. Referring to Eq. (11.84), we see that if $f(z)$ is an entire function, then $f'(z)/f(z)$
will be meromorphic, with all its poles simple. Assuming for simplicity that the zeros of $f$
are simple and at points $z_n$, so that $\mu$ in Eq. (11.84) is 1, we can invoke the Mittag-Leffler
theorem to write $f'/f$ as the pole expansion

$$\frac{f'(z)}{f(z)} = \frac{f'(0)}{f(0)} + \sum_{n=1}^{\infty} \left[ \frac{1}{z - z_n} + \frac{1}{z_n} \right]. \tag{11.87}$$

Integrating Eq. (11.87) yields

$$\int_0^z \frac{f'(z)}{f(z)}\,dz = \ln f(z) - \ln f(0)
  = \frac{z f'(0)}{f(0)} + \sum_{n=1}^{\infty} \left[ \ln(z - z_n) - \ln(-z_n) + \frac{z}{z_n} \right].$$

Exponentiating, we obtain the product expansion

$$f(z) = f(0) \exp\left( \frac{z f'(0)}{f(0)} \right) \prod_{n=1}^{\infty} \left( 1 - \frac{z}{z_n} \right) e^{z/z_n}. \tag{11.88}$$

Examples are the product expansions for

$$\sin z = z \prod_{n \neq 0} \left( 1 - \frac{z}{n\pi} \right) e^{z/n\pi}
        = z \prod_{n=1}^{\infty} \left( 1 - \frac{z^2}{n^2 \pi^2} \right), \tag{11.89}$$

$$\cos z = \prod_{n=1}^{\infty} \left( 1 - \frac{z^2}{(n - 1/2)^2 \pi^2} \right). \tag{11.90}$$

The expansion of $\sin z$ cannot be obtained directly from Eq. (11.88), but its derivation is the
subject of Exercise 11.7.5. We also point out here that the gamma function has a product
expansion, discussed in Chapter 13.
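As with the pole expansions, the product for $\sin z$ in Eq. (11.89) converges slowly, with a relative error that falls off roughly like $1/N$ when the product is truncated after $N$ factors. The sketch below compares truncated products with `numpy.sin`; the sample point and the function name `sin_product` are assumptions made for illustration.

```python
import numpy as np

def sin_product(z, N):
    """Truncated product z * prod_{n=1..N} (1 - z^2 / (n pi)^2) from Eq. (11.89)."""
    n = np.arange(1, N + 1, dtype=float)
    return z * np.prod(1.0 - z**2 / (n * np.pi) ** 2)

z = 0.7 + 0.3j   # assumed sample point
product_errors = {N: abs(sin_product(z, N) - np.sin(z)) for N in (10, 1000, 10**6)}
print(product_errors)
```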
Exercises 
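11.7.1 Determine the nature of the singularities of each of the following functions and evaluate
the residues ($a > 0$).

(a) $\dfrac{1}{z^2 + a^2}$.          (b) $\dfrac{1}{(z^2 + a^2)^2}$.

(c) $\dfrac{z^2}{(z^2 + a^2)^2}$.    (d) $\dfrac{\sin 1/z}{z^2 + a^2}$.

(e) $\dfrac{z e^{+iz}}{z^2 + a^2}$.  (f) $\dfrac{z e^{+iz}}{z^2 - a^2}$.

(g) $\dfrac{e^{+iz}}{z^2 - a^2}$.    (h) $\dfrac{z^{-k}}{z + 1}$, $0 < k < 1$.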




11.7.2 
11.7.3 


11.7.4 


11.7.5 


11.7.6 


11.7.7 


11.7.8 
11.7.9 


11.7.10 
Hint. For the point at infinity, use the transformation $w = 1/z$ for $|z| \to 0$. For the
residue, transform $f(z)\,dz$ into $g(w)\,dw$ and look at the behavior of $g(w)$.


Evaluate the residues at $z = 0$ and $z = -1$ of $\pi \cot \pi z / z(z + 1)$.


The classical definition of the exponential integral Ei(x) for x > 0 is the Cauchy prin- 


cipal value integral 
x t 
: e 
Ei(x) = f ry dt, 
—0o 


where the integration range is cut at x = 0. Show that this definition yields a convergent 
result for positive x. 


Writing a Cauchy principal value integral to deal with the singularity at x = 1, show 
that, if0 < p <1, 





CO 
xP 
f dx =—m cot px. 
x—1 
0 


Explain why Eq. (11.88) is not directly applicable to the product expansion of sin z. 
Show how the expansion, Eq. (11.89), can be obtained by expanding instead sin z/z. 


Starting from the observations

1. $f(z) = a_n z^n$ has $n$ zeros, and

2. for sufficiently large $|R|$, $\left| \sum_{m=0}^{n-1} a_m R^m \right| < |a_n R^n|$,

use Rouché’s theorem to prove the fundamental theorem of algebra (namely that every
polynomial of degree $n$ has $n$ roots).


Using Rouché’s theorem, show that all the zeros of $F(z) = z^6 - 4z^3 + 10$ lie between
the circles $|z| = 1$ and $|z| = 2$.


Derive the pole expansions of sec z and csc z given in Eqs. (11.82) and (11.83). 


Given that $f(z) = (z^2 - 3z + 2)/z$, apply a partial fraction decomposition to $f'/f$ and
show directly that $\oint_C f'(z)/f(z)\,dz = 2\pi i\,(N_f - P_f)$, where $N_f$ and $P_f$ are, respectively,
the numbers of zeros and poles encircled by $C$ (including their multiplicities).


The statement that the integral halfway around a singular point is equal to one-half the
integral all the way around was limited to simple poles. Show, by a specific example,
that

$$\int_{\text{Semicircle}} f(z)\,dz = \frac{1}{2} \oint_{\text{Circle}} f(z)\,dz$$

does not necessarily hold if the integral encircles a pole of higher order.

Hint. Try $f(z) = z^{-2}$.
11.7.11 A function f(z) is analytic along the real axis except for a third-order pole at z = x_0.
The Laurent expansion about z = x_0 has the form

\[ f(z) = \frac{a_{-3}}{(z-x_0)^3} + \frac{a_{-1}}{z-x_0} + g(z), \]

with g(z) analytic at z = x_0. Show that the Cauchy principal value technique is applicable,
in the sense that

(a) $\displaystyle \lim_{\delta\to 0}\left\{ \int_{-\infty}^{x_0-\delta} f(x)\,dx + \int_{x_0+\delta}^{\infty} f(x)\,dx \right\}$ is finite.

(b) $\displaystyle \int_{C_{x_0}} f(z)\,dz = \pm i\pi a_{-1}$,

where $C_{x_0}$ denotes a small semicircle about z = x_0.





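11.7.12 The unit step function is defined as (compare Exercise 1.15.13)

\[ u(s-a) = \begin{cases} 0, & s < a \\ 1, & s > a. \end{cases} \]

Show that u(s) has the integral representations

(a) $\displaystyle u(s) = \lim_{\varepsilon\to 0^{+}} \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{e^{ixs}}{x - i\varepsilon}\, dx$,

(b) $\displaystyle u(s) = \frac{1}{2} + \frac{1}{2\pi i}\, \mathrm{P}\!\!\int_{-\infty}^{\infty} \frac{e^{ixs}}{x}\, dx$.

Note. The parameter s is real.


11.8 EVALUATION OF DEFINITE INTEGRALS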


Definite integrals appear repeatedly in problems of mathematical physics as well as in
pure mathematics. In Chapter 1 we reviewed several methods for integral evaluation, there
noting that contour integration methods were powerful and deserved detailed study. We
have now reached a point where we can explore these methods, which are applicable to a
wide variety of definite integrals with physically relevant integration limits. We start with
applications to integrals containing trigonometric functions, which we can often convert to
forms in which the variable of integration (originally an angle) is converted into a complex
variable z, with the integration becoming a contour integral over the unit circle.


Trigonometric Integrals, Range (0, 2π)


We consider here integrals of the form

\[ I = \int_0^{2\pi} f(\sin\theta, \cos\theta)\, d\theta, \tag{11.91} \]

where f is finite for all values of θ. We also require f to be a rational function of sin θ
and cos θ so that it will be single-valued. We make a change of variable to

\[ z = e^{i\theta}, \qquad dz = i e^{i\theta}\, d\theta, \]

with the range in θ, namely (0, 2π), corresponding to $e^{i\theta}$ moving counterclockwise around
the unit circle to form a closed contour. Then we make the substitutions

\[ d\theta = -i\,\frac{dz}{z}, \qquad \sin\theta = \frac{z - z^{-1}}{2i}, \qquad \cos\theta = \frac{z + z^{-1}}{2}, \tag{11.92} \]

where we have used Eq. (1.133) to represent sin θ and cos θ. Our integral then becomes

\[ I = -i \oint f\!\left( \frac{z - z^{-1}}{2i}, \frac{z + z^{-1}}{2} \right) \frac{dz}{z}, \tag{11.93} \]

with the path of integration the unit circle. By the residue theorem, Eq. (11.64),

\[ I = (-i)\, 2\pi i \sum \text{residues within the unit circle}. \tag{11.94} \]

Note that we must use the residues of f/z. Here are two preliminary examples.

Example 11.8.1 INTEGRAL OF cos IN DENOMINATOR


Our problem is to evaluate the definite integral

\[ I = \int_0^{2\pi} \frac{d\theta}{1 + a\cos\theta}, \qquad |a| < 1. \]

By Eq. (11.93) this becomes

\[ I = -i \oint_{\text{unit circle}} \frac{dz}{z\,[1 + (a/2)(z + z^{-1})]} = -i\,\frac{2}{a} \oint \frac{dz}{z^2 + (2/a)z + 1}. \]

The denominator has roots

\[ z_1 = -\frac{1 + \sqrt{1-a^2}}{a} \qquad \text{and} \qquad z_2 = -\frac{1 - \sqrt{1-a^2}}{a}. \]

Noting that $z_1 z_2 = 1$, it is easy to see that $z_2$ is within the unit circle and $z_1$ is outside.
Writing the integral in the form

\[ \oint \frac{dz}{(z - z_1)(z - z_2)}, \]

we see that the residue of the integrand at $z = z_2$ is $1/(z_2 - z_1)$, so application of the
residue theorem yields

\[ I = -i \cdot \frac{2}{a} \cdot 2\pi i\, \frac{1}{z_2 - z_1}. \]

Inserting the values of $z_1$ and $z_2$, we obtain the final result

\[ \int_0^{2\pi} \frac{d\theta}{1 + a\cos\theta} = \frac{2\pi}{\sqrt{1 - a^2}}, \qquad |a| < 1. \]

■
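As a quick cross-check of Example 11.8.1, the sketch below compares the residue evaluation with a direct numerical quadrature of the original θ integral. It is only an illustration under stated assumptions: the value a = 0.6 and the function names are arbitrary choices made here, and the code requires a ≠ 0 because the roots are divided by a.

```python
import math

def trig_integral_by_residue(a: float) -> float:
    """Evaluate I through the single pole z2 that lies inside the unit circle."""
    root = math.sqrt(1.0 - a * a)
    z1 = -(1.0 + root) / a          # outside the unit circle
    z2 = -(1.0 - root) / a          # inside the unit circle
    residue = 1.0 / (z2 - z1)       # residue of 1/((z - z1)(z - z2)) at z2
    value = (-1j) * (2.0 / a) * (2j * math.pi) * residue
    return value.real

def trig_integral_numeric(a: float, n: int = 2000) -> float:
    """Periodic trapezoidal rule for the theta integral over (0, 2*pi)."""
    h = 2.0 * math.pi / n
    return h * sum(1.0 / (1.0 + a * math.cos(k * h)) for k in range(n))

a = 0.6                              # illustrative value with |a| < 1
by_residue = trig_integral_by_residue(a)
numeric = trig_integral_numeric(a)
closed_form = 2.0 * math.pi / math.sqrt(1.0 - a * a)
print(by_residue, numeric, closed_form)
```

The trapezoidal rule is used because it converges very rapidly for smooth periodic integrands, so a modest number of points suffices.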





Example 11.8.2 ANOTHER TRIGONOMETRIC INTEGRAL


Consider

\[ I = \int_0^{2\pi} \frac{\cos 2\theta\, d\theta}{5 - 4\cos\theta}. \]

Making the substitutions identified in Eqs. (11.92) and (11.93), the integral I assumes the
form

\[ I = \oint \frac{\tfrac{1}{2}(z^2 + z^{-2})}{5 - 2(z + z^{-1})} \left( \frac{-i\, dz}{z} \right)
     = \frac{i}{4} \oint \frac{(z^4 + 1)\, dz}{z^2 \left(z - \tfrac{1}{2}\right)(z - 2)}, \]

where the integration is around the unit circle. Note that we identified cos 2θ as
$(z^2 + z^{-2})/2$, which is simpler than reducing it first to its equivalent in terms of sin θ and cos θ.
We see that the integrand has poles at z = 0 (of order 2), and simple poles at z = 1/2 and
z = 2. Only the poles at z = 0 and z = 1/2 are within the contour.
At z = 0 the residue of the integrand is

\[ \frac{d}{dz} \left[ \frac{z^4 + 1}{\left(z - \tfrac{1}{2}\right)(z - 2)} \right]_{z=0} = \frac{5}{2}, \]

while its residue at z = 1/2 is

\[ \left[ \frac{z^4 + 1}{z^2 (z - 2)} \right]_{z=1/2} = -\frac{17}{6}. \]

Applying the residue theorem, we have

\[ I = \frac{i}{4}\,(2\pi i) \left[ \frac{5}{2} - \frac{17}{6} \right] = \frac{\pi}{6}. \]

■


We stress that integrals of the type now under consideration are evaluated after trans- 
forming them so that they can be identified as exactly equivalent to contour integrals to 
which we can apply the residue theorem. Further examples are in the exercises. 
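The equivalence between the θ integral and the unit-circle contour integral can be checked numerically. The following sketch, which uses an ad hoc helper `unit_circle_integral` introduced here purely for illustration, evaluates the integral of Example 11.8.2 in three ways: as a discretized contour integral, from the two residues, and directly in θ.

```python
import cmath
import math

def unit_circle_integral(g, n: int = 4000) -> complex:
    """Approximate the counterclockwise contour integral of g(z) dz on |z| = 1."""
    total = 0j
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        total += g(z) * 1j * z              # dz = i z dtheta
    return total * (2.0 * math.pi / n)

def integrand(z: complex) -> complex:
    # (i/4)(z^4 + 1) / (z^2 (z - 1/2)(z - 2)), the transformed integrand
    return 0.25j * (z ** 4 + 1) / (z ** 2 * (z - 0.5) * (z - 2.0))

contour_value = unit_circle_integral(integrand)
residue_value = 0.25j * (2j * math.pi) * (5.0 / 2.0 - 17.0 / 6.0)

n_theta = 4000
h = 2.0 * math.pi / n_theta
theta_value = h * sum(
    math.cos(2 * k * h) / (5.0 - 4.0 * math.cos(k * h)) for k in range(n_theta)
)
print(contour_value, residue_value, theta_value, math.pi / 6)
```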
Integrals, Range −∞ to ∞


Consider now definite integrals of the form

\[ I = \int_{-\infty}^{\infty} f(x)\, dx, \tag{11.95} \]

where it is assumed that


• f(z) is analytic in the upper half-plane except for a finite number of poles. For the
moment it will be assumed that there are no poles on the real axis. Cases not satisfying
this condition will be considered later.


• In the limit |z| → ∞ in the upper half-plane (0 ≤ arg z ≤ π), f(z) vanishes more
strongly than 1/z.


Note that there is nothing unique about the upper half-plane. The method described here
can be applied, with obvious modifications, if f(z) vanishes sufficiently strongly on the
lower half-plane.

The second assumption stated above makes it useful to evaluate the contour integral
$\oint f(z)\, dz$ on the contour shown in Fig. 11.21, because the integral I is given by the
integration along the real axis, while the arc, of radius R, with R → ∞, gives a negligible
contribution to the contour integral. Thus,

\[ I = \oint f(z)\, dz, \]

and the contour integral can be evaluated by applying the residue theorem.
Situations of this sort are of frequent occurrence, and we therefore formalize the
conditions under which the integral over a large arc becomes negligible:


If $\lim_{R\to\infty} z f(z) = 0$ for all $z = Re^{i\theta}$ with θ in the range $\theta_1 \le \theta \le \theta_2$, then

\[ \lim_{R\to\infty} \int_C f(z)\, dz = 0, \tag{11.96} \]

where C is the arc over the angular range $\theta_1$ to $\theta_2$ on a circle of radius R with
center at the origin.








FIGURE 11.21 A contour closed by a large semicircle in the upper half-plane.

To prove Eq. (11.96), simply write the integral over C in polar form:

\[ \lim_{R\to\infty} \left| \int_C f(z)\, dz \right|
   \le \int_{\theta_1}^{\theta_2} \lim_{R\to\infty} \left| f(Re^{i\theta})\, i R e^{i\theta} \right| d\theta
   \le (\theta_2 - \theta_1) \lim_{R\to\infty} \left| f(Re^{i\theta})\, R e^{i\theta} \right| = 0. \]

Now, using the contour of Fig. 11.21, letting C denote the semicircular arc from θ = 0
to θ = π,

\[ \oint f(z)\, dz = \lim_{R\to\infty} \int_{-R}^{R} f(x)\, dx + \lim_{R\to\infty} \int_C f(z)\, dz
   = 2\pi i \sum \text{residues (upper half-plane)}, \tag{11.97} \]

where our second assumption has caused the vanishing of the integral over C.


Example 11.8.3 INTEGRAL OF MEROMORPHIC FUNCTION


Evaluate

\[ I = \int_0^{\infty} \frac{dx}{1 + x^2}. \]

This is not in the form we require, but it can be made so by noting that the integrand is
even and we can write

\[ I = \frac{1}{2} \int_{-\infty}^{\infty} \frac{dx}{1 + x^2}. \tag{11.98} \]

We note that $f(z) = 1/(1 + z^2)$ is meromorphic; all its singularities for finite z are poles,
and it also has the property that z f(z) vanishes in the limit of large |z|. Therefore, we may
apply Eq. (11.97), so

\[ \frac{1}{2} \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \frac{1}{2}\,(2\pi i) \sum \text{residues of } \frac{1}{1 + z^2} \text{ (upper half-plane)}. \]

Here and in every other similar problem we have the question: Where are the poles?
Rewriting the integrand as

\[ \frac{1}{z^2 + 1} = \frac{1}{(z + i)(z - i)}, \]

we see that there are simple poles (order 1) at z = i and z = −i. The residues are

\[ \text{at } z = i:\ \left. \frac{1}{z + i} \right|_{z=i} = \frac{1}{2i}, \qquad \text{and at } z = -i:\ \left. \frac{1}{z - i} \right|_{z=-i} = -\frac{1}{2i}. \]

However, only the pole at z = +i is enclosed by the contour, so our result is

\[ \int_0^{\infty} \frac{dx}{1 + x^2} = \frac{1}{2}\,(2\pi i)\, \frac{1}{2i} = \frac{\pi}{2}. \tag{11.99} \]

This result is hardly a surprise, as we presumably already know that

\[ \int_0^{\infty} \frac{dx}{1 + x^2} = \tan^{-1} x \Big|_0^{\infty} = \arctan x \Big|_0^{\infty} = \frac{\pi}{2}, \]


but, as shown in later examples, the techniques illustrated here are also easy to apply when 
more elementary methods are difficult or impossible. 

Before leaving this example, note that we could equally well have closed the contour
with a semicircle in the lower half-plane, as z f(z) vanishes on that arc as well as that in the
upper half-plane. Then, taking the contour so the real axis is traversed from −∞ to +∞,
the path would be clockwise (see Fig. 11.22), so we would need to take −2πi times the
residue of the pole that is now encircled (at z = −i). Thus, we have $I = -\tfrac{1}{2}(2\pi i)(-1/2i)$,
which (as it must) evaluates to the same result we obtained previously, namely π/2. ■
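The vanishing of the large-arc contribution, Eq. (11.96), can be observed numerically for f(z) = 1/(1 + z²). The sketch below is an illustration only; the helper `arc_integral` and the sample radii are assumptions made here. For this particular f the arc integral over the upper semicircle equals π minus the real-axis integral from −R to R, that is 2 arctan(1/R), which decays like 2/R.

```python
import cmath
import math

def arc_integral(f, radius: float, n: int = 4000) -> complex:
    """Midpoint-rule estimate of the integral of f(z) dz over the upper semicircle |z| = radius."""
    h = math.pi / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * h
        z = radius * cmath.exp(1j * theta)
        total += f(z) * 1j * z              # dz = i z dtheta
    return total * h

def f(z: complex) -> complex:
    return 1.0 / (1.0 + z * z)

radii = (10.0, 100.0, 1000.0)
arc_values = {R: abs(arc_integral(f, R)) for R in radii}
for R in radii:
    print(R, arc_values[R], 2.0 * math.atan(1.0 / R))

# The real-axis result then follows from the single enclosed pole at z = i.
residue_result = (0.5 * (2j * math.pi) * (1.0 / 2j)).real
```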


Integrals with Complex Exponentials


Consider the definite integral

\[ I = \int_{-\infty}^{\infty} f(x)\, e^{iax}\, dx, \tag{11.100} \]

with a real and positive. (This is a Fourier transform; see Chapter 19.) We assume the
following two conditions:


• f(z) is analytic in the upper half-plane except for a finite number of poles.


• $\lim_{|z|\to\infty} f(z) = 0$, 0 ≤ arg z ≤ π.


Note that this is a less restrictive condition than the second condition imposed on f(z) for
our previous integration of $\int_{-\infty}^{\infty} f(x)\, dx$.





FIGURE 11.22 A contour closed by a large semicircle in the lower half-plane. 







We again employ the half-circle contour shown in Fig. 11.21. The application of the
calculus of residues is the same as the example just considered, but here we have to work
harder to show that the integral over the (infinite) semicircle goes to zero. This integral
becomes, for a semicircle of radius R,

\[ I_R = \int_0^{\pi} f(Re^{i\theta})\, e^{iaR\cos\theta - aR\sin\theta}\, i R e^{i\theta}\, d\theta, \]

where the θ integration is over the upper half-plane, 0 ≤ θ ≤ π. Let R be sufficiently large
that $|f(z)| = |f(Re^{i\theta})| < \varepsilon$ for all θ within the integration range. Our second assumption
on f(z) tells us that as R → ∞, ε → 0. Then

\[ |I_R| \le \varepsilon R \int_0^{\pi} e^{-aR\sin\theta}\, d\theta = 2\varepsilon R \int_0^{\pi/2} e^{-aR\sin\theta}\, d\theta. \tag{11.101} \]

We now note that in the range [0, π/2],

\[ \frac{2}{\pi}\,\theta \le \sin\theta, \]

as is easily seen from Fig. 11.23. Substituting this inequality into Eq. (11.101), we have

\[ |I_R| \le 2\varepsilon R \int_0^{\pi/2} e^{-2aR\theta/\pi}\, d\theta = 2\varepsilon R\, \frac{1 - e^{-aR}}{2aR/\pi} < \frac{\pi}{a}\,\varepsilon, \]

showing that

\[ \lim_{R\to\infty} I_R = 0. \]

This result is also important enough to commemorate; it is sometimes known as Jordan’s
lemma. Its formal statement is


If $\lim_{R\to\infty} f(z) = 0$ for all $z = Re^{i\theta}$ in the range 0 ≤ θ ≤ π, then

\[ \lim_{R\to\infty} \int_C e^{iaz} f(z)\, dz = 0, \tag{11.102} \]

where a > 0 and C is a semicircle of radius R in the upper half-plane with center at
the origin.

Note that for Jordan’s lemma the upper and lower half-planes are not equivalent, because
the condition a > 0 causes the exponent −aR sin θ only to be negative and yield a
negligible result in the upper half-plane. In the lower half-plane, the exponential is positive
and the integral on a large semicircle there would diverge. Of course, we could extend the
theorem by considering the case a < 0, in which event the contour to be used would then
be a semicircle in the lower half-plane.
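The key estimate behind Jordan's lemma is that R ∫₀^π e^{−aR sin θ} dθ stays below π/a however large R becomes, so that the arc integral is controlled by ε alone. A minimal numerical sketch of that bound follows; the helper name `jordan_factor` and the sample values of a and R are illustrative assumptions, and the midpoint rule is only moderately accurate for the largest R.

```python
import math

def jordan_factor(a: float, R: float, n: int = 20000) -> float:
    """Midpoint-rule value of R * integral_0^pi exp(-a R sin(theta)) dtheta."""
    h = math.pi / n
    return R * h * sum(math.exp(-a * R * math.sin((k + 0.5) * h)) for k in range(n))

a = 1.0
radii = (1.0, 10.0, 100.0, 1000.0)
factors = [jordan_factor(a, R) for R in radii]
bound = math.pi / a

for R, value in zip(radii, factors):
    print(R, value, bound)      # each value stays below pi/a
```

For large R the factor settles near 2/a, comfortably under the bound π/a obtained from the inequality (2/π)θ ≤ sin θ.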
FIGURE 11.23 (a) y = (2/π)θ, (b) y = sin θ.


Returning now to integrals of the type represented by Eq. (11.100), and using the contour
shown in Fig. 11.21, application of the residue theorem yields the general result (for a > 0),

\[ \int_{-\infty}^{\infty} f(x)\, e^{iax}\, dx = 2\pi i \sum \text{residues of } e^{iaz} f(z) \text{ (upper half-plane)}, \tag{11.103} \]

where we have used Jordan’s lemma to set to zero the contribution to the contour integral
from the large semicircle.


Example 11.8.4 OSCILLATORY INTEGRAL


Consider

\[ I = \int_0^{\infty} \frac{\cos x}{x^2 + 1}\, dx, \]

which we initially manipulate, introducing $\cos x = (e^{ix} + e^{-ix})/2$, as follows:

\[ I = \frac{1}{2} \int_0^{\infty} \frac{e^{ix}\, dx}{x^2 + 1} + \frac{1}{2} \int_0^{\infty} \frac{e^{-ix}\, dx}{x^2 + 1}
     = \frac{1}{2} \int_0^{\infty} \frac{e^{ix}\, dx}{x^2 + 1} + \frac{1}{2} \int_0^{-\infty} \frac{e^{ix}\, d(-x)}{(-x)^2 + 1}
     = \frac{1}{2} \int_{-\infty}^{\infty} \frac{e^{ix}\, dx}{x^2 + 1}, \]

thereby bringing I to the form presently under discussion.

We now note that in this problem $f(z) = 1/(z^2 + 1)$, which certainly approaches zero
for large |z|, and the exponential factor is of the form $e^{iaz}$, with a = +1. We may therefore
evaluate the integral using Eq. (11.103), with the contour shown in Fig. 11.21.

The quantity whose residues are needed is

\[ \frac{e^{iz}}{z^2 + 1} = \frac{e^{iz}}{(z + i)(z - i)}, \]

and we note that the exponential, an entire function, contributes no singularities. So our
singularities are simple poles at z = ±i. Only the pole at z = +i is within the contour, and
its residue is $e^{i^2}/2i$, which reduces to 1/2ie. Our integral therefore has the value

\[ I = \frac{1}{2}\,(2\pi i)\, \frac{1}{2ie} = \frac{\pi}{2e}. \]

■
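The value π/2e can be compared with a brute-force quadrature of the original real integral. The sketch below is illustrative: the Simpson helper, the cutoff at 200π, and the number of subintervals are choices made here, and the neglected tail beyond the cutoff is only estimated to be small (of order 1/X³ when the cutoff X is a multiple of π), not bounded rigorously.

```python
import math

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3.0

upper = 200.0 * math.pi     # cutoff; a multiple of pi keeps the neglected tail small
numeric = simpson(lambda x: math.cos(x) / (x * x + 1.0), 0.0, upper, 200_000)

# Residue evaluation: (1/2)(2 pi i) times the residue 1/(2 i e) at z = +i
by_residue = (0.5 * (2j * math.pi) * (1.0 / (2j * math.e))).real
print(numeric, by_residue)
```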


Our next example is an important integral, the evaluation of which involves the 
principal-value concept and a contour that apparently needs to go through a pole. 


Example 11.8.5 SINGULARITY ON CONTOUR OF INTEGRATION 


We now consider the evaluation of

\[ I = \int_0^{\infty} \frac{\sin x}{x}\, dx. \tag{11.104} \]

Writing the integrand as $(e^{iz} - e^{-iz})/2iz$, an attempt to do as we did in Example 11.8.4
leads to the problem that each of the two integrals into which I can be separated is
individually divergent. This is a problem we have already encountered in discussing the Cauchy
principal value of this integral. Referring to Eq. (11.74), we write I as

\[ I = \mathrm{P}\!\!\int_{-\infty}^{\infty} \frac{e^{ix}\, dx}{2ix}, \tag{11.105} \]

suggesting that we consider the integral of $e^{iz}/2iz$ over a suitable closed contour.

We now note that although the gap at x = 0 is infinitesimal, that point is a pole of
$e^{iz}/2iz$, and we must draw a contour which avoids it, using a small semicircle to
connect the points at −δ and +δ. Compare with the discussion at Eqs. (11.75) and (11.76).
Choosing the small semicircle above the pole, as in Fig. 11.20, we then have a contour
that encloses no singularities.

The integral around this contour can now be identified as consisting of (1) the two
semi-infinite segments constituting the principal value integral in Eq. (11.105), (2) the large
semicircle $C_R$ of radius R (R → ∞), and (3) a semicircle $C_r$ of radius r (r → 0), traversed
clockwise, so

\[ \oint \frac{e^{iz}}{2iz}\, dz = I + \int_{C_r} \frac{e^{iz}}{2iz}\, dz + \int_{C_R} \frac{e^{iz}}{2iz}\, dz = 0. \tag{11.106} \]

By Jordan’s lemma, the integral over $C_R$ vanishes. As discussed at Eq. (11.75), the
clockwise path $C_r$ half-way around the pole at z = 0 contributes half the value of a full circuit,
namely (allowing for the clockwise direction of travel) −πi times the residue of $e^{iz}/2iz$ at
z = 0. This residue has value 1/2i, so $\int_{C_r} = -\pi i\,(1/2i) = -\pi/2$, and, solving Eq. (11.106)
for I, we then obtain

\[ I = \int_0^{\infty} \frac{\sin x}{x}\, dx = \frac{\pi}{2}. \tag{11.107} \]

Note that it was necessary to close the contour in the upper half-plane. On a large circle
in the lower half-plane, $e^{iz}$ becomes infinite and Jordan’s lemma cannot be applied. ■
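The half-residue contribution of the small clockwise semicircle can be seen numerically. In the sketch below (the helper name and the sample radii are assumptions made for illustration) the integral of e^{iz}/2iz along the arc from θ = π to θ = 0 is computed for shrinking r; expanding the exponential shows that it equals −π/2 + r plus higher-order terms, so it tends to −π/2.

```python
import cmath
import math

def small_semicircle_integral(r: float, n: int = 2000) -> complex:
    """Integral of exp(iz)/(2iz) dz along the clockwise arc |z| = r from theta = pi to 0."""
    h = math.pi / n
    total = 0j
    for k in range(n):
        theta = math.pi - (k + 0.5) * h     # decreasing angle: clockwise travel
        z = r * cmath.exp(1j * theta)
        dz = 1j * z * (-h)
        total += cmath.exp(1j * z) / (2j * z) * dz
    return total

values = {r: small_semicircle_integral(r) for r in (1.0, 0.1, 0.001)}
for r, value in values.items():
    print(r, value, -math.pi / 2)
```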


Another Integration Technique 


Sometimes we have an integral on the real range (0, ∞) that lacks the symmetry needed
to extend the integration range to (−∞, ∞). However, it may be possible to identify a
direction in the complex plane on which the integrand has a value identical to or
conveniently related to that of the original integral, thereby permitting construction of a contour
facilitating the evaluation.


Example 11.8.6 EVALUATION ON A CIRCULAR SECTOR


Our problem is to evaluate the integral

\[ I = \int_0^{\infty} \frac{dx}{x^3 + 1}, \]

which we cannot convert easily into an integral on the range (−∞, ∞). However, we note
that along a line with argument θ = 2π/3, $z^3$ will have the same values as at corresponding
points on the real line; note that $(re^{2\pi i/3})^3 = r^3 e^{2\pi i} = r^3$. We therefore consider

\[ \oint \frac{dz}{z^3 + 1} \]

on the contour shown in Fig. 11.24. The part of the contour along the positive real axis,
labeled A, simply yields our integral I. The integrand approaches zero sufficiently rapidly
for large |z| that the integral on the large circular arc, labeled C in the figure, vanishes. On
the remaining segment of the contour, labeled B, we note that $dz = e^{2\pi i/3}\, dr$, $z^3 = r^3$, and

\[ \int_B \frac{dz}{z^3 + 1} = \int_{\infty}^{0} \frac{e^{2\pi i/3}\, dr}{r^3 + 1} = -e^{2\pi i/3} \int_0^{\infty} \frac{dr}{r^3 + 1} = -e^{2\pi i/3}\, I. \]

FIGURE 11.24 Contour for Example 11.8.6.

Therefore,

\[ \oint \frac{dz}{z^3 + 1} = \left( 1 - e^{2\pi i/3} \right) I. \tag{11.108} \]

We now need to evaluate our complete contour integral using the residue theorem. The
integrand has simple poles at the three roots of $z^3 + 1$, which are at $z_1 = e^{\pi i/3}$, $z_2 = e^{\pi i}$,
and $z_3 = e^{5\pi i/3}$, as marked in Fig. 11.24. Only the pole at $z_1$ is enclosed by our contour.
The residue at $z = z_1$ is

\[ \lim_{z\to z_1} \frac{z - z_1}{z^3 + 1} = \left. \frac{1}{3z^2} \right|_{z=z_1} = \frac{1}{3e^{2\pi i/3}}. \]

Equating 2πi times this result to the value of the contour integral as given in Eq. (11.108),
we have

\[ \left( 1 - e^{2\pi i/3} \right) I = 2\pi i \left( \frac{1}{3e^{2\pi i/3}} \right). \]

Solution for I is facilitated if we multiply through by $e^{-\pi i/3}$, obtaining initially

\[ \left( e^{-\pi i/3} - e^{\pi i/3} \right) I = 2\pi i \left( -\frac{1}{3} \right), \]

which is easily rearranged to

\[ I = \frac{\pi}{3\sin \pi/3} = \frac{\pi}{3\sqrt{3}/2} = \frac{2\pi}{3\sqrt{3}}. \]

■
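A short numerical sketch can confirm the sector-contour result 2π/(3√3) ≈ 1.2092. The change of variable x = t/(1 − t), which maps (0, ∞) onto (0, 1), and the Simpson helper are choices made here for illustration only; they are not part of the contour method.

```python
import cmath
import math

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3.0

def mapped(t: float) -> float:
    # With x = t/(1 - t), dx/(x^3 + 1) becomes (1 - t) dt / ((1 - t)^3 + t^3)
    return (1.0 - t) / ((1.0 - t) ** 3 + t ** 3)

numeric = simpson(mapped, 0.0, 1.0, 2000)

# Residue route: (1 - e^{2 pi i/3}) I = 2 pi i / (3 z1^2), with z1 = e^{pi i/3}
z1 = cmath.exp(1j * math.pi / 3.0)
residue = 1.0 / (3.0 * z1 ** 2)
by_residue = (2j * math.pi * residue / (1.0 - cmath.exp(2j * math.pi / 3.0))).real

closed_form = 2.0 * math.pi / (3.0 * math.sqrt(3.0))
print(numeric, by_residue, closed_form)
```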





Avoidance of Branch Points 


Sometimes we must deal with integrals whose integrands have branch points. In order to 
use contour integration methods for such integrals we must choose contours that avoid the 
branch points, enclosing only point singularities. 


Example 11.8.7 INTEGRAL CONTAINING LOGARITHM 


We now look at

\[ I = \int_0^{\infty} \frac{\ln x\, dx}{x^3 + 1}. \tag{11.109} \]

The integrand in Eq. (11.109) is singular at x = 0, but the integration converges (the
indefinite integral of ln x is x ln x − x). However, in the complex plane this singularity manifests
itself as a branch point, so if we are to recast this problem in a way involving a contour
integral, we must avoid z = 0 and a branch cut from that point to z = ∞. It turns out to be
convenient to use a contour similar to that for Example 11.8.6, except that we must make a
small circular detour about z = 0 and then draw the branch cut in a direction that remains
outside our chosen contour. Noting also that the integrand has poles at the same points as
those of Example 11.8.6, we consider a contour integral

\[ \oint \frac{\ln z\, dz}{z^3 + 1}, \]

where the contour and the locations of the singularities of the integrand are as illustrated
in Fig. 11.25.

FIGURE 11.25 Contour for Example 11.8.7.

The integral over the large circular arc, labeled C, vanishes, as the factor $z^3$ in the
denominator dominates over the weakly divergent factor ln z in the numerator (which
diverges more weakly than any positive power of z). We also get no contribution to the
contour integral from the arc at small r, since we have there

\[ \lim_{r\to 0} \int_0^{2\pi/3} \frac{\ln(re^{i\theta})}{1 + r^3 e^{3i\theta}}\, i r e^{i\theta}\, d\theta, \]

which vanishes because r ln r → 0.

The integrals over the segments labeled A and B do not vanish. To evaluate the integral
over these segments, we need to make an appropriate choice of the branch of the
multivalued function ln z. It is natural to choose the branch so that on the real axis we have
ln z = ln x (and not ln x + 2nπi with some nonzero n). Then the integral over the segment
labeled A will have the value I.⁷

⁷Because the integration converges at x = 0, the value is not affected by the fact that this segment terminates infinitesimally
before reaching that point.

To compute the integral over B, we note that on this segment $z^3 = r^3$ and $dz = e^{2\pi i/3}\, dr$
(as in Example 11.8.6), but also note that ln z = ln r + 2πi/3. There is little temptation here
to use a different one of the multiple values of the logarithm, but for future reference note
that we must use the value that is reached continuously from the value we already chose
on the positive real axis, moving in a way that does not cross the branch cut. Thus, we
cannot reach segment B by clockwise travel from the positive real axis (thereby getting
ln z = ln r − 4πi/3) or any other value that would require multiple circuits around the
branch point z = 0.
Based on the foregoing, we have

\[ \int_B \frac{\ln z\, dz}{z^3 + 1} = \int_{\infty}^{0} \frac{\ln r + 2\pi i/3}{r^3 + 1}\, e^{2\pi i/3}\, dr
   = -e^{2\pi i/3}\, I - \frac{2\pi i}{3}\, e^{2\pi i/3} \int_0^{\infty} \frac{dr}{r^3 + 1}. \tag{11.110} \]

Referring to Example 11.8.6 for the value of the integral in the final term of Eq. (11.110),
and combining the contributions to the overall contour integral,

\[ \oint \frac{\ln z\, dz}{z^3 + 1} = \left( 1 - e^{2\pi i/3} \right) I - \frac{2\pi i}{3}\, e^{2\pi i/3} \left( \frac{2\pi}{3\sqrt{3}} \right). \tag{11.111} \]

Our next step is to use the residue theorem to evaluate the contour integral. Only the
pole at $z = z_1$ lies within the contour. The residue we must compute is

\[ \lim_{z\to z_1} \frac{(z - z_1)\ln z}{z^3 + 1} = \frac{\ln z_1}{3z_1^2} = \frac{\pi i/3}{3e^{2\pi i/3}} = \frac{\pi i}{9}\, e^{-2\pi i/3}, \]

and application of the residue theorem to Eq. (11.111) yields

\[ \left( 1 - e^{2\pi i/3} \right) I - \frac{2\pi i}{3}\, e^{2\pi i/3} \left( \frac{2\pi}{3\sqrt{3}} \right) = (2\pi i) \left( \frac{\pi i}{9} \right) e^{-2\pi i/3}. \tag{11.112} \]

Solving for I, we get

\[ I = -\frac{2\pi^2}{27}. \tag{11.113} \]

Verification of the passage from Eq. (11.112) to (11.113) is left to Exercise 11.8.6. ■
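As a numerical companion to this example, the sketch below solves the linear relation of Eq. (11.112) for I in complex arithmetic and compares it with a direct quadrature of the real integral. The substitution x = e^u and the step size are choices made here for illustration; they make the integrand decay exponentially in both directions, so a plain trapezoidal sum is adequate.

```python
import cmath
import math

def log_integrand(u: float) -> float:
    """Integrand of ln(x)/(x^3 + 1) dx after x = exp(u), arranged to avoid overflow."""
    if u > 0.0:
        return u * math.exp(-2.0 * u) / (1.0 + math.exp(-3.0 * u))
    return u * math.exp(u) / (1.0 + math.exp(3.0 * u))

h = 0.01
numeric = h * sum(log_integrand(k * h) for k in range(-6000, 6001))

# Solve (1 - w) I - (2 pi i/3) w (2 pi/(3 sqrt 3)) = (2 pi i)(pi i/9)/w, with w = e^{2 pi i/3}
omega = cmath.exp(2j * math.pi / 3.0)
sector_integral = 2.0 * math.pi / (3.0 * math.sqrt(3.0))   # value from Example 11.8.6
rhs = (2j * math.pi) * (1j * math.pi / 9.0) / omega
from_contour = (rhs + (2j * math.pi / 3.0) * omega * sector_integral) / (1.0 - omega)

closed_form = -2.0 * math.pi ** 2 / 27.0
print(numeric, from_contour, closed_form)
```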


Exploiting Branch Cuts 


Sometimes, rather than being an annoyance, a branch cut provides an opportunity for a 
creative way of evaluating difficult integrals. 


Example 11.8.8 USING A BRANCH CUT


Let’s evaluate

\[ I = \int_0^{\infty} \frac{x^p\, dx}{x^2 + 1}, \qquad 0 < p < 1. \]

Consider the contour integral

\[ \oint \frac{z^p\, dz}{z^2 + 1}, \]

where the contour is that shown in Fig. 11.26. Note that z = 0 is a branch point, and we
have taken the cut along the positive real axis. We assign $z^p$ its usual principal value
(which is $x^p$) just above the cut, so that the segment of the contour labeled A, which
actually extends from ε to ∞, converges in the limit of small ε to the integral I. Neither
the circle of radius ε nor that at R → ∞ contributes to the value of the contour integral.
On the remaining segment of the contour, labeled B, we have $z = re^{2\pi i}$, written this way
so we can see that $z^p = r^p e^{2p\pi i}$. We use this value for $z^p$ on segment B because we must
get to B by encircling z = 0 in the counterclockwise, mathematically positive direction.
The contribution of segment B to the contour integral is then seen to be

\[ \int_{\infty}^{0} \frac{r^p e^{2p\pi i}\, dr}{r^2 + 1} = -e^{2p\pi i}\, I, \]

so

\[ \oint \frac{z^p\, dz}{z^2 + 1} = \left( 1 - e^{2p\pi i} \right) I. \tag{11.114} \]

FIGURE 11.26 Contour for Example 11.8.8.

To apply the residue theorem, we note that there are simple poles at $z_1 = i$ and $z_2 = -i$;
to use these for evaluation of $z^p$ we need to identify these as $z_1 = e^{\pi i/2}$ and $z_2 = e^{3\pi i/2}$.
It would be a serious mistake to use $z_2 = e^{-\pi i/2}$ when evaluating $z_2^p$. We now find the
residues to be:

\[ \text{Residue at } z_1:\ \frac{e^{p\pi i/2}}{2i}, \qquad \text{Residue at } z_2:\ \frac{e^{3p\pi i/2}}{-2i}, \]

and we have, referring to Eq. (11.114),

\[ \left( 1 - e^{2p\pi i} \right) I = (2\pi i)\, \frac{1}{2i} \left( e^{p\pi i/2} - e^{3p\pi i/2} \right). \tag{11.115} \]

This equation simplifies to

\[ I = \frac{\pi \sin(p\pi/2)}{\sin p\pi} = \frac{\pi}{2\cos(p\pi/2)}. \tag{11.116} \]

The details of the evaluation are left to Exercise 11.8.7. ■
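The importance of evaluating z^p on the correct side of the cut can be illustrated numerically. The sketch below, with p = 0.5 chosen arbitrarily and the substitution x = e^u introduced here only for the quadrature, compares the closed form π/(2 cos(pπ/2)) with a direct integration and with the residue formula of Eq. (11.115). It also shows that taking z₂ = e^{−πi/2} instead of e^{3πi/2} gives a different, complex value.

```python
import cmath
import math

def integrand(u: float, p: float) -> float:
    # With x = exp(u): x^p dx/(x^2 + 1) = du / (exp((1 - p) u) + exp(-(1 + p) u))
    return 1.0 / (math.exp((1.0 - p) * u) + math.exp(-(1.0 + p) * u))

p = 0.5                                   # illustrative exponent, 0 < p < 1
h = 0.01
numeric = h * sum(integrand(k * h, p) for k in range(-10000, 10001))

def from_residues(arg_z2: float) -> complex:
    """Solve Eq. (11.115) for I, using the stated argument for the pole at z = -i."""
    r1 = cmath.exp(1j * p * math.pi / 2.0) / 2j
    r2 = cmath.exp(1j * p * arg_z2) / (-2j)
    return 2j * math.pi * (r1 + r2) / (1.0 - cmath.exp(2j * p * math.pi))

correct = from_residues(3.0 * math.pi / 2.0)     # argument consistent with the cut
mistaken = from_residues(-math.pi / 2.0)         # wrong branch for z2
closed_form = math.pi / (2.0 * math.cos(p * math.pi / 2.0))
print(numeric, correct, mistaken, closed_form)
```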


The use of a branch cut, as illustrated in Example 11.8.8, is so helpful that sometimes it 
is advisable to insert a factor into a contour integral to create one that would not otherwise 
exist. To illustrate this, we return to an integral we evaluated earlier by another method. 
Example 11.8.9 INTRODUCING A BRANCH POINT


Let’s evaluate once again the integral

\[ I = \int_0^{\infty} \frac{dx}{x^3 + 1}, \]

which we previously considered in Example 11.8.6. This time, we proceed by setting up
the contour integral

\[ \oint \frac{\ln z\, dz}{z^3 + 1}, \]

taking the contour to be that depicted in Fig. 11.26. Note that in the present problem the
poles of the integrand are not those shown in Fig. 11.26, which was originally drawn to
illustrate a different problem; for the locations of the poles of the present integrand, see
Fig. 11.24.

The virtue of the introduction of the factor ln z is that its presence causes the integral
segments above and below the positive real axis not to cancel completely, but to yield
a net contribution corresponding to an integral of interest. In the present problem (using
the labeling in Fig. 11.26), we again have vanishing contributions from the small and large
circles, and (taking the usual principal value for the logarithm on segment A), that segment
contributes to the contour integral the expected value

\[ \int_A \frac{\ln z\, dz}{z^3 + 1} = \int_0^{\infty} \frac{\ln x\, dx}{x^3 + 1}. \tag{11.117} \]

However, segment B makes the contribution

\[ \int_B \frac{\ln z\, dz}{z^3 + 1} = \int_{\infty}^{0} \frac{(\ln x + 2\pi i)\, dx}{x^3 + 1}, \tag{11.118} \]

and when Eqs. (11.117) and (11.118) are combined, the logarithmic terms cancel, and we
are left with

\[ \oint \frac{\ln z\, dz}{z^3 + 1} = \int_{A+B} \frac{\ln z\, dz}{z^3 + 1} = -2\pi i \int_0^{\infty} \frac{dx}{x^3 + 1} = -2\pi i\, I. \tag{11.119} \]

Note that what has happened is that the logarithm has disappeared (its contributions
canceled), but its presence caused the integral of current interest to be proportional to the value
of the contour integral we introduced.

To complete the evaluation, we need to evaluate the contour integral using the residue
theorem. Note that the residues are those of the integrand, including the logarithmic factor,
and this factor must be computed taking account of the branch cut. In the present problem,
we identify poles at $z_1 = e^{\pi i/3}$, $z_2 = e^{\pi i}$, and $z_3 = e^{5\pi i/3}$ (not $e^{-\pi i/3}$). The contour now
in use encircles all three poles. Their respective residues (denoted $R_i$) are

\[ R_1 = \left( \frac{\pi i}{3} \right) \frac{1}{3e^{2\pi i/3}}, \qquad
   R_2 = (\pi i)\, \frac{1}{3e^{6\pi i/3}}, \qquad \text{and} \qquad
   R_3 = \left( \frac{5\pi i}{3} \right) \frac{1}{3e^{10\pi i/3}}, \]

where the first parenthesized factor of each residue comes from the logarithm.
Continuing, we have, referring to Eq. (11.119),

\[ -2\pi i\, I = 2\pi i\, (R_1 + R_2 + R_3); \]

\[ I = -(R_1 + R_2 + R_3) = -\frac{\pi i}{9} \left[ e^{-2\pi i/3} + 3 + 5e^{2\pi i/3} \right] = \frac{2\pi}{3\sqrt{3}}. \]

More robust examples involving the introduction of ln z appear in the exercises. ■
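The residue sum of this example is easy to reproduce numerically, provided the logarithm is taken on the branch with 0 ≤ arg z < 2π that matches the cut along the positive real axis. In the sketch below the helper `log_branch` is a hypothetical name introduced for illustration; the second computation uses the library's principal logarithm (arg z in (−π, π]) to show that the wrong branch does not reproduce 2π/(3√3).

```python
import cmath
import math

def log_branch(z: complex) -> complex:
    """ln z with 0 <= arg z < 2*pi (branch cut along the positive real axis)."""
    angle = cmath.phase(z)              # principal value in (-pi, pi]
    if angle < 0.0:
        angle += 2.0 * math.pi
    return math.log(abs(z)) + 1j * angle

# Poles of 1/(z^3 + 1): e^{i pi/3}, e^{i pi}, e^{5 i pi/3}
poles = [cmath.exp(1j * math.pi * k / 3.0) for k in (1, 3, 5)]

# Residue of ln z / (z^3 + 1) at a simple pole z0 is ln z0 / (3 z0^2)
residues = [log_branch(z) / (3.0 * z * z) for z in poles]
integral = -sum(residues)

wrong_branch = -sum(cmath.log(z) / (3.0 * z * z) for z in poles)
closed_form = 2.0 * math.pi / (3.0 * math.sqrt(3.0))
print(integral, wrong_branch, closed_form)
```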


Exploiting Periodicity 
The periodicity of the trigonometric functions (and that, in the complex plane, of the hyper- 
bolic functions) creates opportunities to devise contours in which multiple contributions 


corresponding to an integral of interest can be used to encircle singularities and enable use 
of the residue theorem. We illustrate with one example. 


Example 11.8.10 INTEGRAND PERIODIC ON IMAGINARY AXIS


We wish to evaluate

\[ I = \int_0^{\infty} \frac{x\, dx}{\sinh x}. \]

Taking account of the sinusoidal behavior of the hyperbolic sine in the imaginary direction,
we consider

\[ \oint \frac{z\, dz}{\sinh z} \tag{11.120} \]

on the contour shown in Fig. 11.27. In drawing the contour we needed to be mindful of the
singularities of the integrand, which are poles associated with the zeros of sinh z.
Recognizing that

\[ \sinh(x + iy) = \sinh x \cosh iy + \cosh x \sinh iy = \sinh x \cos y + i \cosh x \sin y, \tag{11.121} \]

and that for all x, cosh x ≥ 1, we see that sinh z is zero only for z = nπi, with n an integer.
Moreover, because $\lim_{z\to 0} z/\sinh z = 1$, the integrand of our present contour integral will
not have a pole at z = 0, but will have poles at z = nπi for all nonzero integral n. For that
reason, the lower horizontal line of the contour in Fig. 11.27, marked A, continues through
z = 0 as a straight line on the real axis, but the upper horizontal line (for which y = π),
marked B and B′, has an infinitesimal semicircular detour, marked C, around the pole at
z = πi.

FIGURE 11.27 Contour for Example 11.8.10.

Because the integrand in Eq. (11.120) is an even function of z, the integral on segment A,
which extends from −∞ to +∞, has the value 2I. To evaluate the integral on segments
B and B′, we first note, using Eq. (11.121), that sinh(x + iπ) = −sinh x, and that the
integral on these segments is in the direction of negative x. Recognizing the integral on
these segments as a Cauchy principal value, we write

\[ \int_{B+B'} \frac{z\, dz}{\sinh z} = \mathrm{P}\!\!\int_{-\infty}^{\infty} \frac{x + i\pi}{\sinh x}\, dx. \]

Because x/sinh x is even and nonsingular at z = 0, while iπ/sinh x is odd, this integral
reduces to

\[ \int_{-\infty}^{\infty} \frac{x\, dx}{\sinh x} = 2I. \]

Combining what we have up to this point, invoking the residue theorem, and noting that
the integrand is negligible on the vertical connections at x = ±∞, we have

\[ \oint \frac{z\, dz}{\sinh z} = 4I + \int_C \frac{z\, dz}{\sinh z} = 2\pi i\, (\text{residue of } z/\sinh z \text{ at } z = \pi i). \tag{11.122} \]

To complete the evaluation, we now note that the residue we need is

\[ \lim_{z\to \pi i} \frac{z\,(z - \pi i)}{\sinh z} = \frac{\pi i}{\cosh \pi i} = -\pi i, \]

and, cf. Eqs. (11.75) and (11.76), the counterclockwise semicircle C evaluates to πi times
this residue. We have then

\[ 4I + (\pi i)(-\pi i) = (2\pi i)(-\pi i), \qquad \text{so} \qquad I = \frac{\pi^2}{4}. \]

■

Exercises


11.8.1 Generalizing Example 11.8.1, show that

\[ \int_0^{2\pi} \frac{d\theta}{a \pm b\cos\theta} = \int_0^{2\pi} \frac{d\theta}{a \pm b\sin\theta} = \frac{2\pi}{(a^2 - b^2)^{1/2}}, \qquad \text{for } a > |b|. \]

What happens if |b| > |a|?


11.8.2 Show that

\[ \int_0^{\pi} \frac{d\theta}{(a + \cos\theta)^2} = \frac{\pi a}{(a^2 - 1)^{3/2}}, \qquad a > 1. \]


11.8.3 Show that

\[ \int_0^{2\pi} \frac{d\theta}{1 - 2t\cos\theta + t^2} = \frac{2\pi}{1 - t^2}, \qquad \text{for } |t| < 1. \]

What happens if |t| > 1? What happens if |t| = 1?


11.8.4 Evaluate

\[ \int_0^{2\pi} \frac{\cos 3\theta\, d\theta}{5 - 4\cos\theta}. \]

ANS. π/12.


11.8.5 With the calculus of residues, show that

\[ \int_0^{\pi} \cos^{2n}\theta\, d\theta = \pi\, \frac{(2n)!}{2^{2n}(n!)^2} = \pi\, \frac{(2n-1)!!}{(2n)!!}, \qquad n = 0, 1, 2, \ldots. \]

The double factorial notation is defined in Eq. (1.76).
Hint. $\cos\theta = \tfrac{1}{2}(e^{i\theta} + e^{-i\theta}) = \tfrac{1}{2}(z + z^{-1})$, |z| = 1.


11.8.6 Verify that simplification of the expression in Eq. (11.112) yields the result given in
Eq. (11.113).


11.8.7 Complete the details of Example 11.8.8 by verifying that there is no contribution to
the contour integral from either the small or the large circles of the contour, and that
Eq. (11.115) simplifies to the result given as (11.116).


11.8.8 Evaluate

\[ \int_{-\infty}^{\infty} \frac{\cos bx - \cos ax}{x^2}\, dx, \qquad a > b > 0. \]

ANS. π(a − b).


11.8.9 Prove that

\[ \int_{-\infty}^{\infty} \frac{\sin^2 x}{x^2}\, dx = \pi. \]

Hint. $\sin^2 x = \tfrac{1}{2}(1 - \cos 2x)$.


11.8.10 Show that

\[ \int_0^{\infty} \frac{x \sin x}{x^2 + 1}\, dx = \frac{\pi}{2e}. \]
11.8.11 A quantum mechanical calculation of a transition probability leads to the function
$f(t, \omega) = 2(1 - \cos\omega t)/\omega^2$. Show that

\[ \int_{-\infty}^{\infty} f(t, \omega)\, d\omega = 2\pi t. \]


11.8.12 Show that (a > 0):

(a) $\displaystyle \int_{-\infty}^{\infty} \frac{\cos x}{x^2 + a^2}\, dx = \frac{\pi}{a}\, e^{-a}$.

How is the right side modified if cos x is replaced by cos kx?

(b) $\displaystyle \int_{-\infty}^{\infty} \frac{x \sin x}{x^2 + a^2}\, dx = \pi e^{-a}$.

How is the right side modified if sin x is replaced by sin kx?


11.8.13 Use the contour shown (Fig. 11.28) with R → ∞ to prove that

\[ \int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx = \pi. \]

FIGURE 11.28 Contour for Exercise 11.8.13.


11.8.14 In the quantum theory of atomic collisions, we encounter the integral

\[ I = \int_{-\infty}^{\infty} \frac{\sin t}{t}\, e^{ipt}\, dt, \]

in which p is real. Show that

I = 0, |p| > 1
I = π, |p| < 1.

What happens if p = ±1?


11.8.15 Show that

\[ \int_0^{\infty} \frac{dx}{(x^2 + a^2)^2} = \frac{\pi}{4a^3}, \qquad a > 0. \]


11.8.16 Evaluate

\[ \int_{-\infty}^{\infty} \frac{x^2}{1 + x^4}\, dx. \]

ANS. $\pi/\sqrt{2}$.


11.8.17 Evaluate

\[ \int_0^{\infty} \frac{x^p \ln x}{x^2 + 1}\, dx, \qquad 0 < p < 1. \]

ANS. $\dfrac{\pi^2 \sin(\pi p/2)}{4\cos^2(\pi p/2)}$.


11.8.18 Evaluate

\[ \int_0^{\infty} \frac{(\ln x)^2}{1 + x^2}\, dx, \]

(a) by appropriate series expansion of the integrand to obtain

\[ 4 \sum_{n=0}^{\infty} (-1)^n (2n + 1)^{-3}, \]

(b) and by contour integration to obtain $\dfrac{\pi^3}{8}$.

Hint. $x \to z = e^t$. Try the contour shown in Fig. 11.29, letting R → ∞.

FIGURE 11.29 Contour for Exercise 11.8.18.
11.8.19 Prove that

\[ \int_0^{\infty} \frac{\ln(1 + x^2)}{1 + x^2}\, dx = \pi \ln 2. \]


11.8.20 Show that

\[ \int_0^{\infty} \frac{x^a}{(x + 1)^2}\, dx = \frac{\pi a}{\sin \pi a}, \]

where −1 < a < 1.

Hint. Use the contour shown in Fig. 11.26, noting that z = 0 is a branch point and the
positive x-axis can be chosen to be a cut line.


11.8.21 Show that

\[ \int_{-\infty}^{\infty} \frac{x^2\, dx}{x^4 - 2x^2\cos 2\theta + 1} = \frac{\pi}{2\sin\theta} = \frac{\pi}{2^{1/2}(1 - \cos 2\theta)^{1/2}}. \]

Exercise 11.8.16 is a special case of this result.


11.8.22 Show that

\[ \int_0^{\infty} \frac{dx}{1 + x^n} = \frac{\pi/n}{\sin(\pi/n)}. \]

Hint. Try the contour shown in Fig. 11.30, with θ = 2π/n.


11.8.23 (a) Show that

\[ f(z) = z^4 - 2z^2\cos 2\theta + 1 \]

has zeros at $e^{i\theta}$, $e^{-i\theta}$, $-e^{i\theta}$, and $-e^{-i\theta}$.

(b) Show that

\[ \int_{-\infty}^{\infty} \frac{dx}{x^4 - 2x^2\cos 2\theta + 1} = \frac{\pi}{2\sin\theta} = \frac{\pi}{2^{1/2}(1 - \cos 2\theta)^{1/2}}. \]

Exercise 11.8.22 (n = 4) is a special case of this result.


FIGURE 11.30 Sector contour.





11.8.24 Show that

\[ \int_0^{\infty} \frac{x^{-a}}{x + 1}\, dx = \frac{\pi}{\sin a\pi}, \]

where 0 < a < 1.

Hint. You have a branch point and you will need a cut line. Try the contour shown in
Fig. 11.26.


11.8.25 Show that

\[ \int_0^{\infty} \frac{\cosh bx}{\cosh x}\, dx = \frac{\pi}{2\cos(\pi b/2)}, \qquad |b| < 1. \]

Hint. Choose a contour that encloses one pole of cosh z.


11.8.26 Show that

\[ \int_0^{\infty} \cos(t^2)\, dt = \int_0^{\infty} \sin(t^2)\, dt = \frac{\sqrt{\pi}}{2\sqrt{2}}. \]

Hint. Try the contour shown in Fig. 11.30, with θ = π/4.

Note. These are the Fresnel integrals for the special case of infinity as the upper limit.
For the general case of a varying upper limit, asymptotic expansions of the Fresnel
integrals are the topic of Exercise 12.6.1.


11.8.27 Show that

\[ \int_0^{1} \frac{dx}{(x^2 - x^3)^{1/3}} = \frac{2\pi}{\sqrt{3}}. \]

Hint. Try the contour shown in Fig. 11.31.

FIGURE 11.31 Contour for Exercise 11.8.27.


11.8.28 Evaluate

\[ \int_{-\infty}^{\infty} \frac{\tan^{-1} ax\, dx}{x\,(x^2 + b^2)}, \]

for a and b positive, with ab < 1.

Explain why the integrand does not have a singularity at x = 0.

Hint. Try the contour shown in Fig. 11.32, and use Eq. (1.137) to represent $\tan^{-1} az$.
After cancellation, the integrals on segments B and B′ combine to give an elementary
integral.


11.9 EVALUATION OF SUMS 


The fact that the cotangent is a meromorphic function with regularly spaced poles, all with
the same residue, enables us to use it to write a wide variety of infinite summations in terms
of contour integrals. To start, note that $\pi\cot\pi z$ has simple poles at all integers on the real
axis, each with residue
$$\lim_{z\to n} \frac{(z-n)\,\pi\cos\pi z}{\sin\pi z} = 1.$$

Suppose that we now evaluate the integral
$$I_N = \oint_{C_N} f(z)\,\pi\cot\pi z\,dz,$$
where the contour is a circle about z = 0 of radius $N + \tfrac{1}{2}$ (thereby not passing close to
the singularities of $\cot\pi z$). Assuming also that f(z) has only isolated singularities, at
points $z_j$ other than real integers, we get by application of the residue theorem (see also
Exercise 11.9.1),
$$I_N = 2\pi i \sum_{n=-N}^{N} f(n) + 2\pi i \sum_j \left(\text{residues of } f(z)\,\pi\cot\pi z \text{ at singularities } z_j \text{ of } f\right).$$

This integral over the circular contour $C_N$ will be negligible for large |z| if $zf(z) \to 0$ at
large |z|.⁸ When that condition is met, $\lim_{N\to\infty} I_N = 0$, and we have the useful result
$$\sum_{n=-\infty}^{\infty} f(n) = -\sum_j \left(\text{residues of } f(z)\,\pi\cot\pi z \text{ at singularities } z_j \text{ of } f\right). \tag{11.123}$$

The condition required of f(z) will usually be satisfied if the summation of Eq. (11.123)
converges.








FIGURE 11.32 Contour for Exercise 11.8.28. 


⁸ See also Exercise 11.9.2.
Example 11.9.1 EVALUATING A SUM


Consider the summation
$$S = \sum_{n=1}^{\infty} \frac{1}{n^2 + a^2},$$
where, for simplicity, we assume that a is nonintegral. To bring our problem to the form
we know how to treat, we note that also
$$\sum_{n=-\infty}^{-1} \frac{1}{n^2 + a^2} = S,$$
so that
$$\sum_{n=-\infty}^{\infty} \frac{1}{n^2 + a^2} = 2S + \frac{1}{a^2}, \tag{11.124}$$
where we have added on the right-hand side the contribution from n = 0 that was not
included in S.

The summation is now identified as of the form of Eq. (11.123), with $f(z) = 1/(z^2 + a^2)$;
f(z) approaches zero at large z rapidly enough to make Eq. (11.123) applicable. We
therefore proceed to the observation that the only singularities of f(z) are simple poles at
$z = \pm ia$. The residues we need are those of $\pi\cot(\pi z)/(z^2 + a^2)$; they are
$$\frac{\pi\cot(i\pi a)}{2ia} = \frac{-\pi\coth \pi a}{2a} \quad\text{and}\quad \frac{\pi\cot(-i\pi a)}{-2ia} = \frac{-\pi\coth(-\pi a)}{-2a}.$$

These are equal, so from Eqs. (11.123) and (11.124),
$$2S + \frac{1}{a^2} = \frac{\pi\coth \pi a}{a},$$
which we easily solve to reach
$$S = \frac{\pi\coth \pi a}{2a} - \frac{1}{2a^2}.$$
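As an informal numerical check of this result, the following Python sketch compares a long partial sum of the series with the closed form $S = \pi\coth(\pi a)/(2a) - 1/(2a^2)$. The value a = 0.7, the number of terms, and the function names are arbitrary choices made for illustration only.

```python
import math


def partial_sum(a: float, terms: int) -> float:
    """Directly add the first `terms` terms of S = sum_{n>=1} 1/(n^2 + a^2)."""
    return sum(1.0 / (n * n + a * a) for n in range(1, terms + 1))


def closed_form(a: float) -> float:
    """Residue-based result: S = pi*coth(pi*a)/(2a) - 1/(2a^2)."""
    return math.pi / (2.0 * a * math.tanh(math.pi * a)) - 1.0 / (2.0 * a * a)


a = 0.7                      # illustrative, nonintegral value
approx = partial_sum(a, 200_000)
exact = closed_form(a)
# The neglected tail of the series is roughly 1/terms, so the two numbers
# should agree to about 5e-6 for this choice of `terms`.
print(approx, exact, abs(approx - exact))
```

The slow 1/N convergence of the direct sum is one practical reason the closed form obtained from the residue theorem is useful.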


Additional types of summations can be performed if we replace $\pi\cot\pi z$ by functions
with other regularly repeating patterns of residues. For example, $\pi\csc\pi z$ has residues for
integer z that alternate in sign between +1 and −1; $\pi\tan\pi z$ has residues that are all −1,
but occur at the points $n + \tfrac{1}{2}$. And $\pi\sec\pi z$ has residues ±1 at the half-integers with a sign
alternation. For convenience, we list in Table 11.2 the contour-integral formulas for the
four types of summations we have just discussed.
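The alternating-sign formula can be tried out numerically. The sketch below assumes the cosecant version of the summation formula, $\sum_{n=-\infty}^{\infty} (-1)^n f(n) = -\sum(\text{residues of } f(z)\,\pi\csc\pi z)$, and applies it to $f(z) = 1/(z^2+a^2)$, whose simple poles at $z = \pm ia$ give residues $\pi\csc(\pi z)/(2z)$. The choice of f, the value of a, and the helper names are illustrative assumptions.

```python
import cmath
import math


def csc_residue(pole: complex) -> complex:
    """Residue of pi*csc(pi z)/(z^2 + a^2) at a simple pole z = +ia or -ia.

    Near such a pole, 1/(z^2 + a^2) behaves like 1/(2*pole*(z - pole)).
    """
    return math.pi / cmath.sin(math.pi * pole) / (2.0 * pole)


def alternating_sum(a: float, n_max: int) -> float:
    """Symmetric partial sum of (-1)^n / (n^2 + a^2) for n = -n_max..n_max."""
    return sum((-1) ** n / (n * n + a * a) for n in range(-n_max, n_max + 1))


a = 0.7
from_residues = -(csc_residue(1j * a) + csc_residue(-1j * a)).real
direct = alternating_sum(a, 2000)
print(from_residues, direct, math.pi / (a * math.sinh(math.pi * a)))
```

All three printed numbers should agree closely; the last one is the closed form $\pi/(a\sinh\pi a)$ that the two residues combine to give.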

We close this section with another example, this time illustrating what can be done if
f(z) has a pole at an integer value of z.





Example 11.9.2 ANOTHER SUM


Consider now the summation
$$S = \sum_{n=1}^{\infty} \frac{1}{n(n+1)}.$$

To extend the summation to $n = -\infty$, we note that
$$S = \sum_{n=-\infty}^{-2} \frac{1}{n(n+1)},$$
so that
$$2S = {\sum_{n=-\infty}^{\infty}}' \frac{1}{n(n+1)}, \tag{11.125}$$
where the prime on the sum indicates that the terms for n = 0 and n = −1 are to be omitted.
The derivation of Eq. (11.123) indicates that this equation will apply if we omit the
(singular) n = 0 and n = −1 terms from the sum and include the points z = 0 and z = −1
as points where the residues of $f(z)\,\pi\cot\pi z$ are to be included.

Based on that insight, we find that in the present problem,
$$2S = -\left(\text{sum of residues of } \frac{\pi\cot\pi z}{z(z+1)} \text{ at } z = 0 \text{ and } z = -1\right).$$
The singularities at z = 0 and z = −1 are second-order poles, at which the residues are
most easily computed by the method illustrated in item 5 of Example 11.7.1. In
Exercise 11.7.2 it is shown that the residue at each pole has value −1. Completing the problem,
$$2S = -(-1 - 1) = 2, \quad\text{so}\quad S = 1.$$
In this instance the result is easily verified by making the partial fraction expansion
$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}.$$
When inserted in the summation S, all terms cancel except the initial term of the 1/n
summation, yielding S = 1.


Table 11.2 Contour-Integral-Based Formulas for Summations

Summation                                              Formula
$\sum_{n=-\infty}^{\infty} f(n)$                       $-\sum$ (residues of $f(z)\,\pi\cot\pi z$ at singularities of f)
$\sum_{n=-\infty}^{\infty} (-1)^n f(n)$                $-\sum$ (residues of $f(z)\,\pi\csc\pi z$ at singularities of f)
$\sum_{n=-\infty}^{\infty} f(n+\tfrac{1}{2})$          $\sum$ (residues of $f(z)\,\pi\tan\pi z$ at singularities of f)
$\sum_{n=-\infty}^{\infty} (-1)^n f(n+\tfrac{1}{2})$   $\sum$ (residues of $f(z)\,\pi\sec\pi z$ at singularities of f)

Exercises 
11.9.1 Show that if f(z) is analytic at $z = z_0$ and g(z) has a simple pole at $z = z_0$ with residue
$b_0$, then f(z)g(z) also has a simple pole at $z = z_0$, with residue $f(z_0)b_0$.

11.9.2 Show that $\cot\pi z$ has magnitude of order 1 for large |z| when not extremely close to one
of its poles and does not affect the limiting behavior of $I_N$.





11.9.3 Evaluate $\dfrac{1}{1^3} - \dfrac{1}{3^3} + \dfrac{1}{5^3} - \cdots$.

11.9.4 Evaluate [sum illegible in the source].

11.9.5 Evaluate [sum illegible in the source], where a is real and not an integer.

11.9.6 (a) Using a method based on contour integration, evaluate [sum illegible in the source].
(b) Check your work by relating your answer to an appropriate expression involving
zeta functions.

11.9.7 Show that
$$\frac{1}{\cosh(\pi/2)} - \frac{1}{3\cosh(3\pi/2)} + \frac{1}{5\cosh(5\pi/2)} - \cdots = \frac{\pi}{8}.$$

11.9.8 For $-\pi < \varphi < +\pi$, show that
$$\sum_{n=1}^{\infty} (-1)^n \frac{\sin n\varphi}{n^3} = \frac{\varphi}{12}\left(\varphi^2 - \pi^2\right).$$


11.10 MISCELLANEOUS TOPICS


Schwarz Reflection Principle 


Our starting point for this topic is the observation that $g(z) = (z - x_0)^n$ for integral n and
real $x_0$ satisfies
$$g^*(z) = [(z - x_0)^n]^* = (z^* - x_0)^n = g(z^*). \tag{11.126}$$
A generalization of the result in Eq. (11.126) is the Schwarz reflection principle:

If a function f(z) is (1) analytic over some region including a portion of the real axis
and (2) real when z is real, then
$$f^*(z) = f(z^*). \tag{11.127}$$

Expanding f(z) about some point $x_0$ within the region of analyticity on the real axis,
$$f(z) = \sum_{n=0}^{\infty} (z - x_0)^n \frac{f^{(n)}(x_0)}{n!}.$$

Since f(z) is analytic at $z = x_0$, this Taylor expansion exists. Since f(z) is real when z is
real, $f^{(n)}(x_0)$ must be real for all n. Then, invoking Eq. (11.126), the Schwarz reflection
principle, Eq. (11.127), follows immediately. This completes the proof within a circle of
convergence. Analytic continuation then permits the extension of this result to the entire
region of analyticity.

Note that the reflection principle can also be derived by the consideration of Laurent 
expansions. See Exercise 11.10.2. 
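A small numerical illustration of the reflection principle may help fix ideas. The Python sketch below tests $f^*(z) = f(z^*)$ at one sample point for sin z, which is real on the real axis, and for f(z) = iz, which is not. The test point, the tolerance, and the helper name are arbitrary illustrative choices.

```python
import cmath


def satisfies_reflection(f, z: complex, tol: float = 1e-12) -> bool:
    """Return True if conj(f(z)) equals f(conj(z)) to within `tol`."""
    return abs(f(z).conjugate() - f(z.conjugate())) < tol


z = complex(0.8, -1.3)  # arbitrary test point off the real axis

# sin z is analytic everywhere and real for real z, so the principle applies.
sine_ok = satisfies_reflection(cmath.sin, z)

# f(z) = i z is analytic but purely imaginary on the real axis, so condition (2) fails.
iz_ok = satisfies_reflection(lambda s: 1j * s, z)

print(sine_ok, iz_ok)
```

A single test point cannot prove the principle, but a failure at any point, as for f(z) = iz, is enough to show that a function violates it.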


Mapping 


An analytic function w(z) = u(x, y) + iv(x, y) can be regarded as a mapping in which
points or curves in an xy plane can be associated with the corresponding points or curves in
a uv plane. As a relatively simple example, consider the transformation w = 1/z. From an
examination of its polar form, with $z = re^{i\theta}$, $w = \rho e^{i\varphi}$, we see that $\rho = 1/r$ and $\varphi = -\theta$,
leading to the conclusion that the interior of the unit circle maps into its exterior (see
Fig. 11.33). Circles in other locations in the z plane are transformed by w = 1/z into
other circles (or straight lines, which can be thought of as circles of infinite radius). This
statement is the subject of Exercise 11.10.6. The transformations of two such circles are
shown in the four panels of Fig. 11.34. Compare the way in which the interiors of the
circles transform in Figs. 11.33 and 11.34. Note that the transformation does not preserve
lengths, as can be seen in the figure from the labeling of various points and their locations
when mapped.
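The circle-to-circle property of w = 1/z can be explored numerically. The sketch below assumes a circle $|z - c| = r$ that does not pass through the origin and uses the image center $c^*/(|c|^2 - r^2)$ and radius $r/\big||c|^2 - r^2\big|$, which follow from substituting z = 1/w into $|z - c|^2 = r^2$. The particular circle and the function name are illustrative assumptions.

```python
import cmath
import math


def image_circle(c: complex, r: float):
    """Center and radius of the image of |z - c| = r under w = 1/z."""
    d = abs(c) ** 2 - r ** 2
    if d == 0:
        raise ValueError("circle passes through the origin; its image is a straight line")
    return c.conjugate() / d, r / abs(d)


c, r = complex(2.0, 1.0), 0.5       # sample circle in the z plane
center, radius = image_circle(c, r)

# Map 360 points of the original circle and measure how far each image point
# deviates from the predicted circle in the w plane.
max_dev = max(
    abs(abs(1.0 / (c + r * cmath.exp(2j * math.pi * k / 360)) - center) - radius)
    for k in range(360)
)
print(center, radius, max_dev)
```

The printed deviation should be at the level of floating-point rounding, indicating that every mapped point lies on the predicted circle.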





FIGURE 11.33 Mapping w = 1/z. The shaded areas transform into each other. 





























FIGURE 11.34 Left panels: circles in z plane. Right panels: their transformations 
in w plane under w = 1/z. 
Historically, the notion of mapping was useful for identifying and carrying out transfor-
mations that would facilitate the solution of 2-D problems in electrostatics, fluid dynamics, 
and other areas of classical physics. An important aspect of such mappings is that they are 
conformal, meaning that (except at singularities of the transformation) the angles at which 
curves intersect remain unchanged when transformed. This feature preserves relations, e.g., 
between equipotentials and lines of force (stream lines). With the nearly universal use of 
high-speed computers, procedures based on conformal mapping are no longer central to 
the practical solution of most physics and engineering problems, and as a consequence 
will not be explored here in further detail. For problems where these techniques are still 
relevant, we refer the reader to earlier editions of this book and to sources identified under 
Additional Readings. In that connection, we call particular attention to the book by Spiegel, 
which contains (in chapter 8) descriptions of a large number of mappings and (in chapter 9) 
many applications to problems of fluid flow, electrostatics, and heat conduction. 


Exercises 


11.10.1 A function f(z) = u(x, y) + iv(x, y) satisfies the conditions for the Schwarz reflection
principle. Show that

(a) u is an even function of y. (b) v is an odd function of y.


11.10.2 A function f(z) can be expanded in a Laurent series about the origin with the
coefficients $a_n$ real. Show that the complex conjugate of this function of z is the same function
of the complex conjugate of z; that is,
$$f^*(z) = f(z^*).$$
Verify this explicitly for

(a) $f(z) = z^n$, n an integer. (b) $f(z) = \sin z$.

If f(z) = iz ($a_1 = i$), show that the foregoing statement does not hold.


11.10.3 The function f(z) is analytic in a domain that includes the real axis. When z is real
(z = x), f(x) is pure imaginary.

(a) Show that
$$f(z^*) = -[f(z)]^*.$$

(b) For the specific case f(z) = iz, develop the Cartesian forms of f(z), f(z*), and
f*(z). Do not quote the general result of part (a).


11.10.4 How do circles centered on the origin in the z-plane transform for

(a) $w_1(z) = z + \dfrac{1}{z}$, (b) $w_2(z) = z - \dfrac{1}{z}$, for $z \neq 0$?

What happens when $|z| \to 1$?


11.10.5 What part of the z-plane corresponds to the interior of the unit circle in the w-plane if

(a) $w = \dfrac{z-1}{z+1}$, (b) $w = \dfrac{z-i}{z+i}$?


11.10.6 (a) Writing z = x + iy, w = u + iv, show that if w = 1/z, the circle in the xy plane
defined by $(x - a)^2 + (y - b)^2 = r^2$ transforms into $(u - A)^2 + (v - B)^2 = R^2$.

(b) Does the center of the circle in the z plane transform into the center of the
corresponding circle in the w plane?


11.10.7 Assume that a curve in the xy plane passes through point $z_0$ in the direction $dz = e^{i\theta}ds$,
where s indicates arc length on the curve. Then, if w = f(z), with f(z) analytic at $z = z_0$,
we have $dw = (dw/dz)\,dz = f'(z)e^{i\theta}ds$, where dw is in the direction the mapping
of the xy curve passes through $w_0 = f(z_0)$ in the w plane. Use this observation to
prove that if $f'(z_0) \neq 0$, the angle at which two curves intersect in the z plane is the
same (both in magnitude and direction) as the angle of intersection of their mappings in
the w plane.


Additional Readings 


Ahlfors, L. V., Complex Analysis, 3rd ed. New York: McGraw-Hill (1979). This text is detailed, thorough, rigor- 
ous, and extensive. 


Churchill, R. V., J. W. Brown, and R. F. Verhey, Complex Variables and Applications, 5th ed. New York:
McGraw-Hill (1989). This is an excellent text for both the beginning and advanced student. It is readable and
quite complete. A detailed proof of the Cauchy-Goursat theorem is given in Chapter 5.


Greenleaf, F. P., Introduction to Complex Variables. Philadelphia: Saunders (1972). This very readable book has 
detailed, careful explanations. 

Kyrala, A., Applied Functions of a Complex Variable. New York: Wiley (Interscience) (1972). An
intermediate-level text designed for scientists and engineers. Includes many physical applications.

Levinson, N., and R. M. Redheffer, Complex Variables. San Francisco: Holden-Day (1970). This text is written 
for scientists and engineers who are interested in applications. 

Morse, P. M., and H. Feshbach, Methods of Theoretical Physics. New York: McGraw-Hill (1953). Chapter 4 is 
a presentation of portions of the theory of functions of a complex variable of interest to theoretical physicists. 

Remmert, R., Theory of Complex Functions. New York: Springer (1991). 

Sokolnikoff, I. S., and R. M. Redheffer, Mathematics of Physics and Modern Engineering, 2nd ed. New York: 
McGraw-Hill (1966). Chapter 7 covers complex variables. 

Spiegel, M. R., Complex Variables, in Schaum’s Outline Series. New York: McGraw-Hill (original 1964, 
reprinted 1995). An excellent summary of the theory of complex variables for scientists. 

Titchmarsh, E. C., The Theory of Functions, 2nd ed. New York: Oxford University Press (1958). A classic. 

Watson, G. N., Complex Integration and Cauchy’s Theorem. New York: Hafner (original 1917, reprinted 1960).
A short work containing a rigorous development of the Cauchy integral theorem and integral formula.
Applications to the calculus of residues are included. Cambridge Tracts in Mathematics, and Mathematical Physics,
No. 15.


CHAPTER 12


FURTHER TOPICS IN ANALYSIS


The broader perspective and additional tools made available through complex variable
theory enable us to consider fruitfully a number of topics in analysis that have wide
application in areas of relevance to physics. In this chapter we survey several such topics.


12.1 ORTHOGONAL POLYNOMIALS


Many physical problems lead to second-order differential equations corresponding to 
Sturm-Liouville problems, and often the solutions of interest in physics are polynomials, 
defined on a range and with weighting factors that make them eigenfunctions of Hermitian 
problems. A number of interesting features of such problems can be approached with the 
aid of complex variable theory. 


Rodrigues Formulas 


Olinde Rodrigues showed that a large class of second-order Sturm-Liouville ordinary
differential equations (ODEs) had polynomial solutions which could be put in a compact and
useful form now generally called a Rodrigues formula. While such formulas could be pre- 
sented case by case with an aura of coincidence or mystery, the approach we take here is 
to develop them from a general viewpoint, after which we can proceed to more detailed 
discussion of well-known special cases. 

Consider a second-order Sturm-Liouville ODE of the general form 


$$p(x)y'' + q(x)y' + \lambda y = 0, \tag{12.1}$$
with p(x) and q(x) restricted to the polynomial forms
$$p(x) = \alpha x^2 + \beta x + \gamma, \quad q(x) = \mu x + \nu. \tag{12.2}$$

The forms of p and q are sufficiently general to include most of the ODEs with classical
sets of polynomials as solutions (the Legendre, Hermite, and Laguerre ODEs, among
others). When Eq. (12.1) has as a solution a polynomial of degree n, we can write
$$y_n(x) = \sum_{j=0}^{n} g_j x^j, \tag{12.3}$$
with coefficient $g_n$ nonzero. Setting to zero the coefficient of $x^n$ when $y_n$ is inserted into
the ODE, we have
$$n(n-1)\alpha g_n + n\mu g_n + \lambda g_n = 0, \tag{12.4}$$
showing that the eigenvalue $\lambda_n$ which corresponds to $y_n$ must have the value
$$\lambda_n = -n(n-1)\alpha - n\mu. \tag{12.5}$$
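The eigenvalue formula $\lambda_n = -n(n-1)\alpha - n\mu$ is easy to try on familiar cases. In the sketch below, the coefficients $\alpha$ and $\mu$ are read off from the standard forms of the Legendre ($p = 1 - x^2$, $q = -2x$), Hermite ($p = 1$, $q = -2x$), Laguerre ($p = x$, $q = 1 - x$), and Chebyshev type I ($p = 1 - x^2$, $q = -x$) ODEs; the function name and the dictionary layout are illustrative assumptions.

```python
def sturm_liouville_eigenvalue(n: int, alpha: float, mu: float) -> float:
    """Eigenvalue for a degree-n polynomial solution of p y'' + q y' + lambda y = 0,
    where p has leading coefficient alpha (of x^2) and q has leading coefficient mu (of x)."""
    return -n * (n - 1) * alpha - n * mu


# (alpha, mu, expected eigenvalue as a function of n)
cases = {
    "Legendre":    (-1, -2, lambda n: n * (n + 1)),
    "Hermite":     (0, -2, lambda n: 2 * n),
    "Laguerre":    (0, -1, lambda n: n),
    "Chebyshev I": (-1, -1, lambda n: n * n),
}

for name, (alpha, mu, expected) in cases.items():
    values = [sturm_liouville_eigenvalue(n, alpha, mu) for n in range(5)]
    print(name, values, all(v == expected(n) for n, v in enumerate(values)))
```

Each row should report True, reproducing the familiar eigenvalues n(n + 1), 2n, n, and n² of these equations.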


In Chapter 7 we identified an ODE of the form of Eq. (12.1) as self-adjoint if p′(x) =
q(x), and also showed that if an ODE was not already self-adjoint as written, it could be
converted to self-adjoint form by multiplying all its terms by a weight factor w(x), which
must be such that
$$(wp)' = wq, \quad\text{or}\quad w' = w\,\frac{q - p'}{p}. \tag{12.6}$$
As shown previously, this equation is separable and has solution
$$w(x) = p^{-1}\exp\left(\int^x \frac{q(x)}{p(x)}\,dx\right). \tag{12.7}$$
The introduction of w enables the ODE to assume the form
$$\frac{d}{dx}\left[w(x)p(x)y'\right] + \lambda w(x)y = 0, \tag{12.8}$$
which was useful for discussing orthogonality properties of its solutions.

Our current interest in w(x), however, is in the observation by Rodrigues that its
particular form permits the solutions $y_n(x)$ to be written in the compact and interesting form
that is now called its Rodrigues formula:
$$y_n(x) = \frac{1}{w(x)}\left(\frac{d}{dx}\right)^n \left[w(x)p(x)^n\right]. \tag{12.9}$$


The proof of Eq. (12.9) is both simple and ingenious. Using the defining condition for
w(x), Eq. (12.6), we first obtain
$$p\,[wp^n]' = wp^n\left[(n-1)p' + q\right]. \tag{12.10}$$
We then differentiate this equation n + 1 times and divide by w. Because p is only
quadratic in x and q is linear, application of Leibniz’s formula to the multiple
differentiations leads to only three terms on the left-hand side and two on the right:
$$\frac{p}{w}\left(\frac{d}{dx}\right)^{n+2}[wp^n] + \frac{(n+1)p'}{w}\left(\frac{d}{dx}\right)^{n+1}[wp^n] + \frac{n(n+1)p''}{2w}\left(\frac{d}{dx}\right)^{n}[wp^n]$$
$$= \frac{(n-1)p' + q}{w}\left(\frac{d}{dx}\right)^{n+1}[wp^n] + \frac{(n+1)\left[(n-1)p'' + q'\right]}{w}\left(\frac{d}{dx}\right)^{n}[wp^n]. \tag{12.11}$$


Our objective is to manipulate Eq. (12.11) into a form showing that $y_n$ as given in
Eq. (12.9) is a solution to the ODE of Eq. (12.1). We start by identifying the terms with $y_n$
where that is possible, and then, combining or canceling similar terms, we reach
$$\frac{p}{w}\left(\frac{d}{dx}\right)^{n+2}[wp^n] + \frac{2p' - q}{w}\left(\frac{d}{dx}\right)^{n+1}[wp^n] - \left[\frac{n^2 - n - 2}{2}\,p'' + (n+1)q'\right]y_n = 0. \tag{12.12}$$

To complete our analysis we now need to move the factors 1/w so that only n
differentiations appear to their right, enabling identification of the remaining terms of the equation
with $y_n$ or its derivatives. We note the identity
$$\frac{p}{w}\left(\frac{d}{dx}\right)^{n+2}[wp^n] = p\,y_n'' - 2p\left(\frac{1}{w}\right)'\left(\frac{d}{dx}\right)^{n+1}[wp^n] - p\left(\frac{1}{w}\right)''\left(\frac{d}{dx}\right)^{n}[wp^n],$$
which reduces, using Eq. (12.6), to
$$\frac{p}{w}\left(\frac{d}{dx}\right)^{n+2}[wp^n] = p\,y_n'' + \frac{2(q - p')}{w}\left(\frac{d}{dx}\right)^{n+1}[wp^n] - \left[p'' - q' + \frac{q(q - p')}{p}\right]y_n. \tag{12.13}$$
Substituting Eq. (12.13) into Eq. (12.12), some further simplification results:
$$p\,y_n'' + \frac{q}{w}\left(\frac{d}{dx}\right)^{n+1}[wp^n] - \left[\frac{n^2 - n}{2}\,p'' + nq' + \frac{q(q - p')}{p}\right]y_n = 0. \tag{12.14}$$
Our final step is to use the identity
$$\frac{q}{w}\left(\frac{d}{dx}\right)^{n+1}[wp^n] = q\,y_n' + \frac{q(q - p')}{p}\,y_n, \tag{12.15}$$
which brings us to
$$p\,y_n'' + q\,y_n' - \left[\frac{n^2 - n}{2}\,p'' + nq'\right]y_n = 0. \tag{12.16}$$

Noting that $p'' = 2\alpha$ and $q' = \mu$, we confirm that $y_n$ is a solution of Eq. (12.1) with the
eigenvalue given in Eq. (12.5).

Finally, we need to show that Rodrigues’ formula, Eq. (12.9), results in an expression
that is a polynomial of degree n. We note that a typical term of that formula will
involve a j-fold differentiation of w and an (n − j)-fold differentiation of $p^n$. After the
differentiation of $p^n$, we are left with $p^j$ times a polynomial. The differentiation of w
will, applying Eq. (12.6), leave $(w/p^j)$ times a polynomial, and the numerator and
denominator factors $p^j$ cancel. In addition, the w from the differentiation cancels against
the initial factor $w^{-1}$, leaving each term of $y_n$ in polynomial form. When all terms of $y_n$
are combined, the resulting polynomial must have the degree consistent with Eq. (12.5),
namely n.


Example 12.1.1 RODRIGUES FORMULA FOR HERMITE ODE 


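The Hermite ODE is
$$y'' - 2xy' + \lambda y = 0, \quad\text{or}\quad py'' + qy' + \lambda y = 0$$
with p = 1, q = −2x. We easily find
$$w = \exp\left(\int^x (-2x)\,dx\right) = e^{-x^2}.$$

The Rodrigues formula is therefore (with a factor $(-1)^n$ to obtain the Hermite polynomials
with their conventional signs)
$$H_n(x) = (-1)^n\,\frac{1}{w}\left(\frac{d}{dx}\right)^n [wp^n] = (-1)^n e^{x^2}\left(\frac{d}{dx}\right)^n e^{-x^2}. \tag{12.17}$$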
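The Rodrigues formula for the Hermite polynomials can be evaluated mechanically. The sketch below rests on the observation that $(d/dx)^n e^{-x^2} = q_n(x)e^{-x^2}$ with $q_0 = 1$ and $q_{n+1} = q_n' - 2xq_n$, so that $H_n = (-1)^n q_n$. Representing polynomials as coefficient lists (index k multiplies $x^k$) and all function names are illustrative choices, not part of the text.

```python
def poly_derivative(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...] (c_k multiplies x**k)."""
    return [k * coeffs[k] for k in range(1, len(coeffs))] or [0]


def times_minus_2x(coeffs):
    """Multiply a polynomial by -2x."""
    return [0] + [-2 * c for c in coeffs]


def poly_add(a, b):
    """Add two coefficient lists of possibly different lengths."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def hermite_rodrigues(n):
    """Coefficients of H_n(x) = (-1)^n e^{x^2} (d/dx)^n e^{-x^2}."""
    q = [1]  # q_0: the zeroth derivative of exp(-x^2) is 1 * exp(-x^2)
    for _ in range(n):
        # d/dx [q(x) e^{-x^2}] = [q'(x) - 2x q(x)] e^{-x^2}
        q = poly_add(poly_derivative(q), times_minus_2x(q))
    sign = -1 if n % 2 else 1
    return [sign * c for c in q]


for n in range(5):
    print(n, hermite_rodrigues(n))
```

For example, the list for n = 2 corresponds to $4x^2 - 2$ and the list for n = 3 to $8x^3 - 12x$.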





Schlaefli Integral 


One of the nice features of the Rodrigues formulas is that the multiple differentiations can
be converted to a convenient form by use of Cauchy’s integral formula. Using Eq. (11.33),
we have
$$y_n(x) = \frac{1}{w(x)}\,\frac{n!}{2\pi i}\oint_C \frac{w(z)[p(z)]^n}{(z-x)^{n+1}}\,dz, \tag{12.18}$$
where the contour C encloses the point x, and must be such that $w(z)[p(z)]^n$ is analytic
everywhere on and within C. This formula is known as the Schlaefli integral for $y_n(x)$.

It is possible to introduce the Schlaefli integral as the definition of a set of functions $y_n$
and, from that definition, prove that $y_n$ is a solution to the corresponding ODE. Because
we created the Schlaefli integral to represent a function already known to be a solution,
verification that it solves the ODE becomes redundant.
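The Schlaefli integral also lends itself to direct numerical evaluation. The sketch below applies it to the Hermite case ($w = e^{-x^2}$, p = 1, with the conventional sign factor $(-1)^n$), taking C to be a circle of radius 1 centered on x and approximating the contour integral by an equally spaced sum. The radius, the number of sample points, and the function name are assumptions made for illustration.

```python
import cmath
import math


def hermite_schlaefli(n: int, x: float, radius: float = 1.0, samples: int = 256) -> float:
    """Approximate H_n(x) = (-1)^n e^{x^2} (n!/(2 pi i)) * contour integral of
    exp(-z^2)/(z - x)^(n+1) dz over a circle about x.

    With z = x + radius*e^{i theta}, dz = i (z - x) d(theta), so the integral
    reduces to the average of exp(-z^2)/(z - x)^n over the circle.
    """
    total = 0j
    for k in range(samples):
        offset = radius * cmath.exp(2j * math.pi * k / samples)  # z - x
        z = x + offset
        total += cmath.exp(-z * z) / offset ** n
    contour_value = math.factorial(n) * total / samples
    return ((-1) ** n * math.exp(x * x) * contour_value).real


# Compare with the explicit polynomials H_2 = 4x^2 - 2 and H_3 = 8x^3 - 12x.
x = 0.5
print(hermite_schlaefli(2, x), 4 * x**2 - 2)
print(hermite_schlaefli(3, x), 8 * x**3 - 12 * x)
```

Because the integrand is analytic and periodic in the angle, the equally spaced sum converges very rapidly, so a modest number of sample points suffices.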
Generating Functions


Many sets of functions arising in mathematical physics can be defined in terms of
generating functions. Such functions include, but are not limited to, the orthogonal polynomials
$y_n$ that have been the subject of our discussion of Rodrigues formulas. For now, we make
no assumptions as to the source of the functions involved.

If $f_n(x)$ is a set of functions, defined for integer values of the index n, it may be the case
that the $f_n(x)$ can be described as the coefficients of the powers of an auxiliary variable, t,
in the expansion of a function g(x, t), which is called a generating function:
$$g(x, t) = \sum_n c_n f_n(x)\,t^n. \tag{12.19}$$

The range of n may be semi-infinite, with $n \ge 0$, thereby describing a Taylor series, or it
may extend from $-\infty$ to $+\infty$, thus describing a Laurent series. The additional coefficient,
$c_n$, permits adjustment of the function set to an agreed-upon scaling. Different choices of
$c_n$ will also lead to different generating functions g(x, t) for the same set of $f_n$.

Applying the residue theorem, we can see that the generating function expansion is
closely related to contour integral representations of the functions $f_n$:
$$c_n f_n(x) = \frac{1}{2\pi i}\oint \frac{g(x, t)}{t^{n+1}}\,dt, \tag{12.20}$$
where the contour encircles t = 0 but no other singularities of the integrand (with respect
to t).

A generating function may be regarded as providing the definition of a function set
$f_n(x)$, or alternatively it may have been obtained as the encapsulation of the $f_n$, which
were already defined in some other way (e.g., as polynomial solutions of a Sturm-Liouville
ODE). We shall later take up the issue of obtaining generating functions for previously
specified $f_n$, focusing for now only on ways in which they can be used.

It is obvious that by explicitly evaluating the implied expansion one can extract the
members of a function set from its generating function. However, a more important feature
of generating functions is that they can be very useful in deriving relationships between
members of the set $f_n$. For example,
$$\frac{\partial g(x, t)}{\partial t} = \sum_n n\,c_n f_n(x)\,t^{n-1} = \sum_n (n+1)\,c_{n+1} f_{n+1}(x)\,t^n,$$
and if we can relate g and $\partial g/\partial t$ we have a corresponding relation between $f_n$ and $f_{n+1}$.
Relations between the $f_n(x)$ and their derivatives $f_n'(x)$ can be deduced by differentiating
g(x, t) with respect to x.


Example 12.1.2 HERMITE POLYNOMIALS


A generating function formula for the Hermite polynomials $H_n(x)$ (at their conventional
scaling) is
$$g(x, t) = e^{-t^2 + 2tx} = \sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!}. \tag{12.21}$$
To develop a recurrence formula connecting $H_n$ of contiguous index values, we compute
$$\frac{\partial}{\partial t}\,e^{-t^2 + 2tx} = (2x - 2t)\,e^{-t^2 + 2tx} = \sum_{n=0}^{\infty} n H_n(x)\,\frac{t^{n-1}}{n!}. \tag{12.22}$$
Expanding the exponential in the central member of Eq. (12.22) (and suppressing
temporarily the argument of $H_n$),
$$2x\sum_{n=0}^{\infty} H_n\,\frac{t^n}{n!} - 2\sum_{n=0}^{\infty} H_n\,\frac{t^{n+1}}{n!} = \sum_{n=0}^{\infty} n H_n\,\frac{t^{n-1}}{n!}.$$

Extracting the coefficient of $t^n$ from each of these summations, we reach (for each n)
$$\frac{2x H_n}{n!} - \frac{2H_{n-1}}{(n-1)!} = \frac{(n+1)H_{n+1}}{(n+1)!},$$
which reduces to
$$2x H_n(x) - 2n H_{n-1}(x) = H_{n+1}(x). \tag{12.23}$$

Equation (12.23) is called a recurrence formula; it permits the construction of the
entire series of $H_n$ from starting values (typically $H_0$ and $H_1$, which are easily computed
directly).

A derivative formula can be obtained by differentiating Eq. (12.21) with respect to x.
We have
$$\frac{\partial}{\partial x}\,e^{-t^2 + 2tx} = 2t\,e^{-t^2 + 2tx} = \sum_{n=0}^{\infty} H_n'(x)\,\frac{t^n}{n!}.$$

Substituting Eq. (12.21) into the central member of this equation, we get
$$\sum_{n=0}^{\infty} 2H_n(x)\,\frac{t^{n+1}}{n!} = \sum_{n=0}^{\infty} H_n'(x)\,\frac{t^n}{n!},$$
which leads directly to
$$2n H_{n-1}(x) = H_n'(x). \tag{12.24}$$


In later chapters we illustrate the application of these ideas to a variety of special func- 
tions; in the next section of this chapter we apply them to a generating function that leads 
to quantities known as Bernoulli numbers. 
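The recurrence $H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x)$ translates directly into a short routine. The Python sketch below builds $H_0, \ldots, H_4$ at a sample point from the starting values $H_0 = 1$ and $H_1 = 2x$, and then checks the derivative formula $H_n'(x) = 2nH_{n-1}(x)$ against a central finite difference. The sample point, the step size, and the function name are illustrative assumptions.

```python
def hermite_values(n_max: int, x: float):
    """Return [H_0(x), ..., H_{n_max}(x)] using H_{n+1} = 2x H_n - 2n H_{n-1}."""
    h = [1.0, 2.0 * x]                     # starting values H_0 and H_1
    for n in range(1, n_max):
        h.append(2.0 * x * h[n] - 2.0 * n * h[n - 1])
    return h[: n_max + 1]


x = 0.5
values = hermite_values(4, x)
print(values)

# Check H_4'(x) = 2*4*H_3(x) with a central difference of step `step`.
step = 1e-5
numerical_derivative = (hermite_values(4, x + step)[4] - hermite_values(4, x - step)[4]) / (2 * step)
from_formula = 2 * 4 * values[3]
print(numerical_derivative, from_formula)
```

Building the polynomials upward from $H_0$ and $H_1$ in this way avoids forming the explicit coefficients of each $H_n$.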


Finding Generating Functions 


To take generating functions out of the realm of magic, we next consider how they might 
be obtained. For a more or less arbitrary function set, this question has been a topic of 
current interest in mathematical research, with methods of several sorts devised during the 
past century by Rainville, Weisner, Truesdell, and others. See the works by McBride and 
Talman in Additional Readings. 
For sets of polynomials arising in Sturm-Liouville problems and described by Rodrigues
formulas, we can be more explicit. Using the Schlaefli integral, Eq. (12.18), we can form
$$g(x, t) = \frac{1}{w(x)}\sum_{n=0}^{\infty} c_n t^n\,\frac{n!}{2\pi i}\oint_C \frac{w(z)[p(z)]^n}{(z-x)^{n+1}}\,dz. \tag{12.25}$$

Recall that C encloses x and that $wp^n$ must be analytic throughout the region within the
contour.

In principle Eq. (12.25) can be evaluated to obtain g(x, t), for example by choosing C
to be such that the summation can be brought inside the z integral and (after specifying
$c_n$) evaluating first the sum and then the contour integral. In practice the difficulty of doing
this may depend on the problem, including the choice of $c_n$. We provide one example of
the process.


Example 12.1.3 LEGENDRE POLYNOMIALS

We use the formal process described above to obtain a generating function for the Legendre
polynomials. The Legendre ODE is of the form discussed in Eq. (12.1),
$$(1 - x^2)y'' - 2xy' + \lambda y = 0,$$
implying that
$$p(x) = 1 - x^2, \quad q(x) = -2x,$$
and the equation is, as written, self-adjoint, so w(x) = 1. From the generating-function
formula based on the Schlaefli integral, Eq. (12.25), we choose $c_n = (-1)^n/2^n n!$, thereby
reaching
$$g(x, t) = \sum_{n=0}^{\infty} \frac{(-1)^n t^n}{2^n n!}\,\frac{n!}{2\pi i}\oint_C \frac{(1 - z^2)^n}{(z-x)^{n+1}}\,dz.$$

Interchanging the summation and integration (which we will justify later), the factors
dependent on n form a geometric series, which we can sum:
$$\sum_{n=0}^{\infty} \left(\frac{(z^2 - 1)t}{2(z - x)}\right)^n \frac{1}{z - x} = \frac{1}{z - x}\left[1 - \frac{(z^2 - 1)t}{2(z - x)}\right]^{-1} = -\frac{2}{t}\left(z^2 - \frac{2z}{t} + \frac{2x}{t} - 1\right)^{-1}.$$

Inserting this result into the formula for g(x, t), we now have
$$g(x, t) = -\frac{2}{t}\,\frac{1}{2\pi i}\oint_C \left(z^2 - \frac{2z}{t} + \frac{2x}{t} - 1\right)^{-1} dz = -\frac{2}{t}\,\frac{1}{2\pi i}\oint_C \frac{dz}{(z - z_1)(z - z_2)}, \tag{12.26}$$
where $z_1$ and $z_2$ are the roots of the quadratic form in the first line of the equation:
$$z_1 = \frac{1}{t} - \frac{\sqrt{1 - 2xt + t^2}}{t}, \quad z_2 = \frac{1}{t} + \frac{\sqrt{1 - 2xt + t^2}}{t}.$$

In order for Eq. (12.26) to be valid, it must have been legitimate to interchange the
summation and integration, which is the case only if the summation is uniformly convergent
(with respect to z) for all points at which it is used (i.e., everywhere on the contour C). It is
convenient to analyze the convergence for small t and x and for a contour with |z| = 1.
Once a final formula has been obtained, its range of validity can be extended by appeal to
analytic continuation.

On the assumed contour and for small x, there will be a range of $|t| \ll 1$ for which
$$\left|\frac{(z^2 - 1)t}{2(z - x)}\right| < 1,$$
guaranteeing convergence of the geometric series. We now return to the evaluation of the
contour integral in Eq. (12.26). It has two poles, at $z = z_1$ and $z = z_2$. For small x and |t|,
$z_2$ will be approximately 2/t and will be exterior to the contour, while $z_1$ will be close
to the origin of z. Thus, only the residue of the integrand at $z = z_1$ will contribute to the
contour integral, which will have the value
$$g(x, t) = -\frac{2}{t}\,\frac{1}{z_1 - z_2}.$$

Since
$$z_1 - z_2 = -\frac{2}{t}\sqrt{1 - 2xt + t^2},$$
we obtain the Legendre polynomial generating function as
$$g(x, t) = \frac{1}{\sqrt{1 - 2xt + t^2}}. \tag{12.27}$$
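The connection between a generating function and its coefficient functions can be checked numerically with the contour-integral representation $c_n f_n(x) = (1/2\pi i)\oint g(x,t)\,t^{-n-1}\,dt$. The sketch below applies it to $g(x,t) = (1 - 2xt + t^2)^{-1/2}$ with $c_n = 1$ (the scaling at which the coefficients are the Legendre polynomials $P_n$), using a circle |t| = 0.5 and an equally spaced sum; the radius, sample count, and function name are illustrative assumptions, and |x| ≤ 1 is assumed so that the singularities of g in t lie on |t| = 1, outside the contour.

```python
import cmath
import math


def legendre_from_generating_function(n: int, x: float, radius: float = 0.5,
                                      samples: int = 256) -> float:
    """Coefficient of t^n in (1 - 2xt + t^2)^(-1/2), obtained as a contour integral.

    With t = radius*e^{i theta}, dt = i t d(theta), so the integral becomes the
    average of g(x, t)/t^n over the circle. Assumes |x| <= 1.
    """
    total = 0j
    for k in range(samples):
        t = radius * cmath.exp(2j * math.pi * k / samples)
        total += 1.0 / cmath.sqrt(1.0 - 2.0 * x * t + t * t) / t ** n
    return (total / samples).real


x = 0.3
print(legendre_from_generating_function(2, x), 0.5 * (3 * x**2 - 1))      # P_2
print(legendre_from_generating_function(3, x), 0.5 * (5 * x**3 - 3 * x))  # P_3
```

On this circle the real part of $1 - 2xt + t^2$ stays positive, so the principal branch of the square root is continuous along the whole contour.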


Summary—Orthogonal Polynomials 


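For five classical sets of orthogonal polynomials, we summarize in Table 12.1 their ODEs,
Rodrigues formulas, and generating functions. Omitted from the list are important
subsidiary polynomial sets (e.g., those connected with the associated Legendre and associated
Laguerre ODEs).


Table 12.1 Orthogonal Polynomials: ODEs, Rodrigues Formulas, and Generating
Functions

Legendre: $(1 - x^2)y'' - 2xy' + n(n+1)y = 0$
  Rodrigues formula: $P_n(x) = \dfrac{1}{2^n n!}\left(\dfrac{d}{dx}\right)^n (x^2 - 1)^n$
  Generating function: $(1 - 2xt + t^2)^{-1/2} = \sum_{n=0}^{\infty} P_n(x)\,t^n$

Hermite: $y'' - 2xy' + 2ny = 0$
  Rodrigues formula: $H_n(x) = (-1)^n e^{x^2}\left(\dfrac{d}{dx}\right)^n e^{-x^2}$
  Generating function: $e^{-t^2 + 2tx} = \sum_{n=0}^{\infty} \dfrac{1}{n!}\,H_n(x)\,t^n$

Laguerre: $xy'' + (1 - x)y' + ny = 0$
  Rodrigues formula: $L_n(x) = \dfrac{e^x}{n!}\left(\dfrac{d}{dx}\right)^n \left(x^n e^{-x}\right)$
  Generating function: $\dfrac{e^{-xt/(1-t)}}{1 - t} = \sum_{n=0}^{\infty} L_n(x)\,t^n$

Chebyshev I: $(1 - x^2)y'' - xy' + n^2 y = 0$
  Rodrigues formula: $T_n(x) = \dfrac{(-1)^n \pi^{1/2}(1 - x^2)^{1/2}}{2^n\,\Gamma(n + \tfrac{1}{2})}\left(\dfrac{d}{dx}\right)^n (1 - x^2)^{n - 1/2}$
  Generating function: $\dfrac{1 - t^2}{1 - 2xt + t^2} = T_0(x) + 2\sum_{n=1}^{\infty} T_n(x)\,t^n$

Chebyshev II: $(1 - x^2)y'' - 3xy' + n(n+2)y = 0$
  Rodrigues formula: $U_n(x) = \dfrac{(-1)^n (n+1)\,\pi^{1/2}}{2^{n+1}\,\Gamma(n + \tfrac{3}{2})\,(1 - x^2)^{1/2}}\left(\dfrac{d}{dx}\right)^n (1 - x^2)^{n + 1/2}$
  Generating function: $\dfrac{1}{1 - 2xt + t^2} = \sum_{n=0}^{\infty} U_n(x)\,t^n$


Exercises


12.1.1 Starting from the Rodrigues formula in Table 12.1 for the Hermite polynomials $H_n$,
derive the generating function for the $H_n$ given in that table.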











12.1.2 (a) Starting from the Laguerre ODE,
$$xy'' + (1 - x)y' + \lambda y = 0,$$
obtain the Rodrigues formula for its polynomial solutions $L_n(x)$.
(b) From the Rodrigues formula, scaled as in Table 12.1, derive the generating
function for the $L_n(x)$ given in that table.


12.1.3 Carry out in detail the steps needed to confirm that the (n + 1)-fold differentiation of
Eq. (12.10) leads to Eq. (12.12).


12.1.4 Confirm the algebraic steps that convert Eq. (12.12) into Eq. (12.16).


12.1.5 Given the following integral representations, in which the contours encircle the origin
but no other singular points, derive the corresponding generating functions:

(a) Bessel functions:
$$J_n(x) = \frac{1}{2\pi i}\oint e^{(x/2)(t - 1/t)}\,t^{-n-1}\,dt.$$
(b) Modified Bessel functions:
$$I_n(x) = \frac{1}{2\pi i}\oint e^{(x/2)(t + 1/t)}\,t^{-n-1}\,dt.$$


12.1.6 Expand the generating function for the Legendre polynomials, $(1 - 2tz + t^2)^{-1/2}$, in
powers of t. Assume that t is small. Collect the coefficients of $t^0$, $t^1$, and $t^2$.

ANS. $a_0 = P_0(z) = 1$,
     $a_1 = P_1(z) = z$,
     $a_2 = P_2(z) = \tfrac{1}{2}(3z^2 - 1)$.


12.1.7 The set of Chebyshev polynomials usually denoted $U_n(x)$ has the generating-function
formula
$$\frac{1}{1 - 2xt + t^2} = \sum_{n=0}^{\infty} U_n(x)\,t^n.$$

Derive a recurrence formula (for integer n > 0) connecting three $U_n$ of consecutive n.


12.2 BERNOULLI NUMBERS


A generating-function approach is a convenient way to introduce the set of numbers first 
used in mathematics by Jacques (James, Jacob) Bernoulli. These quantities have been de- 
fined in a number of different ways, so extreme care must be taken in combining formulas 
from works by different authors. Our definition corresponds to that used in the reference 
work Handbook of Mathematical Functions (AMS-55). See Additional Readings. 

Since the Bernoulli numbers, denoted $B_n$, do not depend on a variable, their generating
function depends only on a single (complex) variable, and the generating-function formula
has the specific form
$$\frac{t}{e^t - 1} = \sum_{n=0}^{\infty} \frac{B_n t^n}{n!}. \tag{12.28}$$

The inclusion of the factor 1/n! in the definition is just one of the ways some definitions of
Bernoulli numbers differ. We defer for the moment the important question as to the circle
of convergence of the expansion in Eq. (12.28).

Since Eq. (12.28) is a Taylor series, we may identify the $B_n$ as successive derivatives of
the generating function:
$$B_n = \left[\frac{d^n}{dt^n}\left(\frac{t}{e^t - 1}\right)\right]_{t=0}. \tag{12.29}$$

To obtain $B_0$, we must take the limit of $t/(e^t - 1)$ as $t \to 0$, easily finding $B_0 = 1$. Applying
Eq. (12.29), we also have
$$B_1 = \frac{d}{dt}\left(\frac{t}{e^t - 1}\right)\bigg|_{t=0} = \lim_{t\to 0}\left[\frac{1}{e^t - 1} - \frac{te^t}{(e^t - 1)^2}\right] = -\frac{1}{2}. \tag{12.30}$$
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In principle we could continue to obtain further B,,, but it is more convenient to proceed 
in a more sophisticated fashion. Our starting point is to examine 











ye ee ee eee ee 
“ont et naar P= | 2 
ieee one 12.31 
~ erat 1a ee 
where we have used the fact that 
t =f 
Fr ta (12.32) 


Equation (12.31) shows that the summation on its left-hand side is an even function of f, 
leading to the conclusion that all B, of odd n (other than B,) must vanish. 


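We next use the generating function to obtain a recursion relation for the Bernoulli
numbers. We form

$$\frac{e^t - 1}{t}\,\frac{t}{e^t - 1} = 1 = \left[\sum_{m=0}^{\infty} \frac{t^m}{(m+1)!}\right]\left[1 - \frac{t}{2} + \sum_{n=1}^{\infty} B_{2n}\frac{t^{2n}}{(2n)!}\right]$$

$$= 1 + \sum_{m=1}^{\infty} t^m\left[\frac{1}{(m+1)!} - \frac{1}{2\,m!}\right] + \sum_{N=2}^{\infty} t^N \sum_{1 \le n \le N/2} \frac{B_{2n}}{(2n)!\,(N - 2n + 1)!}$$

$$= 1 + \sum_{N=2}^{\infty} \frac{t^N}{(N+1)!}\left[-\frac{N-1}{2} + \sum_{1 \le n \le N/2} \binom{N+1}{2n} B_{2n}\right]. \tag{12.33}$$

Since the coefficient of each power of t in the final summation of Eq. (12.33) must vanish,
we may set to zero for each N the expression in its square brackets. Changing N, if even,
to 2N and if odd, to 2N - 1, Eq. (12.33) leads to the pair of equations

$$N - \frac{1}{2} = \sum_{n=1}^{N} \binom{2N+1}{2n} B_{2n}, \qquad N - 1 = \sum_{n=1}^{N-1} \binom{2N}{2n} B_{2n}. \tag{12.34}$$

Either of these equations can be used to obtain the $B_{2n}$ sequentially, starting from $B_2$. The
first few $B_n$ are listed in Table 12.2.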
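As a minimal sketch of how the first relation in Eq. (12.34) generates the $B_{2n}$ one after another, the following Python fragment solves that relation for its highest-index term using exact rational arithmetic. The function name `bernoulli_even` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction
from math import comb


def bernoulli_even(n_max):
    """Return {2N: B_2N} for N = 1..n_max from N - 1/2 = sum_{n=1}^{N} C(2N+1, 2n) B_2n."""
    b = {}
    for N in range(1, n_max + 1):
        # Contribution of the already-known B_2, ..., B_{2N-2}.
        known = sum(comb(2 * N + 1, 2 * n) * b[2 * n] for n in range(1, N))
        # The n = N term carries the coefficient C(2N+1, 2N) = 2N + 1.
        b[2 * N] = (Fraction(2 * N - 1, 2) - known) / (2 * N + 1)
    return b


if __name__ == "__main__":
    for index, value in bernoulli_even(5).items():
        print(f"B_{index} = {value}")
```

Because each step divides only by the integer 2N + 1, the rational results can be compared directly with tabulated values.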


Table 12.2 Bernoulli Numbers

 n     B_n       B_n (decimal)
 0     1          1.000000000
 1    -1/2       -0.500000000
 2     1/6        0.166666667
 4    -1/30      -0.033333333
 6     1/42       0.023809524
 8    -1/30      -0.033333333
10     5/66       0.075757576

Note. Further values are given in AMS-55; see Abramowitz in Additional Readings.


To obtain additional relations involving the Bernoulli numbers, we next consider the
following representation of cot t:

$$\cot t = \frac{\cos t}{\sin t} = i\,\frac{e^{it} + e^{-it}}{e^{it} - e^{-it}} = i\,\frac{e^{2it} + 1}{e^{2it} - 1} = i\left[1 + \frac{2}{e^{2it} - 1}\right].$$

Multiplying by t and rearranging slightly,

$$t\cot t = \frac{2it}{2} + \frac{2it}{e^{2it} - 1} = \sum_{n=0}^{\infty} B_{2n}\frac{(2it)^{2n}}{(2n)!} = \sum_{n=0}^{\infty} (-1)^n B_{2n}\frac{(2t)^{2n}}{(2n)!}, \tag{12.35}$$

where the term 2it/2 has canceled the $B_1$ term that would otherwise appear in the expansion.

Now that we have our Bernoulli-number expansion identified with $t\cot t$, we can see
that it represents a function with singularities (poles) at $t = m\pi$, where $m = \pm 1, \pm 2, \ldots$.
There is no singularity at t = 0 (due to the presence of the factor t), so the singularity
nearest the expansion point (the origin) is at $|t| = \pi$. Since the argument in the expansion
is 2t, we conclude that the generating series for the Bernoulli numbers, Eq. (12.28), will
have the radius of convergence $|2t| = 2\pi$. This observation is, of course, consistent with
the fact that the zeros of $e^t - 1$ are for t at integer multiples of $2\pi i$.

To obtain another representation of the Bernoulli numbers, we write $B_n$ using the
contour-integration formula, Eq. (12.20). Noting that for use in this equation
$c_n f_n(x) = B_n/n!$, we have

$$B_n = \frac{n!}{2\pi i} \oint \frac{t}{e^t - 1}\,\frac{dt}{t^{n+1}}, \tag{12.36}$$

where the contour is a circle within the radius of convergence of the generating series. We
can, at least in principle, evaluate the integral using the residue theorem. For n = 0 we
have a simple pole with a residue of +1, and

$$B_0 = \frac{0!}{2\pi i} \cdot 2\pi i\,(+1) = 1.$$

FIGURE 12.1 Contour of integration for Bernoulli numbers.

For n = 1 the singularity at t = 0 becomes a second-order pole, and the limiting process
prescribed by Eq. (11.68) yields the residue $-\frac{1}{2}$, so

$$B_1 = \frac{1!}{2\pi i} \cdot 2\pi i\left(-\frac{1}{2}\right) = -\frac{1}{2},$$

consistent with our previous result. For n ≥ 2 the poles at t = 0 are of increasing order and
this procedure becomes rather tedious, so we resort to a different approach. We deform
the contour of our integral representation as shown in Fig. 12.1, which differs from the
original circular contour in that it surrounds all the poles of the integrand at $t = \pm 2\pi m i$,
m = 1, 2, ..., while avoiding the inclusion of the pole at t = 0. In contrast to the high-order
pole at t = 0, the other poles are all first-order, with residues that are easily evaluated.

To use the new contour, we need to identify the contributions from its constituent parts.
The direction of travel around the contour causes the small circle about t = 0 to contribute
$2\pi i$ times the residue of the integrand at t = 0, i.e., the result that when multiplied by
$n!/2\pi i$ is equal to $B_n$. The remainder of the contour makes no contribution to the integral:
(1) Because the integrand is analytic along the real axis and there is no branch cut there,
the segments A and A', which are in opposite directions of travel, cancel; and (2) the
large circle contributes negligibly (for n ≥ 2) because at large |t| the integrand behaves
asymptotically as $1/|t|^n$. Noting that the poles at nonzero t are encircled in a clockwise
sense, we have the following relatively simple result (for n ≥ 2):

$$B_n = -\frac{n!}{2\pi i}\, 2\pi i \sum \left(\text{residues of } \frac{t^{-n}}{e^t - 1} \text{ at poles } t \neq 0\right). \tag{12.37}$$

Since the residue at $t = 2\pi m i$ is simply $(2\pi m i)^{-n}$, Eq. (12.37) becomes

$$B_n = -\frac{n!}{(2\pi i)^n} \sum_{m=1}^{\infty}\left[\frac{1}{m^n} + \frac{1}{(-m)^n}\right],$$

which further reduces, for n ≥ 1, to

$$B_{2n} = (-1)^{n+1}\frac{(2n)!}{(2\pi)^{2n}} \sum_{m=1}^{\infty} \frac{2}{m^{2n}} = (-1)^{n+1}\frac{2(2n)!}{(2\pi)^{2n}}\,\zeta(2n), \qquad B_{2n+1} = 0. \tag{12.38}$$

Note that the $B_n$ of odd n > 1 are correctly shown to vanish, and that the Bernoulli numbers
of even n > 0 are identified as proportional to Riemann zeta functions, which first appeared
in this book at Eq. (1.12). We repeat the definition:

$$\zeta(z) = \sum_{m=1}^{\infty} \frac{1}{m^z}.$$





Equation (12.38) is an important result because we already have a straightforward way
to obtain values of the $B_n$, via Eq. (12.34), and Eq. (12.38) can be inverted to give a closed
expression for $\zeta(2n)$, which otherwise was known only as a summation. This representation
of the Bernoulli numbers was discovered by Euler.
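The inversion just described can be checked numerically. The sketch below, written under the assumption that the Bernoulli numbers of Table 12.2 are simply typed in, solves Eq. (12.38) for $\zeta(2n)$ and compares the result with a brute-force partial sum; the function names and the cutoff `terms` are illustrative choices.

```python
import math
from fractions import Fraction

# Bernoulli numbers B_2 ... B_10 as listed in Table 12.2.
B = {2: Fraction(1, 6), 4: Fraction(-1, 30), 6: Fraction(1, 42),
     8: Fraction(-1, 30), 10: Fraction(5, 66)}


def zeta_closed(two_n):
    """zeta(2n) obtained by solving Eq. (12.38) for the zeta function."""
    n = two_n // 2
    sign = 1 if n % 2 == 1 else -1          # equals (-1)**(n + 1)
    return sign * float(B[two_n]) * (2 * math.pi) ** two_n / (2 * math.factorial(two_n))


def zeta_direct(s, terms=200_000):
    """Partial sum of sum_{m>=1} m**(-s); 'terms' is an arbitrary cutoff."""
    return sum(m ** -s for m in range(1, terms + 1))


if __name__ == "__main__":
    for two_n in (2, 4, 6):
        print(two_n, zeta_closed(two_n), zeta_direct(two_n))
```

The direct sum for $\zeta(2)$ converges slowly (its tail is of order 1/terms), which is one reason the closed form is valuable.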

It is readily seen from Eq. (12.38) that $|B_{2n}|$ increases without limit as $n \to \infty$. Numerical
values have been calculated by Glaisher.¹ Illustrating the divergent behavior of the
Bernoulli numbers, we have

$$B_{20} = -5.291 \times 10^{2}, \qquad B_{200} = -3.647 \times 10^{215}.$$

Some authors prefer to define the Bernoulli numbers with a modified version of Eq. (12.38)
by using

$$B_n = \frac{2(2n)!}{(2\pi)^{2n}}\,\zeta(2n), \tag{12.39}$$

the subscript being just half of our subscript and all signs positive. Again, when using other
texts or references, you must check to see exactly how the Bernoulli numbers are defined.

The Bernoulli numbers occur frequently in number theory. The von Staudt-Clausen theorem
states that

$$B_{2n} = A_n - \frac{1}{p_1} - \frac{1}{p_2} - \frac{1}{p_3} - \cdots - \frac{1}{p_k}, \tag{12.40}$$

in which $A_n$ is an integer and $p_1, p_2, \ldots, p_k$ are all the prime numbers such that $p_i - 1$ is
a divisor of 2n. It may readily be verified that this holds for

$B_6$ ($A_3 = 1$, p = 2, 3, 7),
$B_8$ ($A_4 = 1$, p = 2, 3, 5),
$B_{10}$ ($A_5 = 1$, p = 2, 3, 11),

and other special cases.
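The special cases quoted above can be verified mechanically. The following sketch is an illustration only (the helper names are not from the text): it collects the primes p for which p - 1 divides 2n and adds their reciprocals to $B_{2n}$, which by Eq. (12.40) should leave the integer $A_n$.

```python
from fractions import Fraction


def is_prime(k):
    """Trial-division primality test, adequate for the small integers used here."""
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))


def staudt_clausen_integer(two_n, b_two_n):
    """Return (A_n, primes), where A_n = B_2n + sum of 1/p over primes p with (p - 1) | 2n."""
    primes = [p for p in range(2, two_n + 2) if is_prime(p) and two_n % (p - 1) == 0]
    a_n = b_two_n + sum(Fraction(1, p) for p in primes)
    return a_n, primes


if __name__ == "__main__":
    for two_n, b in ((6, Fraction(1, 42)), (8, Fraction(-1, 30)), (10, Fraction(5, 66))):
        print(two_n, staudt_clausen_integer(two_n, b))
```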


¹ J. W. L. Glaisher, table of the first 250 Bernoulli numbers (to nine figures) and their logarithms (to ten figures). Trans. Cambridge Philos. Soc. 12: 390 (1871-1879).

The Bernoulli numbers appear in the summation of integral powers of the integers,

$$\sum_{j=1}^{N} j^p, \qquad p \text{ integral},$$

and in numerous series expansions of the transcendental functions, including tan x, cot x,
ln|sin x|, $(\sin x)^{-1}$, ln|cos x|, ln|tan x|, $(\cosh x)^{-1}$, tanh x, and coth x. For example,

$$\tan x = x + \frac{x^3}{3} + \frac{2}{15}x^5 + \cdots + \frac{(-1)^{n-1}\, 2^{2n}(2^{2n} - 1) B_{2n}}{(2n)!}\, x^{2n-1} + \cdots. \tag{12.41}$$

The Bernoulli numbers are likely to appear in such series expansions because of the
definition, Eq. (12.28), the form of Eq. (12.35), and the relation to the Riemann zeta function,
Eq. (12.38).


Bernoulli Polynomials 


If Eq. (12.28) is generalized slightly, we have

$$\frac{t e^{ts}}{e^t - 1} = \sum_{n=0}^{\infty} B_n(s)\,\frac{t^n}{n!}, \tag{12.42}$$

defining the Bernoulli polynomials, $B_n(s)$. It is clear that $B_n(s)$ will be a polynomial of
degree n, since the Taylor expansion of the generating function will contain contributions
in which each instance of t may (or may not) be accompanied by a factor s. The first seven
Bernoulli polynomials are given in Table 12.3.

If we set s = 0 in the generating function formula, Eq. (12.42), we have

$$B_n(0) = B_n, \qquad n = 0, 1, 2, \ldots, \tag{12.43}$$

showing that the Bernoulli polynomial evaluated at zero equals the corresponding
Bernoulli number.


Table 12.3 Bernoulli Polynomials

$B_0 = 1$
$B_1 = x - \frac{1}{2}$
$B_2 = x^2 - x + \frac{1}{6}$
$B_3 = x^3 - \frac{3}{2}x^2 + \frac{1}{2}x$
$B_4 = x^4 - 2x^3 + x^2 - \frac{1}{30}$
$B_5 = x^5 - \frac{5}{2}x^4 + \frac{5}{3}x^3 - \frac{1}{6}x$
$B_6 = x^6 - 3x^5 + \frac{5}{2}x^4 - \frac{1}{2}x^2 + \frac{1}{42}$


Two other important properties of the Bernoulli polynomials follow from the defining
relation, Eq. (12.42). If we differentiate both sides of that equation with respect to s, we
have

$$\frac{t^2 e^{ts}}{e^t - 1} = \sum_{n=0}^{\infty} B_n'(s)\,\frac{t^n}{n!}$$

$$= \sum_{n=0}^{\infty} B_n(s)\,\frac{t^{n+1}}{n!} = \sum_{n=1}^{\infty} B_{n-1}(s)\,\frac{t^n}{(n-1)!}, \tag{12.44}$$

where the second line of Eq. (12.44) is obtained by rewriting its left-hand side using the
generating-function formula. Equating the coefficients of equal powers of t in the two lines
of Eq. (12.44), we obtain the differentiation formula

$$\frac{d}{ds} B_n(s) = n B_{n-1}(s), \qquad n = 1, 2, 3, \ldots. \tag{12.45}$$

We also have a symmetry relation, which we can obtain by setting s = 1 in Eq. (12.42).
The left-hand side of that equation then becomes

$$\frac{t e^t}{e^t - 1} = \frac{-t}{e^{-t} - 1}. \tag{12.46}$$

Thus, equating Eq. (12.42) for s = 1 with the Bernoulli-number expansion (in -t) of the
right-hand side of Eq. (12.46), we reach

$$\sum_{n=0}^{\infty} B_n(1)\,\frac{t^n}{n!} = \sum_{n=0}^{\infty} B_n\,\frac{(-t)^n}{n!},$$

which is equivalent to

$$B_n(1) = (-1)^n B_n(0). \tag{12.47}$$

These relations are used in the development of the Euler-Maclaurin integration formula.
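The differentiation formula, Eq. (12.45), together with $B_n(0) = B_n$ is enough to build the polynomials one at a time. The sketch below does so by termwise integration with exact fractions and then checks the symmetry relation, Eq. (12.47). The list `BERNOULLI` is typed in from Table 12.2, and the function names are illustrative rather than part of the text.

```python
from fractions import Fraction

# Bernoulli numbers B_0 ... B_6 (Table 12.2; odd-index values beyond B_1 vanish).
BERNOULLI = [Fraction(1), Fraction(-1, 2), Fraction(1, 6), Fraction(0),
             Fraction(-1, 30), Fraction(0), Fraction(1, 42)]


def bernoulli_polynomials(n_max):
    """Return coefficient lists (ascending powers of s) for B_0(s), ..., B_{n_max}(s)."""
    polys = [[Fraction(1)]]
    for n in range(1, n_max + 1):
        prev = polys[-1]
        # Integrate n * B_{n-1}(s) term by term; the constant of integration is B_n(0) = B_n.
        coeffs = [BERNOULLI[n]] + [n * c / (k + 1) for k, c in enumerate(prev)]
        polys.append(coeffs)
    return polys


def evaluate(coeffs, s):
    """Evaluate a polynomial given by ascending coefficients at the point s."""
    return sum(c * s ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    for n, coeffs in enumerate(bernoulli_polynomials(6)):
        print(n, [str(c) for c in coeffs], evaluate(coeffs, 1))
```

Each printed coefficient list can be compared with the corresponding polynomial in the table of Bernoulli polynomials.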
Exercises


12.2.1 Verify the identities, Eqs. (12.32) and (12.46).


12.2.2 Show that the first Bernoulli polynomials are

$B_0(s) = 1$
$B_1(s) = s - \frac{1}{2}$
$B_2(s) = s^2 - s + \frac{1}{6}$.

Note that $B_n(0) = B_n$, the Bernoulli number.


12.2.3 Show that

$$\tan x = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}\, 2^{2n}(2^{2n} - 1) B_{2n}}{(2n)!}\, x^{2n-1}, \qquad -\frac{\pi}{2} < x < \frac{\pi}{2}.$$

Hint. tan x = cot x - 2 cot 2x.


12.3 EULER-MACLAURIN INTEGRATION FORMULA


One use of the Bernoulli polynomials is in the derivation of the Euler-Maclaurin integra- 
tion formula. This formula is used both to develop asymptotic expansions (treated later in 
this chapter) and to obtain approximate values for summations. An important application 
of the Euler-Maclaurin formula, presented in Chapter 13, is its use to derive Stirling’s 
formula, an asymptotic expression for the gamma function. 

The technique we use to develop the Euler-Maclaurin formula is repeated integration by
parts, using Eq. (12.45) to create new derivatives. We start with

$$\int_0^1 f(x)\,dx = \int_0^1 f(x) B_0(x)\,dx, \tag{12.48}$$

where we have, for reasons that will shortly become apparent, inserted the redundant factor
$B_0(x) = 1$. From Eq. (12.45), we note that

$$B_0(x) = B_1'(x),$$

and we substitute $B_1'(x)$ for $B_0(x)$ in Eq. (12.48), integrate by parts, and identify
$B_1(1) = -B_1(0) = \frac{1}{2}$, thereby obtaining

$$\int_0^1 f(x)\,dx = f(1)B_1(1) - f(0)B_1(0) - \int_0^1 f'(x) B_1(x)\,dx$$

$$= \frac{1}{2}\left[f(1) + f(0)\right] - \int_0^1 f'(x) B_1(x)\,dx. \tag{12.49}$$

Again using Eq. (12.45), we have

$$B_1(x) = \frac{1}{2} B_2'(x).$$

Inserting $B_2'(x)$ and integrating by parts again, we get

$$\int_0^1 f(x)\,dx = \frac{1}{2}\left[f(1) + f(0)\right] - \frac{1}{2}\left[f'(1)B_2(1) - f'(0)B_2(0)\right] + \frac{1}{2}\int_0^1 f^{(2)}(x) B_2(x)\,dx. \tag{12.50}$$

Using the relation

$$B_{2n}(1) = B_{2n}(0) = B_{2n}, \qquad n = 0, 1, 2, \ldots, \tag{12.51}$$

Eq. (12.50) simplifies to

$$\int_0^1 f(x)\,dx = \frac{1}{2}\left[f(1) + f(0)\right] - \frac{B_2}{2}\left[f'(1) - f'(0)\right] + \frac{1}{2}\int_0^1 f^{(2)}(x) B_2(x)\,dx. \tag{12.52}$$

Continuing, we replace $B_2(x)$ by $B_3'(x)/3$ and once again integrate by parts. Because

$$B_{2n+1}(1) = B_{2n+1}(0) = 0, \qquad n = 1, 2, 3, \ldots, \tag{12.53}$$

the integration by parts produces no integrated terms, and

$$\frac{1}{2}\int_0^1 f^{(2)}(x) B_2(x)\,dx = \frac{1}{2 \cdot 3}\int_0^1 f^{(2)}(x) B_3'(x)\,dx = -\frac{1}{3!}\int_0^1 f^{(3)}(x) B_3(x)\,dx. \tag{12.54}$$

Substituting $B_3(x) = B_4'(x)/4$ and carrying out one more partial integration, we get
integrated terms containing $B_4(x)$, which simplify according to Eq. (12.51). The result is

$$-\frac{1}{3!}\int_0^1 f^{(3)}(x) B_3(x)\,dx = -\frac{B_4}{4!}\left[f^{(3)}(1) - f^{(3)}(0)\right] + \frac{1}{4!}\int_0^1 f^{(4)}(x) B_4(x)\,dx. \tag{12.55}$$

We may continue this process, with steps that are entirely analogous to those that led
to Eqs. (12.54) and (12.55). After steps leading to derivatives of f of order 2q - 1,
we have

$$\int_0^1 f(x)\,dx = \frac{1}{2}\left[f(1) + f(0)\right] - \sum_{p=1}^{q} \frac{1}{(2p)!} B_{2p}\left[f^{(2p-1)}(1) - f^{(2p-1)}(0)\right] + \frac{1}{(2q)!}\int_0^1 f^{(2q)}(x) B_{2q}(x)\,dx. \tag{12.56}$$

This is the Euler-Maclaurin integration formula. It assumes that the function f(x) has the
required derivatives.

The range of integration in Eq. (12.56) may be shifted from [0, 1] to [1, 2] by replacing
f(x) by f(x + 1). Adding such results up to [n - 1, n], we obtain

$$\int_0^n f(x)\,dx = \frac{1}{2}f(0) + f(1) + f(2) + \cdots + f(n-1) + \frac{1}{2}f(n) - \sum_{p=1}^{q} \frac{1}{(2p)!} B_{2p}\left[f^{(2p-1)}(n) - f^{(2p-1)}(0)\right]$$

$$+ \frac{1}{(2q)!}\int_0^1 B_{2q}(x) \sum_{\nu=0}^{n-1} f^{(2q)}(x + \nu)\,dx. \tag{12.57}$$

Note that the derivative terms at the intermediate integer arguments all cancel. However,
the intermediate terms f(j) do not, and $\frac{1}{2}f(0) + f(1) + \cdots + \frac{1}{2}f(n)$ appear exactly as
in trapezoidal integration, or quadrature, so the summation over p may be interpreted as a
correction to the trapezoidal approximation. Equation (12.57) may therefore be seen as a
generalization of Eq. (1.10).

In many applications of Eq. (12.57) the final integral containing $f^{(2q)}$, though small,
will not approach zero as q is increased without limit, and the Euler-Maclaurin formula
then has an asymptotic, rather than convergent character. Such series, and the implications
regarding their use, are the topic of a later section of this chapter.

One of the most important uses of the Euler-Maclaurin formula is in summing series by
converting them to integrals plus correction terms.² Here is an illustration of the process.


Example 12.3.1 ESTIMATION OF ζ(3)


A straightforward application of Eq. (12.57) to ζ(3) proceeds as follows (noting that all
derivatives of $f(x) = 1/x^3$ vanish in the limit $x \to \infty$):

$$\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = \frac{1}{2}f(1) + \int_1^{\infty} \frac{dx}{x^3} - \sum_{p=1}^{q} \frac{B_{2p}}{(2p)!}\, f^{(2p-1)}(1) + \text{remainder}. \tag{12.58}$$

Evaluating the integral, setting f(1) = 1, and inserting

$$f^{(2n-1)}(x) = -\frac{(2n+1)!}{2\,x^{2n+2}}$$

with x = 1, Eq. (12.58) becomes

$$\zeta(3) = \frac{1}{2} + \frac{1}{2} + \sum_{p=1}^{q} \frac{(2p+1)B_{2p}}{2} + \text{remainder}. \tag{12.59}$$

² See R. P. Boas and C. Stutz, Estimating sums with integrals. Am. J. Phys. 39: 745 (1971), for a number of examples.


Table 12.4 Contributions to ζ(3) of Terms in Euler-Maclaurin Formula

                   n_0 = 1      n_0 = 2      n_0 = 4
Explicit terms     0.500000     1.062500     1.169849
Integral term      0.500000     0.125000     0.031250
B_2 term           0.250000     0.015625     0.000977
B_4 term          -0.083333    -0.001302    -0.000020
B_6 term           0.083333     0.000326     0.000001
B_8 term          -0.150000    -0.000146    -0.000000
B_10 term          0.416667     0.000102     0.000000
B_12 term         -1.645238    -0.000100    -0.000000
B_14 term          8.750000     0.000134     0.000000
Sum (a)            1.166667     1.202003     1.202057

(a) Each sum includes only the terms that precede the point at which the terms of that
column begin to increase (the horizontal marker of the original layout): through the B_4
term for n_0 = 1, through the B_12 term for n_0 = 2, and all listed terms for n_0 = 4.

Left column: formula applied to entire summation; central column: formula applied starting
from second term; right column: formula starting from fourth term.


To assess the quality of this result, we list, in the first data column of Table 12.4, the
contributions to it. The line marked "explicit terms" consists presently of only the term $\frac{1}{2}f(1)$.
We note that the individual terms start to increase after the $B_4$ term; since it is our
intention not to evaluate the remainder, the accuracy of the expansion is limited. As discussed
more extensively in the section on asymptotic expansions, the best result available from
these data is obtained by truncating the expansion before the terms start to increase; adding
the contributions above the marker line in the table, we get the value listed as "Sum." For
reference, the accurate value of ζ(3) is 1.202057.

We can improve the result available from the Euler-Maclaurin formula by explicitly
calculating some initial terms and applying the formula only to those that remain. This
stratagem causes the derivatives entering the formula to be smaller and diminishes the
correction from the trapezoid-rule estimate. Simply starting the formula at n = 2 instead
of n = 1 reduces the error markedly; see the second data column of Table 12.4. Now the
"explicit terms" consist of $f(1) + \frac{1}{2}f(2)$. Starting the Euler-Maclaurin formula at n = 4
further improves the result, then reaching better than seven-figure accuracy. ■
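The bookkeeping of this example is easy to reproduce. The sketch below is one possible arrangement, not the authors' program: it sums the first $n_0 - 1$ terms explicitly, then adds $\frac{1}{2}f(n_0)$, the integral from $n_0$ to infinity, and q correction terms $(2p+1)B_{2p}/(2 n_0^{2p+2})$, which is Eq. (12.59) with the starting point moved from 1 to $n_0$. The values of $B_{12}$ and $B_{14}$ go beyond Table 12.2 and are standard tabulated values; the function name is illustrative.

```python
from fractions import Fraction

# B_2, B_4, ..., B_14; the last two extend Table 12.2 and are standard tabulated values.
B_EVEN = [Fraction(1, 6), Fraction(-1, 30), Fraction(1, 42), Fraction(-1, 30),
          Fraction(5, 66), Fraction(-691, 2730), Fraction(7, 6)]


def zeta3_euler_maclaurin(n0, q):
    """Estimate zeta(3), applying the formula from n = n0 onward with q Bernoulli terms."""
    explicit = sum(Fraction(1, n ** 3) for n in range(1, n0)) + Fraction(1, 2 * n0 ** 3)
    integral = Fraction(1, 2 * n0 ** 2)          # integral of x**-3 from n0 to infinity
    corrections = sum((2 * p + 1) * B_EVEN[p - 1] / (2 * Fraction(n0) ** (2 * p + 2))
                      for p in range(1, q + 1))
    return float(explicit + integral + corrections)


if __name__ == "__main__":
    for n0, q in ((1, 2), (2, 6), (4, 7)):
        print(f"n0 = {n0}, q = {q}: {zeta3_euler_maclaurin(n0, q):.6f}")
```

Raising q beyond the point where the correction terms start to grow makes the estimate worse, in line with the truncation rule stated in the example.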


When the Euler-Maclaurin formula is applied to sums whose summands have a finite 
number of nonzero derivatives, it can evaluate them exactly. See Exercise 12.3.1. 


Exercises 


12.3.1 The Euler-Maclaurin integration formula may be used for the evaluation of finite series:

$$\sum_{m=1}^{n} f(m) = \int_1^n f(x)\,dx + \frac{1}{2}f(1) + \frac{1}{2}f(n) + \frac{B_2}{2!}\left[f'(n) - f'(1)\right] + \cdots.$$

Show that

(a) $\sum_{m=1}^{n} m = \frac{1}{2}n(n+1)$.

(b) $\sum_{m=1}^{n} m^2 = \frac{1}{6}n(n+1)(2n+1)$.

(c) $\sum_{m=1}^{n} m^3 = \frac{1}{4}n^2(n+1)^2$.

(d) $\sum_{m=1}^{n} m^4 = \frac{1}{30}n(n+1)(2n+1)(3n^2 + 3n - 1)$.


12.3.2 The Euler-Maclaurin integration formula provides a way of calculating the
Euler-Mascheroni constant γ to high accuracy. Using f(x) = 1/x in Eq. (12.57) (with interval
[1, n]) and the definition of γ, Eq. (1.13), we obtain

$$\gamma = \sum_{s=1}^{n} s^{-1} - \ln n - \frac{1}{2n} + \sum_{k=1}^{N} \frac{B_{2k}}{(2k)\,n^{2k}}.$$

Using double-precision arithmetic, calculate γ for N = 1, 2, ....

Note. See D. E. Knuth, Euler's constant to 1271 places. Math. Comput. 16: 275 (1962).

ANS. For n = 1000, N = 2,
γ = 0.5772 1566 4901.


12.4 DIRICHLET SERIES


Series expansions of the general form

$$S(s) = \sum_{n} \frac{a_n}{n^s}$$

are known as Dirichlet series, and our knowledge of contour integration methods and
Bernoulli numbers enables us to evaluate a variety of expressions of this type. One of the
most important Dirichlet series is that of the Riemann zeta function,

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \tag{12.60}$$

We have already evaluated a sum from which ζ(2) can be extracted.


Example 12.4.1 EVALUATION OF ζ(2)


From Example 11.9.1, we have

$$S(a) = \sum_{n=1}^{\infty} \frac{1}{n^2 + a^2} = \frac{\pi \coth \pi a}{2a} - \frac{1}{2a^2}.$$

Simply by taking the limit $a \to 0$, we have

$$\zeta(2) = \lim_{a \to 0} S(a) = \lim_{a \to 0}\left[\frac{\pi}{2a}\left(\frac{1}{\pi a} + \frac{\pi a}{3} + \cdots\right) - \frac{1}{2a^2}\right] = \frac{\pi^2}{6}. \tag{12.61}$$

From the relation with the Bernoulli numbers, or alternatively (and perhaps less
conveniently) by contour-integration methods, we find

$$\zeta(4) = \frac{\pi^4}{90}.$$

Values of ζ(2n) through ζ(10) are listed in Exercise 12.4.1. The zeta functions of odd
integer argument seem unamenable to evaluation in closed form, but are easy to compute
numerically (see Example 12.3.1).

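Other useful Dirichlet series, in the notation of AMS-55 (see Additional Readings),
include

$$\eta(s) = \sum_{n=1}^{\infty} (-1)^{n-1} n^{-s} = (1 - 2^{1-s})\,\zeta(s), \tag{12.62}$$

$$\lambda(s) = \sum_{n=0}^{\infty} (2n+1)^{-s} = (1 - 2^{-s})\,\zeta(s), \tag{12.63}$$

$$\beta(s) = \sum_{n=0}^{\infty} (-1)^n (2n+1)^{-s}. \tag{12.64}$$

Closed expressions are available (for integer n ≥ 1) for ζ(2n), η(2n), and λ(2n), and for
β(2n - 1). The sums with exponents of opposite parity cannot be reduced to ζ(2n) or
performed by the contour-integral methods we discussed in Chapter 11. An important series
that can only be evaluated numerically is that whose result is Catalan's constant, which is

$$\beta(2) = 1 - \frac{1}{3^2} + \frac{1}{5^2} - \cdots = 0.91596559\ldots. \tag{12.65}$$

For reference, we list a few of these summable Dirichlet series:

$$\zeta(2) = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}, \tag{12.66}$$

$$\zeta(4) = 1 + \frac{1}{2^4} + \frac{1}{3^4} + \cdots = \frac{\pi^4}{90}, \tag{12.67}$$

$$\eta(2) = 1 - \frac{1}{2^2} + \frac{1}{3^2} - \cdots = \frac{\pi^2}{12}, \tag{12.68}$$

$$\eta(4) = 1 - \frac{1}{2^4} + \frac{1}{3^4} - \cdots = \frac{7\pi^4}{720}, \tag{12.69}$$

$$\lambda(2) = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots = \frac{\pi^2}{8}, \tag{12.70}$$

$$\lambda(4) = 1 + \frac{1}{3^4} + \frac{1}{5^4} + \cdots = \frac{\pi^4}{96}, \tag{12.71}$$

$$\beta(1) = 1 - \frac{1}{3} + \frac{1}{5} - \cdots = \frac{\pi}{4}, \tag{12.72}$$

$$\beta(3) = 1 - \frac{1}{3^3} + \frac{1}{5^3} - \cdots = \frac{\pi^3}{32}. \tag{12.73}$$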
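The closed forms in this list can be spot-checked by direct summation. The following is a rough numerical sketch (the function names and the cutoff `terms` are illustrative assumptions); the alternating series converge quickly, whereas λ(2) and β(1) converge only like 1/terms.

```python
import math


def eta(s, terms=100_000):
    """Partial sum of sum_{n>=1} (-1)**(n-1) / n**s."""
    return sum((-1) ** (n - 1) / n ** s for n in range(1, terms + 1))


def lam(s, terms=100_000):
    """Partial sum of sum_{n>=0} 1 / (2n+1)**s."""
    return sum(1.0 / (2 * n + 1) ** s for n in range(terms))


def beta(s, terms=100_000):
    """Partial sum of sum_{n>=0} (-1)**n / (2n+1)**s."""
    return sum((-1) ** n / (2 * n + 1) ** s for n in range(terms))


if __name__ == "__main__":
    print("eta(2)   ", eta(2), math.pi ** 2 / 12)
    print("eta(4)   ", eta(4), 7 * math.pi ** 4 / 720)
    print("lambda(2)", lam(2), math.pi ** 2 / 8)
    print("lambda(4)", lam(4), math.pi ** 4 / 96)
    print("beta(1)  ", beta(1), math.pi / 4)
    print("beta(3)  ", beta(3), math.pi ** 3 / 32)
    print("beta(2)  ", beta(2), "(Catalan's constant)")
```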
Exercises


12.4.1 From $B_{2n} = (-1)^{n-1}\dfrac{2(2n)!}{(2\pi)^{2n}}\,\zeta(2n)$, show that

(a) $\zeta(2) = \dfrac{\pi^2}{6}$,
(b) $\zeta(4) = \dfrac{\pi^4}{90}$,
(c) $\zeta(6) = \dfrac{\pi^6}{945}$,
(d) $\zeta(8) = \dfrac{\pi^8}{9450}$,
(e) $\zeta(10) = \dfrac{\pi^{10}}{93{,}555}$.


12.4.2 The integral

$$\int_0^1 \left[\ln(1 - x)\right]^2 \frac{dx}{x}$$

appears in the fourth-order correction to the magnetic moment of the electron. Show
that it equals 2ζ(3).

Hint. Let $1 - x = e^{-t}$.


12.4.3 (a) Show that

$$\int_0^{\infty} \frac{(\ln z)^2}{1 + z^2}\,dz = 4\left(1 - \frac{1}{3^3} + \frac{1}{5^3} - \frac{1}{7^3} + \cdots\right).$$

(b) By contour integration show that this series evaluates to $\pi^3/8$.


12.4.4 Show that Catalan's constant, β(2), may be written as

$$\beta(2) = 2\sum_{k=1}^{\infty} (4k - 3)^{-2} - \frac{\pi^2}{8}.$$

Hint. $\pi^2 = 6\zeta(2)$.


12.4.5 Show that

(a) $\displaystyle\int_0^1 \frac{\ln(1 + x)}{x}\,dx = \frac{1}{2}\zeta(2)$,

(b) $\displaystyle\lim_{a \to 1}\int_0^a \frac{\ln(1 - x)}{x}\,dx = -\zeta(2)$.

Note that the integrand in part (b) diverges for a = 1 but that the integral is convergent.


12.4.6 (a) Show that the equation $\ln 2 = \sum_{s=1}^{\infty} (-1)^{s+1} s^{-1}$, Eq. (1.53), may be rewritten as

$$\ln 2 = \sum_{s=2}^{n} 2^{-s}\zeta(s) + \sum_{p=1}^{\infty} (2p)^{-n-1}\left[1 - \frac{1}{2p}\right]^{-1}.$$

Hint. Take the terms in pairs.

(b) Calculate ln 2 to six significant figures.


12.4.7 (a) Show that the equation $\pi/4 = \sum_{s=1}^{\infty} (-1)^{s+1}(2s - 1)^{-1}$, Eq. (12.72), may be
rewritten as

$$\frac{\pi}{4} = 1 - 2\sum_{s=1}^{n} 4^{-2s}\zeta(2s) - 2\sum_{p=1}^{\infty} (4p)^{-2n-2}\left[1 - \frac{1}{(4p)^2}\right]^{-1}.$$

(b) Calculate π/4 to six significant figures.


12.5 INFINITE PRODUCTS


We saw in Chapter 11 that complex variable theory can be used to generate infinite-product
representations of analytic functions. Here we develop some of their properties. For that
purpose it is convenient to write these products in the form

$$P = \prod_{n=1}^{\infty} (1 + a_n).$$

The infinite product may be related to an infinite series by the obvious method of taking
the logarithm:

$$\ln \prod_{n=1}^{\infty} (1 + a_n) = \sum_{n=1}^{\infty} \ln(1 + a_n). \tag{12.74}$$

The main theorem regarding convergence of infinite products is the following:

If $0 \le a_n < 1$, the infinite products $\prod_{n=1}^{\infty}(1 + a_n)$ and $\prod_{n=1}^{\infty}(1 - a_n)$ converge if
$\sum_{n=1}^{\infty} a_n$ converges and diverge if $\sum_{n=1}^{\infty} a_n$ diverges.

For the infinite product $\prod(1 + a_n)$, note that

$$1 + a_n \le e^{a_n},$$

which means that the partial product consisting of the first n factors satisfies

$$p_n \le e^{s_n},$$

where $s_n$ is the sum of the first n of the $a_n$. Letting $n \to \infty$,

$$\prod_{n=1}^{\infty} (1 + a_n) \le \exp \sum_{n=1}^{\infty} a_n, \tag{12.75}$$

thereby giving an upper bound for the infinite product.
To develop a lower bound, we note that, because all $a_i \ge 0$,

$$p_n = 1 + \sum_{i=1}^{n} a_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} a_i a_j + \cdots \ge s_n.$$

Hence

$$\prod_{n=1}^{\infty} (1 + a_n) \ge \sum_{n=1}^{\infty} a_n. \tag{12.76}$$

If the infinite sum remains finite, the infinite product will also. But if the infinite sum
diverges, so will the infinite product.
The case $\prod(1 - a_n)$ is complicated by the negative signs, but a proof similar to the
foregoing may be developed by noting that for $a_n < \frac{1}{2}$

$$(1 - a_n) \le (1 + a_n)^{-1} \qquad \text{and} \qquad (1 - a_n) \ge (1 + 2a_n)^{-1}.$$


Example 12.5.1 CONVERGENCE OF INFINITE PRODUCTS FOR sin z AND cos z


These products, developed in Eqs. (11.89) and (11.90), are

$$\sin z = z\prod_{n=1}^{\infty}\left(1 - \frac{z^2}{n^2\pi^2}\right), \qquad \cos z = \prod_{n=1}^{\infty}\left(1 - \frac{z^2}{(n - \frac{1}{2})^2\pi^2}\right). \tag{12.77}$$

The product expansion of sin z converges for all z, because, writing the factors as $(1 - a_n)$,

$$\sum_{n=1}^{\infty} a_n = \frac{z^2}{\pi^2}\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{z^2}{\pi^2}\,\zeta(2) = \frac{z^2}{6},$$

a convergent result. For the expansion of cos z, we have

$$\sum_{n=1}^{\infty} a_n = \frac{4z^2}{\pi^2}\sum_{n=1}^{\infty} \frac{1}{(2n - 1)^2} = \frac{4z^2}{\pi^2}\,\lambda(2) = \frac{z^2}{2},$$

also convergent for all z. Note, however, that if z is large, many factors of the product will
have to be taken before either of these products approaches convergence. In fact, the main
use of these products is in establishing mathematical results rather than for precise numerical
work in physics. ■
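The slow approach to convergence for large z is easy to see numerically. The sketch below (function name and sample values of z and N are illustrative choices) evaluates partial products of the sin z expansion in Eq. (12.77) and prints them next to the library value of sin z.

```python
import math


def sin_product(z, n_factors):
    """Partial product z * prod_{n=1}^{N} (1 - z**2 / (n*pi)**2) from Eq. (12.77)."""
    result = z
    for n in range(1, n_factors + 1):
        result *= 1.0 - z * z / (n * math.pi) ** 2
    return result


if __name__ == "__main__":
    for z in (1.0, 10.0):
        for n_factors in (10, 1_000, 100_000):
            approx = sin_product(z, n_factors)
            print(f"z = {z:5.1f}  N = {n_factors:6d}  "
                  f"product = {approx: .6f}  sin z = {math.sin(z): .6f}")
```

The relative error of the truncated product falls off only like $z^2/(\pi^2 N)$, so each additional correct digit costs roughly ten times as many factors.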


We close this section with one further example illustrating a technique for working with 
infinite products. 


Example 12.5.2 AN INTERESTING PRODUCT


We wish to evaluate the infinite product

$$P = \prod_{n=2}^{\infty}\left(1 - \frac{1}{n^2}\right).$$

We note that the product we seek is equivalent to all but the first term of the product
expansion of sin z with z = π as given in Eq. (12.77). In fact, the missing first term, which
is zero, guarantees that we will get the correct result for sin π. For general z, we move
the first term (and the prefactor z) to the left-hand side of the product formula for sin z,
reaching

$$\frac{\sin z}{z(1 - z^2/\pi^2)} = \prod_{n=2}^{\infty}\left(1 - \frac{z^2}{n^2\pi^2}\right).$$

We now take the limits of the two sides of this equation as $z \to \pi$, applying l'Hôpital's
rule to evaluate the left-hand side and recognizing the right-hand side as P. Thus,

$$P = \lim_{z \to \pi} \frac{\sin z}{z(1 - z^2/\pi^2)} = \left.\frac{\cos z}{1 - 3z^2/\pi^2}\right|_{z=\pi} = \frac{1}{2}.$$

■
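As a quick cross-check of this limit, the partial products can be computed exactly. The sketch below uses rational arithmetic; the function name is an illustrative choice. Since $1 - 1/n^2 = (n-1)(n+1)/n^2$, the partial product through n = N telescopes to $(N + 1)/(2N)$, which tends to 1/2.

```python
from fractions import Fraction


def partial_product(n_max):
    """Exact value of prod_{n=2}^{n_max} (1 - 1/n**2)."""
    p = Fraction(1)
    for n in range(2, n_max + 1):
        p *= 1 - Fraction(1, n * n)
    return p


if __name__ == "__main__":
    for n_max in (2, 10, 100, 1000):
        value = partial_product(n_max)
        print(n_max, value, float(value))
```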





Exercises 


12.5.1 Using

$$\ln \prod_{n=1}^{\infty} (1 \pm a_n) = \sum_{n=1}^{\infty} \ln(1 \pm a_n)$$

and the Maclaurin expansion of $\ln(1 \pm a_n)$, show that the infinite product $\prod_{n=1}^{\infty}(1 \pm a_n)$
converges or diverges with the infinite series $\sum_{n=1}^{\infty} a_n$.


12.5.2 An infinite product appears in the form

$$\prod_{n=1}^{\infty}\left(\frac{1 + a/n}{1 + b/n}\right),$$

where a and b are constants. Show that this infinite product converges only if a = b.


12.5.3 Show that the infinite product representations of sin x and cos x are consistent with the
identity 2 sin x cos x = sin 2x.


12.5.4 Determine the limit to which $\prod_{n=2}^{\infty}\left(1 + \dfrac{(-1)^n}{n}\right)$ converges.


12.5.5 Show that $\prod_{n=2}^{\infty}\left[1 - \dfrac{2}{n(n+1)}\right] = \dfrac{1}{3}$.


12.5.6 Prove that $\prod_{n=2}^{\infty}\left(1 - \dfrac{1}{n^2}\right) = \dfrac{1}{2}$.


12.5.7 Verify the Euler identity $\prod_{p=1}^{\infty}(1 + z^p) = \prod_{q=1}^{\infty}(1 - z^{2q-1})^{-1}$, |z| < 1.


12.5.8 Show that $\prod_{r=1}^{\infty}(1 + x/r)e^{-x/r}$ converges for all finite x (except for the zeros of
1 + x/r).

Hint. Write the nth factor as $1 + a_n$.


12.5.9 Derive the formula, valid for small x,

$$\ln \sin x = \ln x + \sum_{n=1}^{\infty} a_n x^{2n},$$

giving the explicit form for the coefficients $a_n$.

Hint. d(ln sin x)/dx = cot x.


12.5.10 Using the infinite product representations of sin z, show that

$$z\cot z = 1 - 2\sum_{m,n=1}^{\infty}\left(\frac{z}{n\pi}\right)^{2m},$$

and hence that the Bernoulli numbers are given by the formula

$$B_{2n} = (-1)^{n-1}\frac{2(2n)!}{(2\pi)^{2n}}\,\zeta(2n).$$

This is an alternate route to Eq. (12.38).

Hint. The result of Exercise 12.5.9 will be helpful.


12.6 ASYMPTOTIC SERIES


Asymptotic series frequently occur in physics. In fact, one of the earliest and still
important approximations of quantum mechanics, the WKB expansion (the initials stand for its
originators, Wentzel, Kramers, and Brillouin), is an asymptotic series. In numerical
computations, these series are employed for the accurate computation of a variety of functions.
We consider here two types of integrals that lead to asymptotic series: first, integrals of
the form

$$I_1(x) = \int_x^{\infty} e^{-u} f(u)\,du,$$

where the variable x appears as the lower limit of an integral. Second, we consider the
form

$$I_2(x) = \int_0^{\infty} e^{-u} f\!\left(\frac{u}{x}\right) du,$$

with the function f to be expanded as a Taylor series (binomial series). Asymptotic series
often occur as solutions of differential equations; we encounter many examples in later
chapters of this book.


Exponential Integral 


The nature of an asymptotic series is perhaps best illustrated by a specific example.
Suppose that we have the exponential integral function³

$$\mathrm{Ei}(x) = \int_{-\infty}^{x} \frac{e^u}{u}\,du, \tag{12.78}$$

which we find more convenient to write in the form

$$-\mathrm{Ei}(-x) = \int_x^{\infty} \frac{e^{-u}}{u}\,du = E_1(x), \tag{12.79}$$

to be evaluated for large values of x. This function has a series expansion that converges
for all x, namely

$$E_1(x) = -\gamma - \ln x - \sum_{n=1}^{\infty} \frac{(-1)^n x^n}{n \cdot n!}, \tag{12.80}$$

which we derive in Chapter 13, but the series is totally useless for numerical evaluation
when x is large. We need another approach, for which it is convenient to generalize
Eq. (12.79) to

$$I(x, p) = \int_x^{\infty} \frac{e^{-u}}{u^p}\,du, \tag{12.81}$$

where we restrict consideration to cases in which x and p are positive. As already stated,
we seek an evaluation for large values of x.


³ This function occurs frequently in astrophysical problems involving gas with a Maxwell-Boltzmann energy distribution.

Integrating by parts, we obtain

$$I(x, p) = \frac{e^{-x}}{x^p} - p\int_x^{\infty} \frac{e^{-u}}{u^{p+1}}\,du = \frac{e^{-x}}{x^p} - \frac{p\,e^{-x}}{x^{p+1}} + p(p+1)\int_x^{\infty} \frac{e^{-u}}{u^{p+2}}\,du.$$

Continuing to integrate by parts, we develop the series

$$I(x, p) = e^{-x}\left(\frac{1}{x^p} - \frac{p}{x^{p+1}} + \frac{p(p+1)}{x^{p+2}} - \cdots + (-1)^{n-1}\frac{(p+n-2)!}{(p-1)!\,x^{p+n-1}}\right) + (-1)^n\frac{(p+n-1)!}{(p-1)!}\int_x^{\infty} \frac{e^{-u}}{u^{p+n}}\,du. \tag{12.82}$$

This is a remarkable series. Checking the convergence by the d'Alembert ratio test, we
find

$$\lim_{n \to \infty} \frac{|u_{n+1}|}{|u_n|} = \lim_{n \to \infty} \frac{(p+n)!}{(p+n-1)!} \cdot \frac{1}{x} = \lim_{n \to \infty} \frac{p+n}{x} = \infty \tag{12.83}$$

for all finite values of x. Therefore our series as an infinite series diverges everywhere!
Before discarding Eq. (12.82) as worthless, let us see how well a given partial sum
approximates our function I(x, p). Taking $s_n$ as the partial sum of the series through n terms and
$R_n$ as the corresponding remainder,

$$I(x, p) - s_n(x, p) = (-1)^{n+1}\frac{(p+n)!}{(p-1)!}\int_x^{\infty} \frac{e^{-u}}{u^{p+n+1}}\,du = R_n(x, p).$$

In absolute value

$$|R_n(x, p)| \le \frac{(p+n)!}{(p-1)!}\int_x^{\infty} \frac{e^{-u}}{u^{p+n+1}}\,du.$$

When we substitute u = v + x, the integral becomes

$$\int_x^{\infty} \frac{e^{-u}}{u^{p+n+1}}\,du = e^{-x}\int_0^{\infty} \frac{e^{-v}}{(v + x)^{p+n+1}}\,dv = \frac{e^{-x}}{x^{p+n+1}}\int_0^{\infty} e^{-v}\left(1 + \frac{v}{x}\right)^{-p-n-1} dv.$$


For large x the final integral approaches 1 and

$$|R_n(x, p)| \approx \frac{(p+n)!}{(p-1)!}\,\frac{e^{-x}}{x^{p+n+1}}. \tag{12.84}$$

This means that if we take x large enough, our partial sum $s_n$ will be an arbitrarily good
approximation to the function I(x, p). Our divergent series, Eq. (12.82), therefore is
perfectly good for computations of partial sums. For this reason it is sometimes called a
semiconvergent series. Note that the power of x in the denominator of the remainder, namely
p + n + 1, is higher than the power of x in the last term included in $s_n(x, p)$, namely p + n.
Thus, our asymptotic series for $E_1(x)$ assumes the form

$$e^x E_1(x) = e^x\int_x^{\infty} \frac{e^{-u}}{u}\,du \approx s_n(x) = \frac{1}{x} - \frac{1!}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots + (-1)^n\frac{n!}{x^{n+1}}, \tag{12.85}$$

where we must choose to terminate the series after some n.

Since the remainder $R_n(x, p)$ alternates in sign, the successive partial sums give
alternately upper and lower bounds for I(x, p). The behavior of the series (with p = 1) as a
function of the number of terms included is shown in Fig. 12.2, where we have plotted
partial sums of $e^x E_1(x)$ for the value x = 5. The optimum determination of $e^x E_1(x)$ is
given by the closest approach of the upper and lower bounds, that is, for x = 5, between
$s_6 = 0.1664$ and $s_5 = 0.1741$. Therefore

$$0.1664 \le \left. e^x E_1(x)\right|_{x=5} \le 0.1741. \tag{12.86}$$

Actually, from tables,

$$\left. e^x E_1(x)\right|_{x=5} = 0.1704, \tag{12.87}$$

within the limits established by our asymptotic expansion. Note that inclusion of
additional terms in the series expansion beyond the optimum point reduces the accuracy of
the representation. As x is increased, the spread between the lowest upper bound and the
highest lower bound will diminish. By taking x large enough, one may compute $e^x E_1(x)$
to any desired degree of accuracy. Other properties of $E_1(x)$ are derived and discussed in
Section 13.6.

FIGURE 12.2 Partial sums of $e^x E_1(x)|_{x=5}$.
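The behavior described here can be reproduced in a few lines. The sketch below, an illustration under stated assumptions rather than a production routine, lists partial sums of the series in Eq. (12.85) and compares them with a reference value obtained from the convergent series, Eq. (12.80), which is still usable in double precision at x = 5. The value of the Euler-Mascheroni constant is typed in, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, typed in


def e1_convergent(x, terms=100):
    """E_1(x) from the convergent series, Eq. (12.80); adequate for moderate x only."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for n in range(1, terms + 1):
        term *= -x / n                      # term now equals (-x)**n / n!
        total -= term / n
    return total


def asymptotic_partial_sums(x, n_terms):
    """Successive partial sums of (1/x) * sum_k (-1)**k k! / x**k, the series in Eq. (12.85)."""
    sums, total, term = [], 0.0, 1.0 / x
    for k in range(n_terms):
        total += term
        sums.append(total)
        term *= -(k + 1) / x
    return sums


if __name__ == "__main__":
    x = 5.0
    reference = math.exp(x) * e1_convergent(x)
    for count, s in enumerate(asymptotic_partial_sums(x, 12), start=1):
        print(f"{count:2d} terms: {s:.6f}   error = {s - reference:+.6f}")
```

For x = 5 the errors shrink, alternate in sign, and then grow again once the terms of the series start to increase, which is the signature of an asymptotic rather than a convergent expansion.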


Cosine and Sine Integrals 


Asymptotic series may also be developed from definite integrals, provided that the
integrand has the required behavior. As an example, the cosine and sine integrals (in Table 1.2)
are defined by

$$\mathrm{Ci}(u) = -\int_u^{\infty} \frac{\cos t}{t}\,dt, \tag{12.88}$$

$$\mathrm{si}(u) = -\int_u^{\infty} \frac{\sin t}{t}\,dt. \tag{12.89}$$

Combining these, using the formula for $e^{it}$,

$$\mathrm{Ci}(u) + i\,\mathrm{si}(u) = -\int_u^{\infty} \frac{e^{it}}{t}\,dt,$$

and then changing the integration variable from t to z = t - u, we reach

$$F(u) = \mathrm{Ci}(u) + i\,\mathrm{si}(u) = -e^{iu}\int_0^{\infty} \frac{e^{iz}}{u + z}\,dz. \tag{12.90}$$

To further process F(u), we now consider the contour integral

$$-e^{iu}\oint_C \frac{e^{iz}\,dz}{u + z},$$

where the contour C is that shown in Fig. 12.3. Since we are interested in evaluation for
large positive (and real) u, our integrand has as its only singularity a pole on the negative
real axis, so the region enclosed by the contour is entirely analytic and the contour integral
therefore vanishes. The exponential and the denominator cause the arc at infinity (labeled
B) not to contribute to the contour integral, so the integral we seek is obtained from
segment A and must be equal to the negative of the integral on segment D. Therefore, we
have

$$F(u) = -e^{iu}\int_0^{\infty} \frac{e^{-y}\, i\,dy}{u + iy}, \tag{12.91}$$

which is already helpful since we have converted an oscillatory integral into one with a
monotonically and exponentially decreasing integrand.

FIGURE 12.3 Contour for sine and cosine integrals.

To obtain an asymptotic expansion, we continue by expanding the denominator of the
integrand using the binomial theorem, writing

$$\frac{1}{u + iy} = \frac{1}{u}\left[1 - \frac{iy}{u} - \frac{y^2}{u^2} + \cdots\right].$$

We plan to integrate in y from zero to infinity, and the proposed expansion will be diver- 
gent when y > u, but we proceed anyway, because the terms of the series will initially 
be decreasing and will be satisfactory as an asymptotic expansion. Formally, we take the 
viewpoint that we are writing 1/(u + iy) as a finite series plus a remainder, and we will 
abandon the expansion at or before the point that the remainder is a minimum. 

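Inserting the expansion, and integrating termwise using the formula

$$\int_0^\infty y^n e^{-y}\,dy = n!,$$

we get

$$F(u) \approx -\frac{i e^{iu}}{u}\left[1 - i\left(\frac{1!}{u}\right) - \left(\frac{2!}{u^2}\right) + i\left(\frac{3!}{u^3}\right) + \left(\frac{4!}{u^4}\right) - \cdots\right]. \tag{12.92}$$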


As for our earlier example, the exponential integral, this series will diverge for all u, but if 
u is sufficiently large the terms will initially decrease to very small values before increasing 
again toward divergence. 
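
The decrease-then-growth of the terms in Eq. (12.92) can be seen directly from their magnitudes n!/u^n. The following is a minimal sketch, not part of the original text; the function names and the sample values of u are illustrative assumptions, chosen as non-integers so that the smallest term is unique.

```python
import math

def term_magnitudes(u, n_max):
    """Magnitudes n!/u**n of the bracketed terms in Eq. (12.92), n = 0..n_max."""
    return [math.factorial(n) / u**n for n in range(n_max + 1)]

def smallest_term_index(u, n_max=60):
    """Index n of the smallest term; beyond it the terms grow again."""
    terms = term_magnitudes(u, n_max)
    return min(range(len(terms)), key=terms.__getitem__)

if __name__ == "__main__":
    for u in (5.5, 10.5, 20.5):
        n = smallest_term_index(u)
        print(f"u = {u}: smallest term at n = {n}, size {term_magnitudes(u, n)[-1]:.3e}")
```

Because successive terms differ by the factor (n + 1)/u, the smallest term sits near n ≈ u, which is where the expansion should be abandoned.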

To go from the expansion of F(u) to those of Ci and si, we need to separate it into real and imaginary parts. Writing $e^{iu} = \cos u + i\sin u$ and collecting terms appropriately, we get as the desired asymptotic expansions

$$\mathrm{Ci}(u) \approx \frac{\sin u}{u}\sum_{n=0}^{N}(-1)^n\frac{(2n)!}{u^{2n}} - \frac{\cos u}{u}\sum_{n=0}^{N}(-1)^n\frac{(2n+1)!}{u^{2n+1}}, \tag{12.93}$$

$$\mathrm{si}(u) \approx -\frac{\cos u}{u}\sum_{n=0}^{N}(-1)^n\frac{(2n)!}{u^{2n}} - \frac{\sin u}{u}\sum_{n=0}^{N}(-1)^n\frac{(2n+1)!}{u^{2n+1}}. \tag{12.94}$$
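
As a numerical illustration of Eqs. (12.91) to (12.94), the sketch below evaluates F(u) = Ci(u) + i si(u) by Simpson's rule applied to the non-oscillatory integral of Eq. (12.91) and compares it with a partial sum of Eq. (12.92). It is a sketch under stated assumptions: the function names, the cutoff `y_max`, and the step count are illustrative choices, not taken from the text.

```python
import cmath
import math

def F_numeric(u, y_max=60.0, steps=60000):
    """F(u) from Eq. (12.91) by Simpson's rule on [0, y_max]; steps must be even."""
    h = y_max / steps

    def integrand(y):
        return math.exp(-y) * 1j / (u + 1j * y)

    total = integrand(0.0) + integrand(y_max)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return -cmath.exp(1j * u) * total * h / 3.0

def F_asymptotic(u, n_terms):
    """Partial sum of Eq. (12.92), keeping the terms n = 0 .. n_terms - 1."""
    series = sum((-1j) ** n * math.factorial(n) / u**n for n in range(n_terms))
    return -1j * cmath.exp(1j * u) / u * series

if __name__ == "__main__":
    u = 10.0
    exact = F_numeric(u)
    for n_terms in (2, 5, 10, 20, 30):
        approx = F_asymptotic(u, n_terms)
        print(n_terms, abs(approx - exact))   # error shrinks, then grows again
    print("Ci(u) =", exact.real, " si(u) =", exact.imag)
```

The real and imaginary parts of the partial sum reproduce Eqs. (12.93) and (12.94); the printed errors show the best agreement for a number of terms close to u.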


Definition of Asymptotic Series 


Poincaré has introduced a formal definition for an asymptotic series.⁴ Following Poincaré, we consider a function f(x) whose asymptotic expansion is sought, the partial sums $s_n$ in its expansion, and the corresponding remainders $R_n(x)$. Though the expansion need not be a power series, we assume that form for simplicity in the present discussion. Thus,

$$x^n R_n(x) = x^n\,[f(x) - s_n(x)], \tag{12.95}$$

where

$$s_n(x) = a_0 + \frac{a_1}{x} + \frac{a_2}{x^2} + \cdots + \frac{a_n}{x^n}. \tag{12.96}$$

The asymptotic expansion of f(x) is defined to have the properties that

$$\lim_{x\to\infty} x^n R_n(x) = 0, \quad \text{for fixed } n, \tag{12.97}$$

and

$$\lim_{n\to\infty} x^n R_n(x) = \infty, \quad \text{for fixed } x. \tag{12.98}$$

These conditions were met for our examples, Eqs. (12.85), (12.93), and (12.94).⁵ For power series, as assumed in the form of $s_n(x)$, $R_n(x) \approx x^{-n-1}$. With the conditions of Eqs. (12.97) and (12.98) satisfied, we write

$$f(x) \sim \sum_{n=0}^{\infty} a_n x^{-n}. \tag{12.99}$$

Note the use of ∼ in place of =. The function f(x) is equal to the series only in the limit as x → ∞ and with the restriction to a finite number of terms in the series.

Asymptotic expansions of two functions may be multiplied together, and the result will be an asymptotic expansion of the product of the two functions. The asymptotic expansion of a given function f(t) may be integrated term by term (just as in a uniformly convergent series of continuous functions) from $x \le t < \infty$, and the result will be an asymptotic expansion of $\int_x^\infty f(t)\,dt$. Term-by-term differentiation, however, is valid only under very special conditions.

⁴Poincaré’s definition allows (or neglects) exponentially decreasing functions. The refinement of his definition is of considerable importance for the advanced theory of asymptotic expansions, particularly for extensions into the complex plane. However, for purposes of an introductory treatment and especially for numerical computation of expansions for which the variable is real and positive, Poincaré’s approach is perfectly satisfactory.

⁵Some writers feel that the requirement of Eq. (12.98), which excludes convergent series of inverse powers of x, is artificial and unnecessary.


Some functions do not possess an asymptotic expansion; $e^x$ is an example of such a function. However, if a function has an asymptotic expansion of the power-series form in Eq. (12.99), it has only one. The correspondence is not one to one; many functions may have the same asymptotic expansion.

One of the most useful and powerful methods of generating asymptotic expansions, the method of steepest descents, is developed in the next section of this text.


Exercises

12.6.1 Integrating by parts, develop asymptotic expansions of the Fresnel integrals

$$\text{(a)}\quad C(x) = \int_0^x \cos\frac{\pi u^2}{2}\,du, \qquad \text{(b)}\quad s(x) = \int_0^x \sin\frac{\pi u^2}{2}\,du.$$

These integrals appear in the analysis of a knife-edge diffraction pattern.

12.6.2 Rederive the asymptotic expansions of Ci(x) and si(x) by repeated integration by parts.

Hint. $\mathrm{Ci}(x) + i\,\mathrm{si}(x) = -\int_x^\infty \dfrac{e^{it}}{t}\,dt$.

12.6.3 Derive the asymptotic expansion of the Gauss error function

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \approx 1 - \frac{e^{-x^2}}{\sqrt{\pi}\,x}\left(1 - \frac{1}{2x^2} + \frac{1\cdot 3}{2^2x^4} - \frac{1\cdot 3\cdot 5}{2^3x^6} + \cdots + (-1)^n\frac{(2n-1)!!}{2^n x^{2n}}\right).$$

Hint. $\mathrm{erf}(x) = 1 - \mathrm{erfc}(x) = 1 - \dfrac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}\,dt$.

Normalized so that erf(∞) = 1, this function plays an important role in probability theory. It may be expressed in terms of the Fresnel integrals (Exercise 12.6.1), the incomplete gamma functions (Section 13.6), or the confluent hypergeometric functions (Section 18.5).

12.6.4 The asymptotic expressions for the various Bessel functions, Section 14.6, contain the series

$$P_\nu(z) \sim 1 + \sum_{n=1}^{\infty}(-1)^n\frac{\prod_{s=1}^{2n}\left[4\nu^2 - (2s-1)^2\right]}{(2n)!\,(8z)^{2n}},$$

$$Q_\nu(z) \sim \sum_{n=1}^{\infty}(-1)^{n+1}\frac{\prod_{s=1}^{2n-1}\left[4\nu^2 - (2s-1)^2\right]}{(2n-1)!\,(8z)^{2n-1}}.$$

Show that these two series are indeed asymptotic series.





12.6.5 For x > 1,

$$\frac{1}{1+x} = \sum_{n=0}^{\infty}(-1)^n\frac{1}{x^{n+1}}.$$

Test this series to see if it is an asymptotic series.

12.6.6 Derive the following Bernoulli-number asymptotic series for the Euler-Mascheroni constant, defined in Eq. (1.13):

$$\gamma \sim \sum_{s=1}^{n} s^{-1} - \ln n - \frac{1}{2n} + \sum_{k=1}^{N}\frac{B_{2k}}{(2k)\,n^{2k}}.$$

Here n plays the role of x.

Hint. Apply the Euler-Maclaurin integration formula to $f(x) = x^{-1}$ over the interval [1, n] for N = 1, 2, ....

12.6.7 Develop an asymptotic series for

$$\int_0^\infty \frac{e^{-xv}}{(1+v^2)^2}\,dv.$$

Take x to be real and positive.

ANS. $\dfrac{1}{x} - \dfrac{2!}{x^3} + \dfrac{4!}{x^5} - \cdots + \dfrac{(-1)^n(2n)!}{x^{2n+1}}$.

12.7 METHOD OF STEEPEST DESCENTS


In this section we consider the frequently occurring situation that we require the asymptotic behavior (for large t, assumed real) of a function f(t), where

• f(t) is represented by an integral of the generic form

$$f(t) = \int_C F(z,t)\,dz,$$

with F(z, t) analytic in z, but also parametrically dependent on t;

• The integration path C is, or can be deformed to be, such that for large t the dominant contribution to the integral arises from a small range of z in the neighborhood of the point $z_0$ where $|F(z_0, t)|$ is a maximum on the path;

• The integration path will pass through $z_0$ in the orientation that causes the most rapid decrease in |F| on departure from $z_0$ in either direction along the path (hence the name steepest descents); and

• In the limit of large t the contribution to the integral from the neighborhood of $z_0$ asymptotically approaches the exact value of f(t).

While the above conditions seem rather restrictive, they can in fact be met for many of the important special functions of mathematical physics, including, among others, the gamma function and various Bessel functions.


Saddle Points 


The integration path supplied with the original definition of an integral representation 
defining a function f(t) will not usually meet the conditions outlined above, and we need 
to consider the features of the integrand F(z, t) that will be useful in defining a more suit- 
able path which, even if the original formulation is entirely real, may be a more general 
contour in the complex plane. We already know (Exercise 11.2.2) that neither the real nor 
the imaginary part of an analytic function can have an extremum (either a minimum or 
maximum) within the region of analyticity, and the same is also true of its modulus (this 
result is Jensen’s theorem; see Exercise 12.7.1). To better understand that, let us represent F(z, t) (in a region where it is assumed nonzero) in the form

$$F(z,t) = e^{w(z,t)} = e^{u(z,t) + i v(z,t)}, \tag{12.100}$$

where u and v are the real and imaginary parts of an analytic function w; this representation permits us to identify u as ln |F|; the fact that u cannot have an extremum makes Jensen’s theorem obvious.

Although u cannot have an extremum, it can have a saddle point (a point at which $w' = 0$; then also $\partial u/\partial s = 0$ for all directions ds, but with higher derivatives that are positive in some directions and negative in others; see Fig. 12.4). Let us examine some general features of w and its components u and v in the neighborhood of a saddle point of u, which we designate $z_0$. We proceed by expanding w(z, t) in a Taylor series about $z_0$. Because $w' = 0$ there, the first two nonzero terms of the expansion are

$$w(z,t) = w(z_0,t) + \frac{w''(z_0,t)}{2!}(z - z_0)^2 + \cdots. \tag{12.101}$$

FIGURE 12.4 Saddle point of u (= ln |F|); see Eq. (12.100).

It could be that $w''(z_0, t) = 0$, but that possibility makes the analysis more complicated without changing it in a fundamental way, so we proceed under the assumption that $w''(z_0, t) \ne 0$. Using the abbreviated notations $w_0 = w(z_0, t)$, $w''(z_0, t) = w_0''$, and introducing the polar forms $w_0'' = |w_0''|e^{i\alpha}$, $z - z_0 = re^{i\theta}$, Eq. (12.101) becomes

$$w(z,t) = w_0 + \tfrac{1}{2}|w_0''|\,e^{i(\alpha+2\theta)}\,r^2 + \cdots \tag{12.102}$$

$$\phantom{w(z,t)} = w_0 + \tfrac{1}{2}|w_0''|\,r^2\left[\cos(\alpha+2\theta) + i\sin(\alpha+2\theta)\right] + \cdots. \tag{12.103}$$

For later reference we note that α is the argument of $w''(z_0, t)$. We see that, in general at a saddle point, u (the real part of w) will increase most rapidly when α + 2θ = 2nπ, corresponding to the opposite directions θ = −α/2 and θ = −α/2 + π. On the other hand, u will decrease most rapidly when α + 2θ = (2n + 1)π, i.e., θ = −α/2 + (π/2 or 3π/2), the two directions perpendicular to those of maximum increase. And u will (to second order) remain constant (so-called level lines) in the directions θ = −α/2 + (π/4, 3π/4, 5π/4, 7π/4). See the left panel of Fig. 12.5.

The behavior of v (the imaginary part of w) will be similar to that of u, but displaced in angle by 45°. The level lines of v will be in the directions θ = −α/2 + (0, π/2, π, 3π/2), and therefore will coincide with the directions of maximum increase or decrease in u. See the right panel of Fig. 12.5.

We are now ready to identify an optimum contour for evaluating the integral representation of f(t), namely one that passes through the saddle point $z_0$ in the directions of maximum rate of decrease in u with distance from $z_0$, and therefore also in |F|. These directions have the additional advantage that they are level lines of v, so that the factor $e^{iv}$ will not produce changes of phase (oscillatory behavior and therefore numerical instability) in F as we leave the saddle point. If we had chosen $z_0$ to be a point other than a saddle point, the expansion of w would have contained a nonzero linear term in r, and it would not have been possible to construct a curve through $z_0$ that would cause |F| to decrease in both path directions, or to keep the phase of F constant.
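
The direction rules just described depend only on α = arg w''(z₀, t). The sketch below tabulates the directions of steepest ascent, steepest descent, and the level lines of u for a given value of w''(z₀, t); the function name and the return convention are assumptions made for illustration only.

```python
import cmath
import math

def saddle_directions(w2):
    """Angles theta (in [0, 2*pi)) around a saddle point with w''(z0) = w2.

    Returns (ascent, descent, level_u), using alpha = arg w''(z0):
      ascent : theta = -alpha/2 + (0, pi)
      descent: theta = -alpha/2 + (pi/2, 3*pi/2)
      level_u: theta = -alpha/2 + (pi/4, 3*pi/4, 5*pi/4, 7*pi/4)
    """
    alpha = cmath.phase(w2)

    def wrap(theta):
        return theta % (2 * math.pi)

    ascent = [wrap(-alpha / 2 + k * math.pi) for k in (0, 1)]
    descent = [wrap(-alpha / 2 + math.pi / 2 + k * math.pi) for k in (0, 1)]
    level_u = [wrap(-alpha / 2 + math.pi / 4 + k * math.pi / 2) for k in range(4)]
    return ascent, descent, level_u

if __name__ == "__main__":
    # w'' = -2 (alpha = pi): steepest descent along the real direction.
    print(saddle_directions(-2.0))
```

Along each returned descent direction the quantity $w''e^{2i\theta}$ is real and negative, so u falls as fast as possible while v stays constant to second order.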











FIGURE 12.5 Near a saddle point in w = u + iv: When features of u are oriented as in the left panel, those of v are as shown in the right panel. Arrows indicate ascending directions.

Saddle Point Method


Now that we have identified $z_0$ and the directions of steepest descent in |F(z, t)|, we complete the specification of the method of steepest descents, also called the saddle point method of asymptotic approximation, by assuming that the significant contributions to the integral are from a small range of 0 < r < a in each of the two directions along the path. Before obtaining a final result, we must make one more observation. Looking at the way in which the contour had to be deformed to pass through $z_0$, we need to determine the sense of the path (i.e., we must decide whether the direction of travel is at θ = −α/2 + π/2 or at θ = −α/2 + 3π/2). Assuming that this has been decided, we can then identify, for the portion of the path in which we descend from $F(z_0)$, $dz = e^{i\theta}dr$. The contribution in which we ascend to $F(z_0)$ will have the opposite sign for dz but we can handle it simply by multiplying the descending contribution by two. Then, noting that $e^{i(\alpha+2\theta)} = -1$, our approximation to f(t) is

$$f(t) \approx 2\,e^{w_0}e^{i\theta}\int_0^a e^{-|w_0''|\,r^2/2}\,dr, \tag{12.104}$$


where the initial “2” causes inclusion of the ascent to $z_0$. We now make the key assumption of the method, namely that $|w_0''|$, the measure of the rate of decrease in |F| as we leave $z_0$, is large enough that the bulk of the value of the integral has already been attained for small a, and that the exponential decrease in the value of the integrand enables us to replace a by infinity without making significant error. In problems where the saddle point method is applicable, this condition is met when t is sufficiently large. We complete the present analysis by remembering that $e^{w_0} = F(z_0, t)$ and by evaluating the integral for a = ∞, where, cf. Eq. (1.148), it has the value $\sqrt{\pi/2|w_0''|}$. We get

$$f(t) \approx F(z_0,t)\,e^{i\theta}\sqrt{\frac{2\pi}{|w''(z_0,t)|}}. \tag{12.105}$$

We remind the reader that

$$\theta = -\frac{\arg\bigl(w''(z_0,t)\bigr)}{2} + \frac{\pi}{2}\ \text{ or }\ \frac{3\pi}{2}, \tag{12.106}$$

with the choice (which affects only the sign of the final result) determined from the sense in which the contour passes through the saddle point $z_0$.

Sometimes it is sufficient to apply the method of steepest descents only to the rapidly varying part of an integral. This corresponds to assuming that we may make the approximation

$$f(t) = \int_C g(z,t)\,F(z,t)\,dz \approx g(z_0,t)\int_C F(z,t)\,dz, \tag{12.107}$$

after which we proceed as before. Note that this causes g not to be considered when we define w or w″, and our final formula is replaced by

$$f(t) \approx g(z_0,t)\,F(z_0,t)\,e^{i\theta}\sqrt{\frac{2\pi}{|w''(z_0,t)|}}. \tag{12.108}$$


A final note of warning: We assumed that the only significant contribution to the inte- 
gral came from the immediate vicinity of the saddle point z = zo. This condition must be 
checked for each new problem. 


Example 12.7.1 ASYMPTOTIC FORM OF THE GAMMA FUNCTION

In many physical problems, particularly in the field of statistical mechanics, it is desirable to have an accurate approximation of the gamma or factorial function of very large numbers. As listed in Table 1.2, the factorial function may be defined by the Euler integral

$$t! = \Gamma(t+1) = \int_0^\infty \rho^t e^{-\rho}\,d\rho = t^{t+1}\int_0^\infty e^{t(\ln z - z)}\,dz. \tag{12.109}$$

Here we have made the substitution ρ = zt in order to convert the integral to the form given in Eq. (12.108). As before, we assume that t is real and positive, from which it follows that the integrand vanishes at the limits 0 and ∞. By differentiating the exponent, which we call w(z, t), we obtain

$$\frac{dw}{dz} = t\,\frac{d}{dz}(\ln z - z) = \frac{t}{z} - t, \qquad w'' = -\frac{t}{z^2},$$

which shows that the point z = 1 is a saddle point and arg w″(1, t) = arg(−t) = π. Applying Eq. (12.106), we see that the direction of travel through the saddle point is

$$\theta = -\frac{\pi}{2} + \left(\frac{\pi}{2}\ \text{or}\ \frac{3\pi}{2}\right) = 0\ \text{or}\ \pi;$$







the choice θ = 0 is that consistent with deformation from a path that was originally along the real axis. In fact, what we have found is that the direction of steepest descent is along the real axis, a conclusion that we might have reached more or less intuitively.

Direct substitution into Eq. (12.108) with $g = t^{t+1}$, $F = e^{-t}$, θ = 0, and |w″| = t yields

$$t! = \Gamma(t+1) \approx \frac{\sqrt{2\pi}\,t^{t+1}e^{-t}}{\sqrt{t}} = \sqrt{2\pi}\,t^{t+1/2}e^{-t}. \tag{12.110}$$

This result is the leading term in Stirling’s expansion of the gamma function. The method of steepest descents is probably the easiest way of obtaining this term. Further terms in the asymptotic expansion are developed in Section 13.4.

In this example the calculation was carried out assuming t to be real. This assumption is not necessary. We may show (Exercise 12.7.3) that Eq. (12.110) also holds when t is complex, provided only that its real part be required to be large and positive.
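
The quality of Eq. (12.110) is easy to check numerically. The sketch below compares the saddle-point estimate with Γ(t + 1) from Python's standard library, working with logarithms so that large t does not overflow; the function names are illustrative assumptions.

```python
import math

def log_stirling(t):
    """ln of sqrt(2*pi) * t**(t + 1/2) * exp(-t), the right side of Eq. (12.110)."""
    return 0.5 * math.log(2 * math.pi) + (t + 0.5) * math.log(t) - t

def stirling_ratio(t):
    """Saddle-point estimate divided by Gamma(t + 1), computed via logarithms."""
    return math.exp(log_stirling(t) - math.lgamma(t + 1))

if __name__ == "__main__":
    for t in (1, 10, 100, 1000):
        print(t, stirling_ratio(t))
```

The ratio approaches 1 from below as t grows, with a shortfall close to 1/(12t), the size of the next term in Stirling's expansion.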
Sometimes the application of the saddle point method to a real integral results in a contour that goes through a saddle point that is not on the real axis. Here is a relatively simple example. A more complicated case of practical importance appears in the chapter on Bessel functions (see Section 14.6).


Example 12.7.2 SADDLE POINT METHOD AVOIDS OSCILLATIONS

As a second example of the method of steepest descents, consider the integral

$$H(t) = \int_{-\infty}^{\infty}\frac{e^{-t(z^2 - 1/4)}\cos tz}{1+z^2}\,dz, \tag{12.111}$$

which we wish to evaluate for large positive t. When t is large, the integrand oscillates very rapidly, and ordinary quadrature methods become difficult. We proceed by bringing H(t) to a form appropriate for applying the saddle point method, replacing cos tz by $\cos tz + i\sin tz = e^{itz}$ (a replacement that does not change the value of the integral because we added an odd term to the previously even integrand). We then have

$$H(t) = \int_{-\infty}^{\infty} g(z)\,e^{-t(z^2 - iz - 1/4)}\,dz, \tag{12.112}$$

with $g(z) = 1/(1+z^2)$. This form corresponds to $w(z) = -t(z^2 - iz - \tfrac14)$, so we have

$$w'(z) = -t(2z - i), \quad \text{which has a zero at } z_0 = i/2. \tag{12.113}$$

Then, at $z_0$, which is a saddle point,

$$w_0 = 0, \qquad w''(z_0) = -2t, \qquad g(z_0) = \frac{4}{3}. \tag{12.114}$$

We also need the phase θ of the steepest-descent direction. Noting that $\arg(w''(z_0)) = \pi$ and applying Eq. (12.106), we find θ = 0 (or π).

We are now ready to apply Eq. (12.108). The result is

$$H(t) \approx \frac{\sqrt{2\pi}\,(4/3)\,(e^0)}{\sqrt{|-2t|}} = \frac{4}{3}\sqrt{\frac{\pi}{t}}. \tag{12.115}$$

As a check, we compare this approximate formula for H(t) with the result of a tedious numerical integration: For t = 100, $H_{\text{exact}} = 0.23284$, and $H_{\text{saddle}} = 0.23633$.
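
The comparison quoted above can be reproduced without fighting the oscillations by integrating along the steepest-descent line itself. The sketch below assumes that the path of Eq. (12.112) may be shifted from the real axis to z = x + i/2, which is permissible because the poles of g(z) lie at z = ±i and the integrand decays at large |Re z|; on that line the exponent reduces to −t x². The function names, cutoff, and step count are illustrative assumptions.

```python
import math

def H_shifted(t, steps=4000):
    """H(t) by the trapezoidal rule along z = x + i/2, where w(z) = -t * x**2."""
    x_max = 8.0 / math.sqrt(t)          # exp(-t * x_max**2) = exp(-64), negligible
    h = 2.0 * x_max / steps
    total = 0.0
    for k in range(steps + 1):
        x = -x_max + k * h
        z = complex(x, 0.5)
        weight = 0.5 if k in (0, steps) else 1.0
        # The imaginary part is odd in x and cancels, so only the real part is summed.
        total += weight * (math.exp(-t * x * x) / (1.0 + z * z)).real
    return total * h

def H_saddle(t):
    """Leading saddle-point estimate, Eq. (12.115)."""
    return (4.0 / 3.0) * math.sqrt(math.pi / t)

if __name__ == "__main__":
    for t in (10.0, 100.0, 1000.0):
        print(t, H_shifted(t), H_saddle(t))
```

For t = 100 this gives values consistent with the two numbers quoted in the example, and the relative difference between the two columns shrinks roughly like 1/t.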


Exercises 


We present here a rather small number of exercises on the method of steepest descents. 
Several additional exercises appear elsewhere in this book, in particular in Section 14.6, 
where the technique is applied to the contour integral representations of Bessel func- 
tions. 


12.7.1 Prove Jensen’s theorem (that $|F(z)|^2$ can have no extremum in the interior of a region in which F is analytic) by showing that the mean value of $|F|^2$ on a circle about any point $z_0$ is equal to $|F(z_0)|^2$. Explain why you can then conclude that there cannot be an extremum of |F| at $z_0$.

12.7.2 Find the steepest path and leading asymptotic expansion for the Fresnel integrals

$$\int_0^s \cos x^2\,dx, \qquad \int_0^s \sin x^2\,dx.$$

Hint. Use $\int_0^1 e^{itz^2}\,dz$.

12.7.3 Show that the formula

$$\Gamma(1+s) \approx \sqrt{2\pi s}\,s^s e^{-s}$$

holds for complex values of s (with ℜe(s) large and positive).

Hint. This involves assigning a phase to s and then demanding that ℑm[s f(z)] be constant in the vicinity of the saddle point.

12.8 DISPERSION RELATIONS


The concept of dispersion relations entered physics with the work of Kronig and Kramers 
in optics. The name dispersion comes from optical dispersion, a result of the dependence 
of the index of refraction on wavelength, or angular frequency. As we shall soon see, the 
index of refraction n may have a real part determined by the phase velocity and a (negative) 
imaginary part determined by the absorption. Kronig and Kramers showed in 1926-1927 
that the real part of $(n^2 - 1)$ could be expressed as an integral of the imaginary part.
Generalizing this, we shall apply the label dispersion relations to any pair of equations 
giving the real part of a function as an integral of its imaginary part and the imaginary 
part as an integral of its real part (we develop this in more detail below). The existence of 
such integral relations might be suspected as an integral analog of the Cauchy-Riemann 
differential equations, Eq. (11.9). 

The applications in modern physics are widespread. For instance, the real part of the 
function might describe the forward scattering of a gamma ray in a nuclear Coulomb field 
(a dispersive process). Then the imaginary part would describe the electron-positron pair 
production in that same Coulomb field (the absorptive process). As will be seen later, the 
dispersion relations may be taken as a consequence of causality and therefore are indepen- 
dent of the details of the particular interaction. 

We consider a complex function f(z) that is analytic in the upper half-plane and on 
the real axis. We also require that f(z) approach zero for large |z| in the upper half-plane 
sufficiently rapidly that its integral over the semicircular part of the contour in Fig. 12.6 
will be negligible. The point of these conditions is that we may express f(z) by the Cauchy integral formula, Eq. (11.30), using this contour, obtaining

$$f(z_0) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\frac{f(x)}{x - z_0}\,dx. \tag{12.116}$$

The integral over the contour shown in Fig. 12.6 has become an integral along the x-axis.

FIGURE 12.6 Contour for dispersion integral.


Equation (12.116) assumes that $z_0$ is in the upper half-plane, interior to the closed contour. If $z_0$ were in the lower half-plane, the integral would yield zero by the Cauchy integral theorem, Section 11.3. Now, if we move $z_0$ onto the real axis (then calling it $x_0$) and pass it via a small clockwise semicircle s in the upper half-plane, the contour integral (which would contain no singularities) would have nonzero contributions corresponding to a Cauchy principal value integral minus half the usual contribution from the pole at $x_0$, or

$$0 = \mathcal{P}\!\int_{-\infty}^{\infty}\frac{f(x)}{x - x_0}\,dx + \int_s \frac{f(z)}{z - x_0}\,dz = \mathcal{P}\!\int_{-\infty}^{\infty}\frac{f(x)}{x - x_0}\,dx - i\pi f(x_0),$$

equivalent to the final formula

$$f(x_0) = \frac{1}{\pi i}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{f(x)}{x - x_0}\,dx. \tag{12.117}$$

Note that $\mathcal{P}\!\int$ (a cut integral sign) denotes the Cauchy principal value. Splitting Eq. (12.117) into real and imaginary parts⁶ yields

$$f(x_0) = u(x_0) + i v(x_0) = \frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{v(x)}{x - x_0}\,dx - \frac{i}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{u(x)}{x - x_0}\,dx.$$

Finally, equating real part to real part and imaginary part to imaginary part, we obtain

$$u(x_0) = \frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{v(x)}{x - x_0}\,dx, \qquad v(x_0) = -\frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{u(x)}{x - x_0}\,dx. \tag{12.118}$$

⁶The second argument, y = 0, is dropped: $u(x_0, 0) \to u(x_0)$.

These are the dispersion relations. The real part of our complex function is expressed as an integral over the imaginary part. The imaginary part is expressed as an integral over the real part. Alternatively, the real part can be called an integral transform of the imaginary part (and vice versa); the particular transform involved is known as a Hilbert transform, and we note that (apart from a minus sign) the Hilbert transform is its own inverse. Note that these relations are meaningful only when f(x) is a complex function of the real variable x. Compare Exercise 12.8.1.

From a physical point of view u(x) and/or v(x) may represent some physical measure- 
ments. Then f(z) = u(z) + iv(z) is an analytic continuation over the upper half-plane, 
with the value on the real axis serving as a boundary condition. 
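
The dispersion relations of Eq. (12.118) can be verified numerically for a simple test function. The sketch below uses f(z) = 1/(z + i), which is analytic in the upper half-plane and vanishes at large |z|, so that u(x) = x/(x² + 1) and v(x) = −1/(x² + 1) on the real axis (the pair that also appears in Exercise 12.8.7). The principal value is taken by pairing the points x₀ ± s, and the substitution s = tan φ (an implementation choice, not from the text) maps the half-line onto a finite interval. All function names are illustrative.

```python
import math

def hilbert_pv(f, x0, n=20000):
    """(1/pi) * P-integral of f(x)/(x - x0) over the real line.

    With x = x0 +/- s the principal value becomes the ordinary integral of
    [f(x0 + s) - f(x0 - s)] / s over s > 0; with s = tan(phi) the measure
    ds / s turns into d(phi) / (sin(phi) * cos(phi)). Midpoint rule in phi.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * h
        s = math.tan(phi)
        total += (f(x0 + s) - f(x0 - s)) / (math.sin(phi) * math.cos(phi))
    return total * h / math.pi

def u_exact(x):
    return x / (x * x + 1.0)       # real part of 1/(x + i)

def v_exact(x):
    return -1.0 / (x * x + 1.0)    # imaginary part of 1/(x + i)

if __name__ == "__main__":
    for x0 in (0.0, 0.5, 2.0):
        print(x0, hilbert_pv(v_exact, x0), u_exact(x0))    # first of Eqs. (12.118)
        print(x0, -hilbert_pv(u_exact, x0), v_exact(x0))   # second of Eqs. (12.118)
```

Each printed pair should agree closely, illustrating that either part of f on the real axis determines the other.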


Symmetry Relations 
On occasion f(x) will satisfy a symmetry relation and the integral from −∞ to +∞ may be replaced by an integral over positive values only. This is of considerable physical importance because the variable x might represent a frequency and only zero and positive frequencies are available for physical measurements. Suppose⁷

$$f(-x) = f^*(x). \tag{12.119}$$

Then

$$u(-x) + i v(-x) = u(x) - i v(x). \tag{12.120}$$

The real part of f(x) is even and the imaginary part is odd.⁸ In quantum mechanical scattering problems these relations, Eq. (12.120), are called crossing conditions. To exploit these crossing conditions, we rewrite the first of Eqs. (12.118) as

$$u(x_0) = \frac{1}{\pi}\int_{-\infty}^{0}\frac{v(x)}{x - x_0}\,dx + \frac{1}{\pi}\,\mathcal{P}\!\int_{0}^{\infty}\frac{v(x)}{x - x_0}\,dx. \tag{12.121}$$

Letting x → −x in the first integral on the right-hand side of Eq. (12.121) and substituting v(−x) = −v(x) from Eq. (12.120), we obtain

$$u(x_0) = \frac{1}{\pi}\,\mathcal{P}\!\int_0^\infty v(x)\left[\frac{1}{x + x_0} + \frac{1}{x - x_0}\right]dx = \frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{x\,v(x)}{x^2 - x_0^2}\,dx. \tag{12.122}$$

Similarly,

$$v(x_0) = -\frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{x_0\,u(x)}{x^2 - x_0^2}\,dx. \tag{12.123}$$

The original Kronig-Kramers optical dispersion relations were in this form. The asymptotic behavior ($x_0 \to \infty$) of Eqs. (12.122) and (12.123) leads to quantum mechanical sum rules. See Exercise 12.8.4.

⁷This is not just a curiosity. It ensures that the Fourier transform of f(x) will be real. Or conversely, Eq. (12.119) is a consequence when f(x) is obtained as the Fourier transform of a real function.

⁸u(x, 0) = u(−x, 0), v(x, 0) = −v(−x, 0). Compare these symmetry conditions with those that follow from the Schwarz reflection principle, Section 11.10.


Optical Dispersion 


The function exp[i(kx − ωt)] can describe an electromagnetic wave moving along the x-axis in the positive direction with velocity v = ω/k; ω is the angular frequency, k the wave number or propagation vector, and n = ck/ω, the index of refraction. From Maxwell’s equations with electric permittivity ε and magnetic permeability unity, and using Ohm’s law with conductivity σ, the propagation vector k for a dielectric becomes⁹

$$k^2 = \varepsilon\,\frac{\omega^2}{c^2}\left(1 + i\,\frac{4\pi\sigma}{\omega\varepsilon}\right). \tag{12.124}$$

The presence of the conductivity (which means absorption) causes $k^2$ to have an imaginary part. The propagation vector k (and therefore the index of refraction n) have become complex.

For poor conductivity (4πσ/ωε ≪ 1) a binomial expansion yields

$$k = \sqrt{\varepsilon}\,\frac{\omega}{c} + i\,\frac{2\pi\sigma}{c\sqrt{\varepsilon}}$$

and

$$e^{i(kx - \omega t)} = e^{i\omega(x\sqrt{\varepsilon}/c - t)}\,e^{-2\pi\sigma x/c\sqrt{\varepsilon}},$$

an attenuated wave.
Returning to the general expression for $k^2$, Eq. (12.124), we find that the index of refraction becomes

$$n^2 = \frac{c^2k^2}{\omega^2} = \varepsilon + i\,\frac{4\pi\sigma}{\omega}. \tag{12.125}$$

We take $n^2$ to be a function of the complex variable ω (with ε and σ depending on ω). However, $n^2$ does not vanish as ω → ∞ but instead approaches unity. It therefore does not satisfy the condition needed for a dispersion relation, but this difficulty can be circumvented by working with $f(\omega) = n^2(\omega) - 1$. The Kronig-Kramers relations then take the form

$$\mathrm{Re}\,[n^2(\omega_0) - 1] = \frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\omega\,\mathrm{Im}\,[n^2(\omega) - 1]}{\omega^2 - \omega_0^2}\,d\omega,$$
$$\mathrm{Im}\,[n^2(\omega_0) - 1] = -\frac{2}{\pi}\,\mathcal{P}\!\int_0^\infty \frac{\omega_0\,\mathrm{Re}\,[n^2(\omega) - 1]}{\omega^2 - \omega_0^2}\,d\omega. \tag{12.126}$$

Knowledge of the absorption coefficient at all frequencies specifies the real part of the index of refraction, and vice versa.

⁹See J. D. Jackson, Classical Electrodynamics, 3rd ed. New York: Wiley (1999), Sections 7.7 and 7.10. Equation (12.124) is in Gaussian units.


The Parseval Relation 

When the functions u(x) and v(x) are Hilbert transforms of each other, given by Eqs. (12.118), and each is square integrable,¹⁰ the two functions satisfy the scaling condition

$$\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}|v(x)|^2\,dx. \tag{12.127}$$

This is the Parseval relation.

To derive Eq. (12.127), we start with

$$\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}dx\left[\frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{v(s)\,ds}{s - x}\right]\left[\frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{v(t)\,dt}{t - x}\right],$$

using the formula for u(x) from Eq. (12.118) twice. Integrating first with respect to x, we have

$$\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}v(s)\,ds\int_{-\infty}^{\infty}v(t)\,dt\;\frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dx}{(s - x)(t - x)}, \tag{12.128}$$

where both principal-value limits at the singularities of the integrand must now be taken for the x integration. As shown in Exercise 12.8.8, that integration yields a delta function¹¹:

$$\frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dx}{(s - x)(t - x)} = \delta(s - t).$$

Thus,

$$\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}v(t)\,dt\int_{-\infty}^{\infty}v(s)\,\delta(s - t)\,ds. \tag{12.129}$$

Then the s integration is carried out by inspection, using the defining property of the delta function:

$$\int_{-\infty}^{\infty}v(s)\,\delta(s - t)\,ds = v(t). \tag{12.130}$$

Substituting Eq. (12.130) into Eq. (12.129), we have Eq. (12.127), the Parseval relation. Again, in terms of optics, the presence of refraction over some frequency range (n ≠ 1) implies the existence of absorption, and vice versa.

¹⁰This means that $\int_{-\infty}^{\infty}|u(x)|^2\,dx$ and $\int_{-\infty}^{\infty}|v(x)|^2\,dx$ are finite.

¹¹Note that when s = t, the integrand has the same sign (for small ε) at x = s − ε and at x = s + ε, so the limit defining the principal value then does not exist. The singularity in the integration is that which is needed to represent a delta function.
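
A quick numerical check of the Parseval relation, Eq. (12.127), is sketched below for the Hilbert-transform pair u(x) = x/(x² + 1), v(x) = −1/(x² + 1) of Exercise 12.8.7. The substitution x = tan φ, used here to handle the infinite range, and the function names are assumptions made for this illustration.

```python
import math

def integral_of_square(f, n=20000):
    """Integral of |f(x)|**2 over the real line, using x = tan(phi), dx = d(phi)/cos(phi)**2."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        phi = -math.pi / 2 + (k + 0.5) * h    # midpoint rule avoids phi = +/- pi/2
        x = math.tan(phi)
        total += abs(f(x)) ** 2 / math.cos(phi) ** 2
    return total * h

def u(x):
    return x / (x * x + 1.0)

def v(x):
    return -1.0 / (x * x + 1.0)

if __name__ == "__main__":
    print(integral_of_square(u), integral_of_square(v), math.pi / 2)
```

Both integrals come out equal to π/2, the answer quoted for Exercise 12.8.7.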


Exercises

12.8.1 Assume that the function f(z) satisfies the conditions for the dispersion relations. In addition, assume that $f(z) = f^*(z^*)$, i.e., that it meets the conditions of the Schwarz reflection principle, Eq. (11.127). Show that f(z) is identically zero.

12.8.2 For f(z) such that we may replace the closed contour of the Cauchy integral formula by an integral over the real axis we have

$$f(x_0) = \frac{1}{2\pi i}\left[\int_{-\infty}^{x_0 - \delta}\frac{f(x)}{x - x_0}\,dx + \int_{x_0 + \delta}^{\infty}\frac{f(x)}{x - x_0}\,dx\right] + \frac{1}{2\pi i}\int_C \frac{f(x)}{x - x_0}\,dx.$$

Here we take C to be a small semicircle about $x_0$ in the lower half-plane. Show that the formula for $f(x_0)$ reduces to

$$f(x_0) = \frac{1}{\pi i}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{f(x)}{x - x_0}\,dx,$$

which is Eq. (12.117).

12.8.3 (a) The function $f(z) = e^{iz}$ does not vanish at the endpoints of the range of arg z, 0 and π. Show, with the help of Jordan’s lemma, Eq. (11.102), that Eq. (12.116) still holds.

(b) For $f(z) = e^{iz}$ verify by direct integration the dispersion relations, Eq. (12.117) or Eqs. (12.118).

12.8.4 With f(x) = u(x) + iv(x) and $f(x) = f^*(-x)$, show that as $x_0 \to \infty$,

$$\text{(a)}\quad u(x_0) \sim -\frac{2}{\pi x_0^2}\int_0^\infty x\,v(x)\,dx,$$

$$\text{(b)}\quad v(x_0) \sim \frac{2}{\pi x_0}\int_0^\infty u(x)\,dx.$$

In quantum mechanics relations of this form are often called sum rules.





12.8.5 (a) Given the integral equation (valid for all real $x_0$)

$$\frac{1}{1 + x_0^2} = \frac{1}{\pi}\,\mathcal{P}\!\int_{-\infty}^{\infty}\frac{u(x)}{x - x_0}\,dx,$$

use Hilbert transforms to determine $u(x_0)$.

(b) Verify that the $u(x_0)$ found as your answer to part (a) actually satisfies the integral equation.

(c) From $f(z)|_{y=0} = u(x) + iv(x)$, replace x by z and determine f(z). Verify that the conditions for the Hilbert transforms are satisfied.

(d) Are the crossing conditions satisfied?

ANS. (a) $u(x_0) = \dfrac{x_0}{1 + x_0^2}$, (c) $f(z) = (z + i)^{-1}$.

12.8.6 (a) If the real part of the complex index of refraction (squared) is constant (no optical dispersion), show that the imaginary part is zero (no absorption).

(b) Conversely, if there is absorption, show that there must be dispersion. In other words, if the imaginary part of $n^2 - 1$ is not zero, show that the real part of $n^2 - 1$ is not constant.

12.8.7 Given $u(x) = x/(x^2 + 1)$ and $v(x) = -1/(x^2 + 1)$, show by direct evaluation of each integral that

$$\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}|v(x)|^2\,dx.$$

ANS. $\displaystyle\int_{-\infty}^{\infty}|u(x)|^2\,dx = \int_{-\infty}^{\infty}|v(x)|^2\,dx = \frac{\pi}{2}$.

12.8.8 Take u(x) = δ(x), a delta function, and assume that the Hilbert transform equations hold.

(a) Show that

$$\delta(w) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dy}{y(y - w)}.$$

(b) With changes of variables w = s − t and x = s − y, transform the δ representation of part (a) into

$$\delta(s - t) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dx}{(x - s)(x - t)}.$$

Note. The δ function is discussed in Section 1.11.

Additional Readings


Abramowitz, M., and I. A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and 
Mathematical Tables (AMS-55). Washington, DC: National Bureau of Standards (1972), reprinted, Dover 
(1974). 

Lewin, L., Polylogarithms and Associated Functions. New York: North-Holland (1981). This is a definitive 
resource for the dilogarithm and its generalizations up through its publication date. It is clear and no more 
difficult than necessary. 

McBride, E. B., Obtaining Generating Functions. New York: Springer-Verlag (1971). An introduction to meth- 
ods of obtaining generating functions, both for sets of functions arising from ODEs and for those that do 
not. 

Nussenzveig, H. M., Causality and Dispersion Relations, Mathematics in Science and Engineering Series, Vol. 
95. New York: Academic Press (1972). This is an advanced text covering causality and dispersion relations in 
the first chapter and then moving on to develop implications in a variety of areas of theoretical physics. 

Talman, J. D., Special Functions. New York: W. A. Benjamin (1968). Develops the theory of a number of 
special functions using their underlying group-theoretical properties, including presentation of their generating 
functions. 

Wyld, H. W., Mathematical Methods for Physics. Reading, MA: Benjamin/Cummings (1976), Perseus Books 
(1999). This is a relatively advanced text that contains an extensive discussion of dispersion relations. 


CHAPTER 13


GAMMA FUNCTION


The gamma function is probably the special function that occurs most frequently in the
discussion of problems in physics. For integer values, as the factorial function, it appears
in every Taylor expansion. As we shall later see, it also occurs frequently with half-integer
arguments, and is needed for general nonintegral values in the expansion of many functions,
e.g., Bessel functions of noninteger order.

It has been shown that the gamma function is one of a general class of functions that
do not satisfy any differential equation with rational coefficients. Specifically, the gamma
function is one of very few functions of mathematical physics that do not satisfy either
the hypergeometric differential equation (Section 18.5) or the confluent hypergeometric
equation (Section 18.6). Since most physical theories involve quantities governed by
differential equations, the gamma function (by itself) does not usually describe a physical
quantity of interest, but rather tends to appear as a factor in expansions of physically
relevant quantities.


13.1 DEFINITIONS, PROPERTIES


At least three different convenient definitions of the gamma function are in common use. 
Our first task is to state these definitions, to develop some simple, direct consequences, and 
to show the equivalence of the three forms. 


Infinite Limit (Euler) 


The first definition, named after Euler, is 


$$\Gamma(z) \equiv \lim_{n\to\infty} \frac{1\cdot 2\cdot 3\cdots n}{z(z+1)(z+2)\cdots(z+n)}\, n^z, \qquad z \neq 0, -1, -2, -3, \ldots. \tag{13.1}$$
This definition of $\Gamma(z)$ is useful in developing the Weierstrass infinite-product form of
$\Gamma(z)$, Eq. (13.15), and in obtaining the derivative of $\ln\Gamma(z)$ (Section 13.2). Here and
elsewhere in this chapter $z$ may be either real or complex. Replacing $z$ with $z + 1$, we have

$$\begin{aligned}
\Gamma(z+1) &= \lim_{n\to\infty} \frac{1\cdot 2\cdot 3\cdots n}{(z+1)(z+2)(z+3)\cdots(z+n+1)}\, n^{z+1} \\
&= \lim_{n\to\infty} \frac{nz}{z+n+1}\cdot\frac{1\cdot 2\cdot 3\cdots n}{z(z+1)(z+2)\cdots(z+n)}\, n^{z} \\
&= z\,\Gamma(z).
\end{aligned} \tag{13.2}$$


This is the basic functional relation for the gamma function. It should be noted that it is a 
difference equation. 
Also, from the definition, 





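$$\Gamma(1) = \lim_{n\to\infty} \frac{1\cdot 2\cdot 3\cdots n}{1\cdot 2\cdot 3\cdots n\,(n+1)}\, n = 1. \tag{13.3}$$

Now, repeated application of Eq. (13.2) gives

$$\Gamma(2) = 1, \qquad \Gamma(3) = 2\,\Gamma(2) = 2, \qquad \Gamma(4) = 3\,\Gamma(3) = 2\cdot 3, \quad \text{etc.},$$

so

$$\Gamma(n) = 1\cdot 2\cdot 3\cdots(n-1) = (n-1)!. \tag{13.4}$$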
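
The convergence of the Euler limit can be examined numerically. The following is a minimal sketch, not part of the original text: the function name `gamma_euler_limit` is hypothetical, it assumes real $z > 0$ (so that every logarithm is defined), and it compares the finite-$n$ expression of Eq. (13.1) with the standard-library `math.gamma`.

```python
import math

def gamma_euler_limit(z, n):
    """Finite-n approximant of Eq. (13.1): n! n^z / [z (z+1) ... (z+n)].

    Assumes real z > 0. Logarithms are used so that n! and n**z
    do not overflow for large n.
    """
    log_value = z * math.log(n) + math.lgamma(n + 1)          # ln(n^z) + ln(n!)
    log_value -= sum(math.log(z + k) for k in range(n + 1))   # ln of the denominator
    return math.exp(log_value)

# The approximant approaches Gamma(z) slowly, with an error of order 1/n.
for n in (10, 1000, 100000):
    print(n, gamma_euler_limit(2.5, n), math.gamma(2.5))
```

For $z = 1$ the approximant reduces to $n/(n+1)$, in agreement with Eq. (13.3).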


Definite Integral (Euler) 


A second definition, also frequently called the Euler integral, and already presented in 
Table 1.2, is 


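$$\Gamma(z) \equiv \int_0^{\infty} e^{-t}\, t^{z-1}\, dt, \qquad \mathrm{Re}(z) > 0. \tag{13.5}$$

The restriction on $z$ is necessary to avoid divergence of the integral. When the gamma
function does appear in physical problems, it is often in this form or some variation, such as

$$\Gamma(z) = 2\int_0^{\infty} e^{-t^2}\, t^{2z-1}\, dt, \qquad \mathrm{Re}(z) > 0, \tag{13.6}$$

or

$$\Gamma(z) = \int_0^{1} \left[\ln\left(\frac{1}{t}\right)\right]^{z-1} dt, \qquad \mathrm{Re}(z) > 0. \tag{13.7}$$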
When $z = \tfrac{1}{2}$, Eq. (13.6) is just the Gauss error integral, and, cf. Eq. (1.148), we have the
interesting result

$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}. \tag{13.8}$$

Generalizations of Eq. (13.6), the Gaussian integrals, are considered in Exercise 13.1.10.
To show the equivalence of these two definitions, Eqs. (13.1) and (13.5), consider the
function of two variables

$$F(z, n) = \int_0^{n} \left(1 - \frac{t}{n}\right)^{n} t^{z-1}\, dt, \qquad \mathrm{Re}(z) > 0, \tag{13.9}$$

with $n$ a positive integer. This form was chosen because the exponential has the definition

$$\lim_{n\to\infty} \left(1 - \frac{t}{n}\right)^{n} \equiv e^{-t}. \tag{13.10}$$

Inserting Eq. (13.10) into Eq. (13.9), we see that the infinite-$n$ limit of $F(z, n)$ corresponds
to $\Gamma(z)$ as given by Eq. (13.5):

$$\lim_{n\to\infty} F(z, n) = F(z, \infty) = \int_0^{\infty} e^{-t}\, t^{z-1}\, dt \equiv \Gamma(z). \tag{13.11}$$


Our remaining task is to identify this limit also with Eq. (13.1). 
Returning to F(z, n), we evaluate it by carrying out successive integrations by parts. For 
convenience we make the substitution u = t/n. Then 
$$F(z, n) = n^{z} \int_0^{1} (1 - u)^{n}\, u^{z-1}\, du. \tag{13.12}$$

The first integration by parts yields

$$\frac{F(z, n)}{n^{z}} = (1 - u)^{n}\, \frac{u^{z}}{z}\,\Bigg|_0^1 + \frac{n}{z} \int_0^{1} (1 - u)^{n-1}\, u^{z}\, du; \tag{13.13}$$

note that (because $z \neq 0$) the integrated part vanishes at both endpoints. Repeating this $n$
times, with the integrated part vanishing at both endpoints each time, we finally get

$$\begin{aligned}
F(z, n) &= n^{z}\, \frac{n(n-1)\cdots 1}{z(z+1)\cdots(z+n-1)} \int_0^{1} u^{z+n-1}\, du \\
&= \frac{1\cdot 2\cdot 3\cdots n}{z(z+1)(z+2)\cdots(z+n)}\, n^{z}.
\end{aligned} \tag{13.14}$$

This is identical with the expression on the right side of Eq. (13.1). Hence

$$\lim_{n\to\infty} F(z, n) = F(z, \infty) \equiv \Gamma(z),$$

where $\Gamma(z)$ is in the form given by Eq. (13.1), thereby completing the proof.
Infinite Product (Weierstrass)

The third definition (Weierstrass' form) is the infinite product

$$\frac{1}{\Gamma(z)} \equiv z\, e^{\gamma z} \prod_{n=1}^{\infty} \left(1 + \frac{z}{n}\right) e^{-z/n}, \tag{13.15}$$

where $\gamma$ is the Euler-Mascheroni constant

$$\gamma = 0.5772156649\cdots, \tag{13.16}$$

which was introduced as a limit in Eq. (1.13). Existence of the limit was the topic of
Exercise 1.2.13.

This infinite-product form is useful for proving various properties of $\Gamma(z)$. It can be
derived from the original definition, Eq. (13.1), by rewriting it as

$$\Gamma(z) = \lim_{n\to\infty} \frac{1\cdot 2\cdot 3\cdots n}{z(z+1)\cdots(z+n)}\, n^{z} = \lim_{n\to\infty} \frac{1}{z} \prod_{m=1}^{n} \left(1 + \frac{z}{m}\right)^{-1} n^{z}. \tag{13.17}$$

Taking the reciprocal of Eq. (13.17) and using

$$n^{-z} = e^{(-\ln n) z}, \tag{13.18}$$

we obtain

$$\frac{1}{\Gamma(z)} = z \lim_{n\to\infty} e^{(-\ln n) z} \prod_{m=1}^{n} \left(1 + \frac{z}{m}\right). \tag{13.19}$$

Multiplying and dividing the right-hand side of Eq. (13.19) by

$$\exp\left[\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right) z\right] = \prod_{m=1}^{n} e^{z/m}, \tag{13.20}$$

we get

$$\frac{1}{\Gamma(z)} = z \left\{ \lim_{n\to\infty} \exp\left[\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \ln n\right) z\right] \right\} \left[ \lim_{n\to\infty} \prod_{m=1}^{n} \left(1 + \frac{z}{m}\right) e^{-z/m} \right]. \tag{13.21}$$

Comparing with Eq. (1.13), we see that the parenthesized quantity in the exponent
approaches as a limit the Euler-Mascheroni constant, thereby confirming Eq. (13.15).
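
As an illustration of Eq. (13.15), the infinite product can be truncated and compared with a library value. This is a sketch under stated assumptions: the name `inv_gamma_weierstrass` and the truncation parameter `terms` are hypothetical, $z$ is taken real, and the value of $\gamma$ is entered as a constant.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, Eq. (13.16)

def inv_gamma_weierstrass(z, terms):
    """Partial product of Eq. (13.15):
    z e^{gamma z} prod_{n=1}^{terms} (1 + z/n) e^{-z/n}.
    """
    product = z * math.exp(EULER_GAMMA * z)
    for n in range(1, terms + 1):
        product *= (1.0 + z / n) * math.exp(-z / n)
    return product

# 1/Gamma(1/2) = 1/sqrt(pi); the truncation error falls off roughly like 1/terms.
print(inv_gamma_weierstrass(0.5, 100000), 1.0 / math.sqrt(math.pi))

# At z = -2 the factor (1 + z/2) vanishes, reflecting the pole of Gamma(z) there.
print(inv_gamma_weierstrass(-2.0, 10))
```

The second call shows why the product form makes the pole structure of $\Gamma(z)$ evident: the reciprocal has a zero whenever $z$ is a nonpositive integer.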
Functional Relations


In Eq. (13.2) we already obtained the most important functional relation for the gamma
function,

$$\Gamma(z + 1) = z\,\Gamma(z). \tag{13.22}$$

Viewed as a complex-valued function, this formula permits the extension to negative $z$ of
values obtained via numerical evaluation of the integral representation, Eq. (13.5). While
the Euler limit formula already tells us that $\Gamma(z)$ is an analytic function for all $z$ except 0,
$-1, \ldots$, stepwise extrapolation from the integral is a more efficient numerical approach.

The gamma function satisfies several other functional relations, of which one of the most
interesting is the reflection formula,

$$\Gamma(z)\,\Gamma(1 - z) = \frac{\pi}{\sin z\pi}. \tag{13.23}$$

This relation connects (for nonintegral $z$) values of $\Gamma(z)$ that are related by reflection about
the line $z = 1/2$.
One way to prove the reflection formula starts from the product of Euler integrals,

$$\begin{aligned}
\Gamma(z + 1)\,\Gamma(1 - z) &= \int_0^{\infty} s^{z} e^{-s}\, ds \int_0^{\infty} t^{-z} e^{-t}\, dt \\
&= \int_0^{\infty} \frac{v^{z}\, dv}{(v + 1)^2} \int_0^{\infty} e^{-u}\, u\, du.
\end{aligned} \tag{13.24}$$

In obtaining the second line of Eq. (13.24) we transformed from the variables $s$, $t$ to
$u = s + t$, $v = s/t$, as suggested by combining the exponentials and the powers in the
integrands. We also needed to insert the Jacobian of this transformation,

$$J^{-1} = -\begin{vmatrix} 1 & 1 \\[4pt] \dfrac{1}{t} & -\dfrac{s}{t^2} \end{vmatrix} = \frac{s + t}{t^2} = \frac{(v + 1)^2}{u};$$

the final substitution becomes obvious if we note that $v + 1 = u/t$.

Returning to Eq. (13.24), the $u$ integration is elementary, being equal to $1!$, while
the $v$ integration can be evaluated by contour-integration methods; it was the topic of
Exercise 11.8.20, and has the value

$$\int_0^{\infty} \frac{v^{z}\, dv}{(v + 1)^2} = \frac{\pi z}{\sin \pi z}. \tag{13.25}$$

Using these results, and then replacing $\Gamma(z + 1)$ in Eq. (13.24) by $z\Gamma(z)$ and canceling $z$
from the two sides of the resulting equation, we complete the demonstration of Eq. (13.23).

A special case of Eq. (13.23) results if we set $z = 1/2$. Then (taking the positive square
root), we get

$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}, \tag{13.26}$$

in agreement with Eq. (13.8).
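
A quick numerical spot check of the reflection formula, Eq. (13.23), is sketched below. It is an illustration only: `reflection_residual` is a hypothetical helper name, and the check relies on the standard-library `math.gamma`, which accepts negative nonintegral real arguments.

```python
import math

def reflection_residual(z):
    """Return Gamma(z) Gamma(1 - z) - pi / sin(pi z) for real, nonintegral z."""
    lhs = math.gamma(z) * math.gamma(1.0 - z)
    rhs = math.pi / math.sin(math.pi * z)
    return lhs - rhs

for z in (0.3, 0.5, -1.7, 2.25):
    print(z, reflection_residual(z))   # each residual should be near rounding level
```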
Another functional relation is Legendre's duplication formula,

$$\Gamma(1 + z)\,\Gamma\left(z + \tfrac{1}{2}\right) = 2^{-2z} \sqrt{\pi}\; \Gamma(2z + 1), \tag{13.27}$$

which we prove for general $z$ in Section 13.3. However, it is instructive to prove it now for
integer values of $z$. Assuming $z$ to be a nonnegative integer $n$, we start the proof by writing
$\Gamma(n + 1) = n!$, $\Gamma(2n + 1) = (2n)!$, and

$$\Gamma\left(n + \tfrac{1}{2}\right) = \Gamma\left(\tfrac{1}{2}\right) \left[\frac{1}{2}\cdot\frac{3}{2}\cdots\frac{2n - 1}{2}\right] = \sqrt{\pi}\; \frac{1\cdot 3\cdots(2n - 1)}{2^{n}} = \sqrt{\pi}\; \frac{(2n - 1)!!}{2^{n}}, \tag{13.28}$$

where we have used Eq. (13.26) and the double factorial notation first introduced in
Eqs. (1.75) and (1.76). The double factorial notation is used frequently enough in physics
applications that a familiarity with it is essential, and will from here on be used without
comment. Making the further observation that $n! = 2^{-n} (2n)!!$, Eq. (13.27) follows directly.

Incidentally, we call attention to the fact that gamma functions with half-integer arguments
appear frequently in physics problems, and Eq. (13.28) shows how to write them in
closed form.
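
The closed form of Eq. (13.28) is easy to tabulate. The sketch below uses hypothetical helper names (`double_factorial`, `gamma_half_integer`) and adopts the usual convention $(-1)!! = 0!! = 1$; it compares the closed form with `math.gamma` and checks the duplication formula, Eq. (13.27), for one integer.

```python
import math

def double_factorial(k):
    """k!! for integer k >= -1, with the convention (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def gamma_half_integer(n):
    """Gamma(n + 1/2) from Eq. (13.28): sqrt(pi) (2n - 1)!! / 2^n, n = 0, 1, 2, ..."""
    return math.sqrt(math.pi) * double_factorial(2 * n - 1) / 2 ** n

for n in range(6):
    print(n, gamma_half_integer(n), math.gamma(n + 0.5))

# Duplication formula, Eq. (13.27), for z = n = 5:
n = 5
lhs = math.factorial(n) * gamma_half_integer(n)
rhs = 2.0 ** (-2 * n) * math.sqrt(math.pi) * math.factorial(2 * n)
print(lhs, rhs)
```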


Analytic Properties 


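The Weierstrass definition shows immediately that $\Gamma(z)$ has simple poles at $z = 0, -1, -2,
-3, \ldots$ and that $[\Gamma(z)]^{-1}$ has no poles in the finite complex plane, which means that $\Gamma(z)$
has no zeros. This behavior may also be seen in Eq. (13.23), if we note that $\pi/(\sin z\pi)$ is
never equal to zero. A plot of $\Gamma(z)$ for real $z$ is shown in Fig. 13.1. We note sign changes
for each unit interval of negative $z$, that $\Gamma(1) = \Gamma(2) = 1$, and that the gamma function has
a minimum between $z = 1$ and $z = 2$, at $z_0 = 1.46163\ldots$, with $\Gamma(z_0) = 0.88560\ldots$. The
residues $R_n$ at the poles $z = -n$ ($n$ an integer $\geq 0$) are

$$\begin{aligned}
R_n &= \lim_{\varepsilon\to 0} \bigl(\varepsilon\, \Gamma(-n + \varepsilon)\bigr) = \lim_{\varepsilon\to 0} \frac{\varepsilon\, \Gamma(-n + 1 + \varepsilon)}{-n + \varepsilon} = \lim_{\varepsilon\to 0} \frac{\varepsilon\, \Gamma(-n + 2 + \varepsilon)}{(-n + \varepsilon)(-n + 1 + \varepsilon)} \\
&= \lim_{\varepsilon\to 0} \frac{\varepsilon\, \Gamma(1 + \varepsilon)}{(-n + \varepsilon)\cdots(\varepsilon)} = \frac{(-1)^{n}}{n!},
\end{aligned} \tag{13.29}$$

showing that the residues alternate in sign, with that at $z = -n$ having magnitude $1/n!$.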
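
The limit in Eq. (13.29) can be approximated by choosing a small but finite $\varepsilon$. The following sketch is illustrative only: `residue_estimate` and the default `eps` are assumptions, and the estimate differs from the exact residue by a relative amount of order $\varepsilon$.

```python
import math

def residue_estimate(n, eps=1e-7):
    """Approximate the residue of Gamma(z) at z = -n by eps * Gamma(-n + eps)."""
    return eps * math.gamma(-n + eps)

for n in range(5):
    exact = (-1) ** n / math.factorial(n)   # Eq. (13.29)
    print(n, residue_estimate(n), exact)
```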


Schlaefli Integral 


A contour integral representation of the gamma function that we will find useful in devel- 
oping asymptotic series for the Bessel functions is the Schlaefli integral 


$$\int_C e^{-t}\, t^{\nu}\, dt = \left(e^{2\pi i \nu} - 1\right) \Gamma(\nu + 1), \tag{13.30}$$

where $C$ is the contour shown in Fig. 13.2. This contour integral representation is only
useful when $\nu$ is not an integer. For integer $\nu$, the integrand is an entire function; both
sides of Eq. (13.30) vanish and it yields no information. However, for noninteger $\nu$, $t = 0$
is a branch point of the integrand and the right-hand side of Eq. (13.30) then evaluates
to a nonzero result. Note that, unlike the contour representations we considered in earlier
chapters, the present contour is open; we cannot close it at $t = +\infty$ because of the branch
cut, nor can we close it with a large circle, as $e^{-t}$ becomes infinite in the limit of large
negative $t$.

To verify Eq. (13.30), we proceed (for $\nu + 1 > 0$) by evaluating the contributions from
the various parts of the integration path. The integral from $\infty$ to $+\varepsilon$ on the real axis yields
$-\Gamma(\nu + 1)$, choosing $\arg(t) = 0$. The integral from $+\varepsilon$ to $\infty$ (in the fourth quadrant) then yields
$e^{2\pi i \nu}\, \Gamma(\nu + 1)$, the argument of $t$ having increased to $2\pi$. Since the circle around the
origin contributes nothing when $\nu > -1$, Eq. (13.30) follows. Now that this equation is
established, we can deform the contour as desired (providing that we avoid the branch
point and cut), since there are no other singularities we must avoid.

FIGURE 13.1 Gamma function $\Gamma(x + 1)$ for real $x$.

FIGURE 13.2 Gamma function contour.


It is often convenient to cast Eq. (13.30) into the more symmetrical form

$$\int_C e^{-t}\, t^{\nu}\, dt = 2i\, e^{i\nu\pi}\, \Gamma(\nu + 1) \sin(\nu\pi), \tag{13.31}$$

where $C$ can be the contour of Fig. 13.2 or any deformation thereof that encircles the
origin, does not cross the branch cut, and begins and ends at any points respectively above
and below the cut for which $\mathrm{Re}\, t = +\infty$.

The above analysis establishes Eqs. (13.30) and (13.31) for $\nu > -1$. However, we note
that the integral exists for $\nu < -1$ as long as we stay away from the origin, and therefore
it remains valid for all nonintegral $\nu$. What we have found is that this contour integral
representation provides an analytic continuation of the Euler integral, Eq. (13.5), to all
nonintegral $\nu$.


Factorial Notation 


Our discussion of the gamma function has been presented in terms of the classical notation, 
which was first introduced by Legendre. In an attempt to make a closer correspondence to 
the factorial notation (traditionally used for integers), and to simplify the Euler integral 
representation of the gamma function, Eq. (13.5), some authors have chosen to use the 
notation $z!$ as a synonym for $\Gamma(z + 1)$ even when $z$ has an arbitrary complex value.
Occasionally one even encounters Gauss' notation, $\prod(z)$, for the factorial function:

$$\prod(z) = z! = \Gamma(z + 1).$$

Neither the factorial (for nonintegral arguments) nor the Gauss notation is currently
favored by most serious investigators, and we will not use them in this book.


Example 13.1.1 MAXWELL-BOLTZMANN DISTRIBUTION


In classical statistical mechanics, a state of energy $E$ is occupied, according to the equation
of Maxwell-Boltzmann statistics, with a probability proportional to $e^{-E/kT}$, where $k$
is Boltzmann's constant and $T$ is the absolute temperature; it is usual to define $\beta = 1/kT$
and to write the probability of occupancy of a state of energy $E$ as $p(E) = C e^{-\beta E}$. If the
number of states in a small energy interval $dE$ at energy $E$ is given, using a density distribution
function $n(E)$, as $n(E)\, dE$, then the total probability of states at energy $E$ assumes
the form $C\, n(E)\, e^{-\beta E}\, dE$. Under those conditions, the total probability of occupancy in
any state (namely, unity) must be

$$1 = C \int n(E)\, e^{-\beta E}\, dE, \tag{13.32}$$

which enables us to set the normalization constant $C$, and the average energy $\langle E \rangle$ of such
a classical system will be

$$\langle E \rangle = C \int E\, n(E)\, e^{-\beta E}\, dE. \tag{13.33}$$
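For a structureless ideal gas, it can be shown that $n(E)$ is proportional to $E^{1/2}$, with $E$, the
kinetic energy of a gas molecule, in the range $(0, \infty)$. Then we may find $C$ from

$$1 = C \int_0^{\infty} E^{1/2} e^{-\beta E}\, dE = C\, \frac{\Gamma\left(\tfrac{3}{2}\right)}{\beta^{3/2}} = C\, \frac{\sqrt{\pi}}{2 \beta^{3/2}}, \qquad \text{or} \qquad C = \frac{2 \beta^{3/2}}{\sqrt{\pi}},$$

and

$$\langle E \rangle = C \int_0^{\infty} E^{3/2} e^{-\beta E}\, dE = C\, \frac{\Gamma\left(\tfrac{5}{2}\right)}{\beta^{5/2}} = \left(\frac{2 \beta^{3/2}}{\sqrt{\pi}}\right) \frac{1}{\beta^{5/2}} \left(\frac{3}{2}\cdot\frac{\sqrt{\pi}}{2}\right) = \frac{3}{2\beta} = \frac{3}{2}\, kT,$$

the known value of the average kinetic energy per molecule for a structureless classical
gas at temperature $T$.

In probability theory, the distribution used here is known as a gamma distribution; it
is further discussed in Chapter 23.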
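
The result $\langle E \rangle = \tfrac{3}{2} kT$ of Example 13.1.1 can be cross-checked numerically. The sketch below is an illustration under stated assumptions: the function names, the midpoint rule, and the truncation of the energy integrals at $E = \texttt{cutoff}/\beta$ are choices made here, not part of the text.

```python
import math

def mean_energy_closed_form(beta):
    """<E> from Eqs. (13.32) and (13.33) with n(E) = E^(1/2), using gamma functions."""
    c = beta ** 1.5 / math.gamma(1.5)            # normalization constant C
    return c * math.gamma(2.5) / beta ** 2.5     # equals 3 / (2 beta)

def mean_energy_numeric(beta, steps=200000, cutoff=50.0):
    """Midpoint-rule estimate of <E>; the integrals are truncated at E = cutoff / beta."""
    h = cutoff / beta / steps
    norm = 0.0
    first_moment = 0.0
    for i in range(steps):
        e = (i + 0.5) * h
        weight = math.sqrt(e) * math.exp(-beta * e)   # n(E) e^{-beta E}
        norm += weight
        first_moment += e * weight
    return first_moment / norm

beta = 2.0
print(mean_energy_closed_form(beta), mean_energy_numeric(beta), 1.5 / beta)
```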


Exercises


13.1.1 Derive the recurrence relation
$$\Gamma(z + 1) = z\,\Gamma(z)$$
from the Euler integral, Eq. (13.5),
$$\Gamma(z) = \int_0^{\infty} e^{-t}\, t^{z-1}\, dt.$$

13.1.2 In a power-series solution for the Legendre functions of the second kind we encounter
the expression
$$\frac{(n + 1)(n + 2)(n + 3)\cdots(n + 2s - 1)(n + 2s)}{2\cdot 4\cdot 6\cdot 8\cdots(2s - 2)(2s)\cdot(2n + 3)(2n + 5)(2n + 7)\cdots(2n + 2s + 1)},$$
in which $s$ is a positive integer.

(a) Rewrite this expression in terms of factorials.

(b) Rewrite this expression using Pochhammer symbols; see Eq. (1.72).

13.1.3 Show that $\Gamma(z)$ may be written
$$\Gamma(z) = 2\int_0^{\infty} e^{-t^2}\, t^{2z-1}\, dt, \qquad \mathrm{Re}(z) > 0,$$
$$\Gamma(z) = \int_0^{1} \left[\ln\left(\frac{1}{t}\right)\right]^{z-1} dt, \qquad \mathrm{Re}(z) > 0.$$
13.1.4 In a Maxwellian distribution the fraction of particles of mass $m$ with speed between $v$
and $v + dv$ is
$$\frac{dN}{N} = 4\pi \left(\frac{m}{2\pi kT}\right)^{3/2} \exp\left(-\frac{m v^2}{2kT}\right) v^2\, dv,$$
where $N$ is the total number of particles, $k$ is Boltzmann's constant, and $T$ is the
absolute temperature. The average or expectation value of $v^n$ is defined as $\langle v^n \rangle =
N^{-1} \int v^n\, dN$. Show that
$$\langle v^n \rangle = \left(\frac{2kT}{m}\right)^{n/2} \frac{\Gamma\left(\frac{n + 3}{2}\right)}{\Gamma\left(\frac{3}{2}\right)}.$$
This is an extension of Example 13.1.1, in which the distribution was in kinetic energy
$E = m v^2/2$, with $dE = m v\, dv$.

13.1.5 By transforming the integral into a gamma function, show that
$$-\int_0^{1} x^{k} \ln x\, dx = \frac{1}{(k + 1)^2}.$$

13.1.6 Show that

13.1.7 Show that
$$\lim_{x\to 0} \frac{\Gamma(ax)}{\Gamma(x)} = \frac{1}{a}.$$

13.1.8 Locate the poles of $\Gamma(z)$. Show that they are simple poles and determine the residues.

13.1.9 Show that the equation $\Gamma(x) = k$, $k \neq 0$, has an infinite number of real roots.

13.1.10 Show that, for integer $s$,

(a) $\displaystyle \int_0^{\infty} x^{2s+1} \exp(-a x^2)\, dx = \frac{s!}{2 a^{s+1}}$,

(b) $\displaystyle \int_0^{\infty} x^{2s} \exp(-a x^2)\, dx = \frac{\Gamma\left(s + \frac{1}{2}\right)}{2 a^{s + 1/2}} = \frac{(2s - 1)!!}{2^{s+1} a^{s}} \sqrt{\frac{\pi}{a}}$.

These Gaussian integrals are of major importance in statistical mechanics.

13.1.11 Express the coefficient of the $n$th term of the expansion of $(1 + x)^{1/2}$ in powers of $x$

(a) in terms of factorials of integers,

(b) in terms of the double factorial (!!) functions.

ANS. $\displaystyle a_n = (-1)^{n+1} \frac{(2n - 3)!}{2^{2n-2}\, n!\, (n - 2)!} = (-1)^{n+1} \frac{(2n - 3)!!}{(2n)!!}, \quad n = 2, 3, \ldots.$

13.1.12 Express the coefficient of the $n$th term of the expansion of $(1 + x)^{-1/2}$ in powers of $x$

(a) in terms of the factorials of integers,

(b) in terms of the double factorial (!!) functions.

ANS. $\displaystyle a_n = (-1)^{n} \frac{(2n)!}{2^{2n} (n!)^2} = (-1)^{n} \frac{(2n - 1)!!}{(2n)!!}, \quad n = 1, 2, 3, \ldots.$

13.1.13 The Legendre polynomial $P_n$ may be written as
$$\begin{aligned}
P_n(\cos\theta) = 2\, \frac{(2n - 1)!!}{(2n)!!} \Bigl[ &\cos n\theta + \frac{1}{1}\cdot\frac{n}{2n - 1} \cos(n - 2)\theta \\
&+ \frac{1\cdot 3}{1\cdot 2}\, \frac{n(n - 1)}{(2n - 1)(2n - 3)} \cos(n - 4)\theta \\
&+ \frac{1\cdot 3\cdot 5}{1\cdot 2\cdot 3}\, \frac{n(n - 1)(n - 2)}{(2n - 1)(2n - 3)(2n - 5)} \cos(n - 6)\theta + \cdots \Bigr].
\end{aligned}$$
Let $n = 2s + 1$. Then the above can be written
$$P_n(\cos\theta) = P_{2s+1}(\cos\theta) = \sum_{m=0}^{s} a_m \cos(2m + 1)\theta.$$
Find $a_m$ in terms of factorials and double factorials.

13.1.14 (a) Show that $\Gamma\left(\tfrac{1}{2} - n\right) \Gamma\left(\tfrac{1}{2} + n\right) = (-1)^{n} \pi$, where $n$ is an integer.

(b) Express $\Gamma\left(\tfrac{1}{2} + n\right)$ and $\Gamma\left(\tfrac{1}{2} - n\right)$ separately in terms of $\pi^{1/2}$ and a double factorial
function.

ANS. $\displaystyle \Gamma\left(\tfrac{1}{2} + n\right) = \frac{(2n - 1)!!}{2^{n}}\, \pi^{1/2}.$

13.1.15 Show that if $\Gamma(x + iy) = u + iv$, then $\Gamma(x - iy) = u - iv$.
This is a special case of the Schwarz reflection principle, Section 11.10.

13.1.16 Prove that
$$|\Gamma(\alpha + i\beta)| = |\Gamma(\alpha)| \prod_{n=0}^{\infty} \left[1 + \frac{\beta^2}{(\alpha + n)^2}\right]^{-1/2}.$$
This equation has been useful in calculations of beta decay theory.

13.1.17 Show that for $n$, a positive integer,
$$|\Gamma(n + ib + 1)| = \left(\frac{\pi b}{\sinh \pi b}\right)^{1/2} \prod_{s=1}^{n} \left(s^2 + b^2\right)^{1/2}.$$

13.1.18 Show that for all real values of $x$ and $y$, $|\Gamma(x)| \geq |\Gamma(x + iy)|$.
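13.1.19 Show that
$$\left|\Gamma\left(\tfrac{1}{2} + iy\right)\right|^2 = \frac{\pi}{\cosh \pi y}.$$

13.1.20 The probability density associated with the normal distribution of statistics is given by
$$f(x) = \frac{1}{\sigma (2\pi)^{1/2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right],$$
with $(-\infty, \infty)$ for the range of $x$. Show that

(a) $\langle x \rangle$, the mean value of $x$, is equal to $\mu$,

(b) the standard deviation $(\langle x^2 \rangle - \langle x \rangle^2)^{1/2}$ is given by $\sigma$.

13.1.21 For the gamma distribution
$$f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, & x > 0, \\[8pt] 0, & x \leq 0, \end{cases}$$
show that

(a) $\langle x \rangle$, the mean value of $x$, is equal to $\alpha\beta$,

(b) $\sigma^2$, its variance, defined as $\langle x^2 \rangle - \langle x \rangle^2$, has the value $\alpha\beta^2$.

13.1.22 The wave function of a particle scattered by a Coulomb potential is $\psi(r, \theta)$. Given that
at the origin the wave function becomes
$$\psi(0) = e^{-\pi\gamma/2}\, \Gamma(1 + i\gamma),$$
where $\gamma > 0$ is a dimensionless parameter, show that
$$|\psi(0)|^2 = \frac{2\pi\gamma}{e^{2\pi\gamma} - 1}.$$

13.1.23 Derive the contour integral representation of Eq. (13.31),
$$2i\, \Gamma(\nu + 1) \sin \nu\pi = \int_C e^{-t} (-t)^{\nu}\, dt.$$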


13.2 DIGAMMA AND POLYGAMMA FUNCTIONS 


Digamma Function 


As may be noted from the three definitions in Section 13.1, it is inconvenient to deal with
the derivatives of the gamma function directly. It is more productive to take the natural
logarithm of the gamma function as given by Eq. (13.1), thereby converting the product
to a sum, and then to differentiate. The most useful results are obtained if we start with
$\Gamma(z + 1)$:

$$\Gamma(z + 1) = z\,\Gamma(z) = \lim_{n\to\infty} \frac{n!}{(z + 1)(z + 2)\cdots(z + n)}\, n^{z}, \tag{13.34}$$

$$\ln \Gamma(z + 1) = \lim_{n\to\infty} \Bigl[ \ln(n!) + z \ln n - \ln(z + 1) - \ln(z + 2) - \cdots - \ln(z + n) \Bigr], \tag{13.35}$$

in which the logarithm of the limit is equal to the limit of the logarithm. Differentiating
with respect to $z$, we obtain

$$\frac{d}{dz} \ln \Gamma(z + 1) \equiv \psi(z + 1) = \lim_{n\to\infty} \left( \ln n - \frac{1}{z + 1} - \frac{1}{z + 2} - \cdots - \frac{1}{z + n} \right), \tag{13.36}$$

which defines $\psi(z + 1)$, the digamma function. Note that this definition also corresponds to

$$\psi(z + 1) = \frac{[\Gamma(z + 1)]'}{\Gamma(z + 1)}. \tag{13.37}$$

To bring Eq. (13.36) to a better form, we add and subtract the harmonic number

$$H_n = \sum_{m=1}^{n} \frac{1}{m},$$

thereby obtaining

$$\begin{aligned}
\psi(z + 1) &= \lim_{n\to\infty} \left[ (\ln n - H_n) - \sum_{m=1}^{n} \left( \frac{1}{z + m} - \frac{1}{m} \right) \right] \\
&= -\gamma + \sum_{m=1}^{\infty} \frac{z}{m(m + z)}.
\end{aligned} \tag{13.38}$$

We have now arranged the contributions in a way that causes each group of terms to
approach a finite limit as $n \to \infty$: in that limit $\ln n - H_n$ becomes (minus) the Euler-Mascheroni
constant, defined in Eq. (1.13), and the summation is convergent.

Setting $z = 0$, we find$^1$

$$\psi(1) = -\gamma = -0.577\,215\,664\,901\cdots. \tag{13.39}$$


For integer $n > 0$, Eq. (13.38) reduces to a form that is good for revealing its structure but
less desirable for actual computation:

$$\psi(n + 1) = -\gamma + H_n = -\gamma + \sum_{m=1}^{n} \frac{1}{m}. \tag{13.40}$$
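
The series of Eq. (13.38) and the closed form of Eq. (13.40) can be compared directly. The sketch below is illustrative: `digamma_series`, `harmonic`, and the truncation parameter `terms` are hypothetical names, $z$ is assumed real with $z > -1$, and the neglected tail of the sum is of order $z/\texttt{terms}$, so convergence is slow.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma_series(z, terms=200000):
    """psi(z + 1) from Eq. (13.38), truncating the sum after `terms` terms."""
    total = -EULER_GAMMA
    for m in range(1, terms + 1):
        total += z / (m * (m + z))
    return total

def harmonic(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / m for m in range(1, n + 1))

# Integer argument: Eq. (13.40) gives psi(4) = -gamma + H_3.
print(digamma_series(3.0), -EULER_GAMMA + harmonic(3))

# Nonintegral argument: compare with a central difference of ln Gamma, cf. Eq. (13.36).
z, h = 0.5, 1e-5
numeric = (math.lgamma(z + 1 + h) - math.lgamma(z + 1 - h)) / (2 * h)
print(digamma_series(z), numeric)
```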


$^1$$\gamma$ has been computed to 1271 places by D. E. Knuth, Math. Comput. 16: 275 (1962), and to 3566 decimal places by D. W.
Sweeney, ibid. 17: 170 (1963). It may be of interest that the fraction 228/395 gives $\gamma$ accurate to six places.
Polygamma Function


The digamma function may be differentiated repeatedly, giving rise to the polygamma
function:

$$\begin{aligned}
\psi^{(m)}(z + 1) &\equiv \frac{d^{m+1}}{dz^{m+1}} \ln \Gamma(z + 1) \\
&= (-1)^{m+1}\, m! \sum_{n=1}^{\infty} \frac{1}{(z + n)^{m+1}}, \qquad m = 1, 2, 3, \ldots.
\end{aligned} \tag{13.41}$$

Plots of $\Gamma(x)$, $\psi(x)$, and $\psi'(x)$ are presented in Fig. 13.3.
If we set $z = 0$ in Eq. (13.41), the series in that equation is that defining the Riemann
zeta function,$^2$

$$\zeta(m) \equiv \sum_{n=1}^{\infty} \frac{1}{n^{m}}, \tag{13.42}$$

and we have

$$\psi^{(m)}(1) = (-1)^{m+1}\, m!\, \zeta(m + 1), \qquad m = 1, 2, 3, \ldots. \tag{13.43}$$

The values of polygamma functions of the positive integral argument, $\psi^{(m)}(n + 1)$, may
be calculated recursively; see Exercise 13.2.8.














FIGURE 13.3 Gamma function and its first two logarithmic derivatives.


$^2$For $z \neq 0$ this series has been used to define a generalization of $\zeta(m)$ known as the Hurwitz zeta function.


Maclaurin Expansion


It is now possible to write a Maclaurin expansion for $\ln \Gamma(z + 1)$:

$$\ln \Gamma(z + 1) = \sum_{n=1}^{\infty} \frac{z^{n}}{n!}\, \psi^{(n-1)}(1) = -\gamma z + \sum_{n=2}^{\infty} (-1)^{n}\, \frac{z^{n}}{n}\, \zeta(n). \tag{13.44}$$

This expansion is convergent for $|z| < 1$; for $z = x$, the range is $-1 < x \leq 1$. Alternate
forms of this series appear in Exercise 13.2.2. Equation (13.44) is a possible means of
computing $\Gamma(z + 1)$ for real or complex $z$, but Stirling's series (Section 13.4) is usually
better, and in addition, an excellent table of values of the gamma function for complex
arguments based on the use of Stirling's series and the functional relation, Eq. (13.22), is
now available.$^3$


Series Summation 


The digamma and polygamma functions may also be used in summing series. If the general 
term of the series has the form of a rational fraction (with the highest power of the index in 
the numerator at least two less than the highest power of the index in the denominator), it 
may be transformed by the method of partial fractions; see Eq. (1.83). This transformation 
permits the infinite series to be expressed as a finite sum of digamma and polygamma 
functions. The usefulness of this method depends on the availability of tables of digamma 
and polygamma functions. Such tables and examples of series summation are given in 
AMS-55, chapter 6 (see Additional Readings for the reference).


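Example 13.2.1 CATALAN'S CONSTANT


Catalan's constant, $\beta(2)$, Eq. (12.65), is given by

$$K = \beta(2) = \sum_{k=0}^{\infty} \frac{(-1)^{k}}{(2k + 1)^2}.$$

Grouping the positive and negative terms separately and starting with the unit index, to
match the form of $\psi^{(1)}$, Eq. (13.41), we obtain

$$K = 1 + \sum_{n=1}^{\infty} \frac{1}{(4n + 1)^2} - \frac{1}{9} - \sum_{n=1}^{\infty} \frac{1}{(4n + 3)^2}.$$

Now, identifying the summations in terms of $\psi^{(1)}$, we get

$$K = \frac{8}{9} + \frac{1}{16}\, \psi^{(1)}\left(1 + \frac{1}{4}\right) - \frac{1}{16}\, \psi^{(1)}\left(1 + \frac{3}{4}\right).$$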

3 Table of the Gamma Function for Complex Arguments, Applied Mathematics Series No. 34. Washington, DC: National Bureau 
of Standards (1954). 
Using the values of $\psi^{(1)}$ from Table 6.1 of AMS-55 (see Additional Readings for the
reference), we obtain


$$K = 0.91596559\ldots.$$


Compare this calculation of Catalan's constant with those carried out in earlier chapters
(Exercises 1.1.12 and 12.4.4).
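
In place of tabulated values, $\psi^{(1)}$ can be evaluated from its defining series, Eq. (13.41) with $m = 1$. The sketch below is an illustration rather than the method of the example: `trigamma_shifted` is a hypothetical name, and the correction added for the neglected tail of the sum (an integral estimate) is an assumption introduced here to speed convergence.

```python
def trigamma_shifted(z, terms=10000):
    """psi^(1)(z + 1) = sum_{n >= 1} 1 / (z + n)^2, Eq. (13.41) with m = 1.

    The neglected tail is estimated by the integral of 1/(z + x)^2
    from x = terms + 1/2 to infinity (a midpoint-type approximation).
    """
    partial = sum(1.0 / (z + n) ** 2 for n in range(1, terms + 1))
    return partial + 1.0 / (z + terms + 0.5)

# Catalan's constant from the final formula of Example 13.2.1.
catalan = 8.0 / 9.0 + (trigamma_shifted(0.25) - trigamma_shifted(0.75)) / 16.0
print(catalan)
```

The printed value can be compared with $K = 0.91596559\ldots$ quoted in the example.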
Exercises 
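13.2.1 For "small" values of $x$,
$$\ln \Gamma(x + 1) = -\gamma x + \sum_{n=2}^{\infty} (-1)^{n}\, \frac{\zeta(n)}{n}\, x^{n},$$
where $\gamma$ is the Euler-Mascheroni constant and $\zeta(n)$ the Riemann zeta function. For what
values of $x$ does this series converge?

ANS. $-1 < x \leq 1$.

Note that if $x = 1$, we obtain
$$\gamma = \sum_{n=2}^{\infty} (-1)^{n}\, \frac{\zeta(n)}{n},$$
a series for the Euler-Mascheroni constant. The convergence of this series is exceedingly
slow. For actual computation of $\gamma$, other, indirect, approaches are far superior
(see Exercise 12.3.2).

13.2.2 Show that the series expansion of $\ln \Gamma(x + 1)$ (Exercise 13.2.1) may be written as

(a) $\displaystyle \ln \Gamma(x + 1) = \frac{1}{2} \ln\left(\frac{\pi x}{\sin \pi x}\right) - \gamma x - \sum_{n=1}^{\infty} \frac{\zeta(2n + 1)}{2n + 1}\, x^{2n+1}$,

(b) $\displaystyle \ln \Gamma(x + 1) = \frac{1}{2} \ln\left(\frac{\pi x}{\sin \pi x}\right) - \frac{1}{2} \ln\left(\frac{1 + x}{1 - x}\right) + (1 - \gamma) x - \sum_{n=1}^{\infty} \bigl[\zeta(2n + 1) - 1\bigr] \frac{x^{2n+1}}{2n + 1}$.

Determine the range of convergence of each of these expressions.

13.2.3 Verify that for $n$, a positive integer, the following two forms of the digamma function
are equal to each other:
$$\psi(n + 1) = \sum_{j=1}^{n} \frac{1}{j} - \gamma \qquad \text{and} \qquad \psi(n + 1) = \sum_{j=1}^{\infty} \frac{n}{j(n + j)} - \gamma.$$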


13.2.4 Show that $\psi(z + 1)$ has the series expansion
$$\psi(z + 1) = -\gamma + \sum_{n=2}^{\infty} (-1)^{n}\, \zeta(n)\, z^{n-1}.$$

13.2.5 For a power-series expansion of $\ln \Gamma(z + 1)$, AMS-55 (see Additional Readings for the
reference) lists
$$\ln \Gamma(z + 1) = -\ln(1 + z) + z(1 - \gamma) + \sum_{n=2}^{\infty} (-1)^{n} \bigl[\zeta(n) - 1\bigr] \frac{z^{n}}{n}.$$

(a) Show that this agrees with Eq. (13.44) for $|z| < 1$.

(b) What is the range of convergence of this new expression?

13.2.6 Show that
$$\frac{1}{2} \ln\left(\frac{\pi z}{\sin \pi z}\right) = \sum_{n=1}^{\infty} \frac{\zeta(2n)}{2n}\, z^{2n}, \qquad |z| < 1.$$
Hint. Use Eqs. (13.23) and (13.35).

13.2.7 Write out a Weierstrass infinite-product definition of $\ln \Gamma(z + 1)$. Without differentiating,
show that this leads directly to the Maclaurin expansion of $\ln \Gamma(z + 1)$, Eq. (13.44).

13.2.8 Derive the difference relation for the polygamma function,
$$\psi^{(m)}(z + 2) = \psi^{(m)}(z + 1) + (-1)^{m}\, \frac{m!}{(z + 1)^{m+1}}, \qquad m = 0, 1, 2, \ldots.$$

13.2.9 The Pochhammer symbol $(a)_n$ is defined (for integral $n$) as
$$(a)_n = a(a + 1)\cdots(a + n - 1), \qquad (a)_0 = 1.$$

(a) Express $(a)_n$ in terms of factorials.

(b) Find $(d/da)(a)_n$ in terms of $(a)_n$ and digamma functions.

ANS. $\displaystyle \frac{d}{da} (a)_n = (a)_n \bigl[\psi(a + n) - \psi(a)\bigr].$

(c) Show that
$$(a)_{n+k} = (a + n)_k \cdot (a)_n.$$
13.2.10 Verify the following special values of the $\psi$ form of the digamma and polygamma
functions:
$$\psi(1) = -\gamma, \qquad \psi^{(1)}(1) = \zeta(2), \qquad \psi^{(2)}(1) = -2\zeta(3).$$

13.2.11 Verify:

(a) $\displaystyle \int_0^{\infty} e^{-r} \ln r\, dr = -\gamma$.

(b) $\displaystyle \int_0^{\infty} r e^{-r} \ln r\, dr = 1 - \gamma$.

(c) $\displaystyle \int_0^{\infty} r^{n} e^{-r} \ln r\, dr = (n - 1)! + n \int_0^{\infty} r^{n-1} e^{-r} \ln r\, dr, \quad n = 1, 2, 3, \ldots$.

Hint. These may be verified by integration by parts, or by differentiating the Euler
integral formula for $\Gamma(n + 1)$ with respect to $n$.

13.2.12 Dirac relativistic wave functions for hydrogen involve factors such as $\Gamma[2(1 -
\alpha^2 Z^2)^{1/2} + 1]$ where $\alpha$, the fine structure constant, is 1/137 and $Z$ is the atomic number.
Expand $\Gamma[2(1 - \alpha^2 Z^2)^{1/2} + 1]$ in a series of powers of $\alpha^2 Z^2$.

13.2.13 The quantum mechanical description of a particle in a Coulomb field requires a knowledge
of the argument of $\Gamma(z)$ when $z$ is complex. Determine the argument of $\Gamma(1 + ib)$
for small, real $b$.

13.2.14 Using digamma and polygamma functions, sum the series

(a) $\displaystyle \sum_{n=1}^{\infty} \frac{1}{n(n + 1)}$, (b) $\displaystyle \sum_{n=2}^{\infty} \frac{1}{n^2 - 1}$.

Note. You can use Exercise 13.2.8 to calculate the needed digamma functions.

13.2.15 Show that
$$\sum_{n=1}^{\infty} \frac{1}{(n + a)(n + b)} = \frac{1}{b - a} \bigl[\psi(1 + b) - \psi(1 + a)\bigr],$$
where $a \neq b$, and neither $a$ nor $b$ is a negative integer. It is of some interest to compare
this summation with the corresponding integral,
$$\int_1^{\infty} \frac{dx}{(x + a)(x + b)} = \frac{1}{b - a} \bigl[\ln(1 + b) - \ln(1 + a)\bigr].$$
The relation between $\psi(x)$ and $\ln x$ is made explicit in the analysis leading to Stirling's
formula.





13.3 THE BETA FUNCTION


Products of gamma functions can be identified as describing an important class of definite
integrals involving powers of sine and cosine functions, and these integrals, in turn, can
be further manipulated to evaluate a large number of algebraic definite integrals. These
properties make it useful to introduce the beta function, defined as

$$B(p, q) = \frac{\Gamma(p)\, \Gamma(q)}{\Gamma(p + q)}. \tag{13.45}$$

For whatever it is worth, note that the $B$ in Eq. (13.45) is an upper-case beta.

To understand the virtue of this definition, let us write the product $\Gamma(p)\Gamma(q)$ using the
integral representation given as Eq. (13.6), valid for $\mathrm{Re}(p), \mathrm{Re}(q) > 0$:


$$\Gamma(p)\, \Gamma(q) = 4 \int_0^{\infty} s^{2p-1} e^{-s^2}\, ds \int_0^{\infty} t^{2q-1} e^{-t^2}\, dt. \tag{13.46}$$

The reason for using this integral representation is that the quadratic terms in the exponent,
$s^2$ and $t^2$, combine in a convenient way if we change the integration variables from $s$, $t$
to polar coordinates $r$, $\theta$, with $s = r\cos\theta$, $t = r\sin\theta$, $r^2 = s^2 + t^2$, and $ds\, dt = r\, dr\, d\theta$.
Equation (13.46) becomes

$$\begin{aligned}
\Gamma(p)\, \Gamma(q) &= 4 \int_0^{\infty} r^{2p+2q-1} e^{-r^2}\, dr \int_0^{\pi/2} \cos^{2p-1}\theta\, \sin^{2q-1}\theta\, d\theta \\
&= 2\, \Gamma(p + q) \int_0^{\pi/2} \cos^{2p-1}\theta\, \sin^{2q-1}\theta\, d\theta,
\end{aligned}$$

where we have used Eq. (13.6) to recognize the $r$ integration as $\Gamma(p + q)$. This gives us
our first integral evaluation based on the beta function:

$$B(p, q) = 2 \int_0^{\pi/2} \cos^{2p-1}\theta\, \sin^{2q-1}\theta\, d\theta. \tag{13.47}$$

Because Eq. (13.47) is often used when $p$ and $q$ are integers, we rewrite for the case
$p = m + 1$, $q = n + 1$,

$$\frac{m!\, n!}{(m + n + 1)!} = 2 \int_0^{\pi/2} \cos^{2m+1}\theta\, \sin^{2n+1}\theta\, d\theta. \tag{13.48}$$

Because gamma functions of a half-integral argument are available in closed form,
Eq. (13.47) also provides a route to these trigonometric integrals for even powers of the
sine and/or cosine. Note also that from its definition it is obvious that $B(p, q) = B(q, p)$,
showing that the integral in Eq. (13.47) does not change in value if the powers of the sine
and cosine are interchanged.
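
Equation (13.47) lends itself to a direct numerical check. The following is a sketch only: the function names `beta` and `beta_trig` and the use of a midpoint rule are choices made here, and the quadrature is intended for cases in which both exponents $2p - 1$ and $2q - 1$ are nonnegative.

```python
import math

def beta(p, q):
    """B(p, q) from its definition in terms of gamma functions, Eq. (13.45)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def beta_trig(p, q, steps=20000):
    """Midpoint-rule estimate of Eq. (13.47):
    2 * integral_0^{pi/2} cos^(2p-1)(theta) sin^(2q-1)(theta) d(theta).
    """
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.cos(theta) ** (2 * p - 1) * math.sin(theta) ** (2 * q - 1)
    return 2.0 * h * total

# Integer case, Eq. (13.48) with m = 1, n = 2: m! n! / (m + n + 1)! = 1/12.
print(beta(2, 3), beta_trig(2, 3), 1.0 / 12.0)

# Half-integral arguments give even powers of cosine and sine.
print(beta(1.5, 2.5), beta_trig(1.5, 2.5), math.pi / 16.0)
```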
Alternate Forms, Definite Integrals


The substitution $t = \cos^2\theta$ converts Eq. (13.47) to

$$B(p + 1, q + 1) = \int_0^{1} t^{p} (1 - t)^{q}\, dt. \tag{13.49}$$

Replacing $t$ by $x^2$, we obtain

$$B(p + 1, q + 1) = 2 \int_0^{1} x^{2p+1} (1 - x^2)^{q}\, dx. \tag{13.50}$$

The substitution $t = u/(1 + u)$ in Eq. (13.49) yields still another useful form,

$$B(p + 1, q + 1) = \int_0^{\infty} \frac{u^{p}}{(1 + u)^{p+q+2}}\, du. \tag{13.51}$$

The beta function as a definite integral is useful in establishing integral representations of
the Bessel function (Exercise 14.1.17) and the hypergeometric function (Exercise 18.5.12).


Derivation of Legendre Duplication Formula 


The Legendre duplication formula involves products of gamma functions, which suggests
that the beta function may provide a useful route to its proof. We start by using Eq. (13.49)
for $B\left(z + \tfrac{1}{2}, z + \tfrac{1}{2}\right)$:

$$B\left(z + \tfrac{1}{2}, z + \tfrac{1}{2}\right) = \int_0^{1} t^{z - 1/2} (1 - t)^{z - 1/2}\, dt. \tag{13.52}$$

Making the substitution $t = (1 + s)/2$, we have

$$\begin{aligned}
B\left(z + \tfrac{1}{2}, z + \tfrac{1}{2}\right) &= 2^{-2z} \int_{-1}^{1} (1 - s^2)^{z - 1/2}\, ds \\
&= 2^{-2z+1} \int_0^{1} (1 - s^2)^{z - 1/2}\, ds = 2^{-2z}\, B\left(\tfrac{1}{2}, z + \tfrac{1}{2}\right),
\end{aligned} \tag{13.53}$$

where we used the fact that the $s$ integrand was even to change the integration range to
$(0, 1)$, and then used Eq. (13.50) to evaluate the resulting integral. Now, inserting the
definition, Eq. (13.45), for both instances of $B$ in Eq. (13.53), we reach

$$\frac{\Gamma\left(z + \tfrac{1}{2}\right) \Gamma\left(z + \tfrac{1}{2}\right)}{\Gamma(2z + 1)} = 2^{-2z}\, \frac{\Gamma\left(\tfrac{1}{2}\right) \Gamma\left(z + \tfrac{1}{2}\right)}{\Gamma(z + 1)},$$

which is easily rearranged into

$$\Gamma(z + 1)\, \Gamma\left(z + \tfrac{1}{2}\right) = \frac{\sqrt{\pi}}{2^{2z}}\, \Gamma(2z + 1), \tag{13.54}$$

the Legendre duplication formula, originally introduced as Eq. (13.27), but proved at that
time only for integer values of $z$.

Although the integrals used in this derivation are defined only for $\mathrm{Re}(z) > -\tfrac{1}{2}$, the
result, Eq. (13.54), holds, by analytic continuation, for all $z$ where the gamma functions
are analytic.


Exercises 
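13.3.1 Verify the following beta function identities:

(a) $B(a, b) = B(a + 1, b) + B(a, b + 1)$,

(b) $\displaystyle B(a, b) = \frac{a + b}{b}\, B(a, b + 1)$,

(c) $\displaystyle B(a, b) = \frac{b - 1}{a}\, B(a + 1, b - 1)$,

(d) $B(a, b)\, B(a + b, c) = B(b, c)\, B(a, b + c)$.

13.3.2 (a) Show that
$$\int_{-1}^{1} (1 - x^2)^{1/2} x^{2n}\, dx = \begin{cases} \pi/2, & n = 0, \\[4pt] \pi\, \dfrac{(2n - 1)!!}{(2n + 2)!!}, & n = 1, 2, 3, \ldots. \end{cases}$$

(b) Show that
$$\int_{-1}^{1} (1 - x^2)^{-1/2} x^{2n}\, dx = \begin{cases} \pi, & n = 0, \\[4pt] \pi\, \dfrac{(2n - 1)!!}{(2n)!!}, & n = 1, 2, 3, \ldots. \end{cases}$$

13.3.3 Show that
$$\int_{-1}^{1} (1 - x^2)^{n}\, dx = 2\, \frac{(2n)!!}{(2n + 1)!!}, \qquad n = 0, 1, 2, \ldots.$$

13.3.4 Evaluate $\displaystyle \int_{-1}^{1} (1 + x)^{a} (1 - x)^{b}\, dx$ in terms of the beta function.

ANS. $2^{a+b+1} B(a + 1, b + 1)$.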
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13.3.5 Show, by means of the beta function, that 





z 
d 
7 a =- Z », O<a<l. 
(z—x)!-@(x — 1)" — sinza 
t 
13.3.6 | Show that the Dirichlet integral 
!q! B 1 1 
[ [st axay= pig _ Bert eet) 
(p+q+2)! pt+qt+2 


where the range of integration is the triangle bounded by the positive x- and y-axes and 
the linex + y=1. 


13.3.7. Show that 








2sind 


Cg 


ee) 
fe eee dx dy = 
0 


What are the limits on 6? 
Hint. Consider oblique x y-coordinates. 
ANS. -1 <0 <7. 


13.3.8 Evaluate (using the beta function) 


m/2 
3/2 
(a) [ c0st”?0a0 = ate, 
16(' (5/4) |? 
0 
m/2 n/2 | 
(b) [ costo do = / sin" do = VEL@ = D/2I! 
2(n/2)! 
0 0 
— for n odd, 
— n!! 
~ |x MD! 
— . ————_ for n even. 
2 n!! 


1 
13.3.9 Evaluate i (1 — x*)~!/?dx as a beta function. 
0 
[r(5/4)]? -4 


ANS. 
(27) 1/2 


= 1.311028777. 


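13.3.10 Using beta functions, show that the integral representation

$$J_\nu(z) = \frac{2}{\pi^{1/2}\,\Gamma(\nu+\frac12)}\left(\frac z2\right)^{\nu}\int_0^{\pi/2}\sin^{2\nu}\theta\,\cos(z\cos\theta)\,d\theta, \qquad \operatorname{Re}(\nu) > -\tfrac12,$$

reduces to the Bessel series

$$J_\nu(z) = \sum_{s=0}^{\infty}\frac{(-1)^s}{s!\,\Gamma(s+\nu+1)}\left(\frac z2\right)^{2s+\nu},$$

thereby confirming its validity.


13.3.11 Given the associated Legendre function, defined in Chapter 15,

$$P_m^m(x) = (2m-1)!!\,(1-x^2)^{m/2},$$

show that

(a) $\displaystyle\int_{-1}^{1}\left[P_m^m(x)\right]^2 dx = \frac{2}{2m+1}\,(2m)!, \qquad m = 0, 1, 2, \ldots,$

(b) $\displaystyle\int_{-1}^{1}\left[P_m^m(x)\right]^2 \frac{dx}{1-x^2} = 2\cdot(2m-1)!, \qquad m = 1, 2, 3, \ldots.$


13.3.12 Show that, for integers p and q,

(a) $\displaystyle\int_0^1 x^{2p+1}(1-x^2)^{-1/2}\,dx = \frac{(2p)!!}{(2p+1)!!}$,

(b) $\displaystyle\int_0^1 x^{2p}(1-x^2)^{q}\,dx = \frac{(2p-1)!!\,(2q)!!}{(2p+2q+1)!!}$.


13.3.13 A particle of mass m moving in a symmetric potential that is well described by $V(x) = A|x|^n$
has a total energy $\tfrac12 m(dx/dt)^2 + V(x) = E$. Solving for $dx/dt$ and integrating
we find that the period of motion is

$$\tau = 2\sqrt{2m}\int_0^{x_{\max}}\frac{dx}{(E - Ax^n)^{1/2}},$$

where $x_{\max}$ is a classical turning point given by $Ax_{\max}^n = E$. Show that

$$\tau = \frac{2}{n}\sqrt{\frac{2\pi m}{E}}\left(\frac{E}{A}\right)^{1/n}\frac{\Gamma(1/n)}{\Gamma(1/n+\frac12)}.$$


13.3.14 Referring to Exercise 13.3.13,

(a) Determine the limit as $n \to \infty$ of

$$\frac{2}{n}\sqrt{\frac{2\pi m}{E}}\left(\frac{E}{A}\right)^{1/n}\frac{\Gamma(1/n)}{\Gamma(1/n+\frac12)}.$$

(b) Find $\lim_{n\to\infty}\tau$ from the behavior of the integrand $(E - Ax^n)^{-1/2}$.

(c) Investigate the behavior of the physical system (potential well) as $n \to \infty$. Obtain
the period from inspection of this limiting physical system.


13.3.15 Show that

$$\int_0^\infty \frac{\sinh^{\alpha}x}{\cosh^{\beta}x}\,dx = \frac12\,B\left(\frac{\alpha+1}{2},\,\frac{\beta-\alpha}{2}\right), \qquad -1 < \alpha < \beta.$$

Hint. Let $\sinh^2 x = u$.


13.3.16 The beta distribution of probability theory has a probability density

$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},$$

with x restricted to the interval (0, 1). Show that

(a) $\langle x\rangle$, the mean value, is $\dfrac{\alpha}{\alpha+\beta}$,

(b) $\sigma^2$, its variance, is $\langle x^2\rangle - \langle x\rangle^2 = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.


13.3.17 From

$$\lim_{n\to\infty}\frac{\displaystyle\int_0^{\pi/2}\sin^{2n}\theta\,d\theta}{\displaystyle\int_0^{\pi/2}\sin^{2n+1}\theta\,d\theta} = 1,$$

derive the Wallis formula for $\pi$:

$$\frac{\pi}{2} = \frac{2\cdot 2}{1\cdot 3}\cdot\frac{4\cdot 4}{3\cdot 5}\cdot\frac{6\cdot 6}{5\cdot 7}\cdots.$$

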
13.4 STIRLING’S SERIES 


In statistical mechanics we encounter the need to evaluate $\ln(n!)$ for very large values of
n, and we occasionally need $\ln\Gamma(z)$ for nonintegral z when $|z|$ is large enough that it is
inconvenient or impractical to use the Maclaurin series, Eq. (13.44), possibly followed
by repeated use of the functional relation $\Gamma(z+1) = z\Gamma(z)$. These needs can be met by
the asymptotic expansion for $\ln\Gamma(z)$ known as Stirling’s series or Stirling’s formula.
While it is in principle possible to develop such an asymptotic formula by the method of
steepest descents (and in fact we have already obtained the leading term of the expansion
in this way; see Example 12.7.1), a relatively simple way of obtaining the full asymptotic
expansion is by use of the Euler-Maclaurin integration formula in Section 12.3.


Derivation from Euler-Maclaurin Integration Formula


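The Euler-Maclaurin formula for evaluating a definite integral on the range $(0, \infty)$,
obtained by specializing Eq. (12.57) and ignoring the remainder, is

$$\int_0^\infty f(x)\,dx = \tfrac12 f(0) + f(1) + f(2) + f(3) + \cdots
+ \frac{B_2}{2!}f'(0) + \frac{B_4}{4!}f'''(0) + \frac{B_6}{6!}f^{(5)}(0) + \cdots, \tag{13.55}$$

where $B_n$ are Bernoulli numbers:

$$B_2 = \frac16, \qquad B_4 = -\frac1{30}, \qquad B_6 = \frac1{42}, \qquad B_8 = -\frac1{30}, \qquad \ldots.$$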


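We proceed by applying Eq. (13.55) to the definite integral

$$\int_0^\infty \frac{dx}{(z+x)^2} = \frac1z$$

(for z not on the negative real axis). We note, by comparing with Eq. (13.41), that

$$f(1) + f(2) + \cdots = \sum_{n=1}^{\infty}\frac{1}{(z+n)^2} = \psi^{(1)}(z+1);$$

this makes a connection to the gamma function and is the reason for our current strategy.
We also note that

$$f^{(2n-1)}(0) = \left(\frac{d}{dx}\right)^{2n-1}\frac{1}{(z+x)^2}\Bigg|_{x=0} = -\frac{(2n)!}{z^{2n+1}},$$

so the expansion yields

$$\frac1z = \frac{1}{2z^2} + \psi^{(1)}(z+1) - \frac{B_2}{z^3} - \frac{B_4}{z^5} - \cdots.$$

Solving for $\psi^{(1)}(z+1)$, we have

$$\psi^{(1)}(z+1) = \frac1z - \frac{1}{2z^2} + \frac{B_2}{z^3} + \frac{B_4}{z^5} + \cdots
= \frac1z - \frac{1}{2z^2} + \sum_{n=1}^{\infty}\frac{B_{2n}}{z^{2n+1}}. \tag{13.56}$$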


Since the Bernoulli numbers diverge strongly, this series does not converge. It is a semi- 
convergent, or asymptotic, series, useful if one retains a small number of terms (compare 
with Section 12.6).

Integrating once, we get the digamma function

$$\psi(z+1) = C_1 + \ln z + \frac{1}{2z} - \frac{B_2}{2z^2} - \frac{B_4}{4z^4} - \cdots
= C_1 + \ln z + \frac{1}{2z} - \sum_{n=1}^{\infty}\frac{B_{2n}}{2n\,z^{2n}}, \tag{13.57}$$

where $C_1$ has a value still to be determined. In the next subsection we will show that
$C_1 = 0$. Equation (13.57), then, gives us another expression for the digamma function,
often more useful than Eq. (13.38) or Eq. (13.44).


Stirling’s Formula 


The indefinite integral of the digamma function, obtained by integrating Eq. (13.57), is

$$\ln\Gamma(z+1) = C_2 + \left(z+\tfrac12\right)\ln z + (C_1 - 1)z + \frac{B_2}{2z} + \cdots
+ \frac{B_{2n}}{2n(2n-1)z^{2n-1}} + \cdots, \tag{13.58}$$

in which $C_2$ is another constant of integration. We are now ready to determine $C_1$ and
$C_2$, which we can do by requiring that the asymptotic expansion be consistent with the
Legendre duplication formula, Eq. (13.54). Substituting Eq. (13.58) into the logarithm of
the duplication formula, we find that satisfaction of that formula dictates that $C_1 = 0$ and
that $C_2$ must have the value

$$C_2 = \tfrac12\ln 2\pi. \tag{13.59}$$

Thus, inserting also values of the $B_{2n}$, our final result is

$$\ln\Gamma(z+1) = \tfrac12\ln 2\pi + \left(z+\tfrac12\right)\ln z - z + \frac{1}{12z} - \frac{1}{360z^3} + \frac{1}{1260z^5} - \cdots. \tag{13.60}$$

This is Stirling’s series, an asymptotic expansion. The absolute value of the error is less 
than the absolute value of the first term neglected. 
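
The error statement above can be illustrated numerically. The sketch below evaluates the right-hand side of Eq. (13.60) with one, two, or three of the inverse-power corrections and compares it with `math.lgamma` from Python's standard library, taken here as the reference value. The function name `stirling_lngamma` is hypothetical, and the check is restricted to real positive z.

```python
import math

def stirling_lngamma(z, terms=3):
    """ln Gamma(z+1) from Eq. (13.60), keeping `terms` (0 to 3) corrections in 1/z."""
    # Coefficients B_2n / (2n (2n-1)) for n = 1, 2, 3.
    coeffs = (1.0 / 12.0, -1.0 / 360.0, 1.0 / 1260.0)
    total = 0.5 * math.log(2.0 * math.pi) + (z + 0.5) * math.log(z) - z
    for n, c in enumerate(coeffs[:terms], start=1):
        total += c / z ** (2 * n - 1)
    return total

for z in (1.0, 5.0, 10.0, 50.0):
    exact = math.lgamma(z + 1.0)
    errors = [abs(stirling_lngamma(z, k) - exact) for k in (1, 2, 3)]
    print(f"z = {z:5.1f}  errors with 1, 2, 3 corrections:",
          "  ".join(f"{e:.2e}" for e in errors))
```

For each z, the error with one correction kept should be smaller than the magnitude of the first neglected term, $1/(360z^3)$.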
The leading term in the asymptotic behavior of the gamma function was one of the examples
used to illustrate the method of steepest descents. In Example 12.7.1, we found that

$$\Gamma(z+1) \sim \sqrt{2\pi}\,z^{z+1/2}e^{-z},$$

corresponding to

$$\ln\Gamma(z+1) \sim \tfrac12\ln 2\pi + \left(z+\tfrac12\right)\ln z - z,$$

yielding all the terms of Eq. (13.60) that do not vanish in the limit of large $|z|$.

To help convey a feeling of the remarkable precision of Stirling’s series for $\Gamma(s+1)$,
the ratio of the first term of Stirling’s approximation to $\Gamma(s+1)$ is plotted in Fig. 13.4. In
Table 13.1 we give the ratio of the first term in the expansion to $\Gamma(s+1)$ and a similar
ratio when two terms are kept in the expansion to $\Gamma(s+1)$. The derivation of these forms
is Exercise 13.4.1. 
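
The two ratios tabulated in Table 13.1 are easy to regenerate. The following is a minimal sketch, assuming double-precision arithmetic and using `math.gamma` as the exact value; the function name `stirling_ratios` is hypothetical.

```python
import math

def stirling_ratios(s):
    """Ratios of the one- and two-term Stirling forms to Gamma(s+1)."""
    lead = math.sqrt(2.0 * math.pi) * s ** (s + 0.5) * math.exp(-s)
    exact = math.gamma(s + 1.0)
    return lead / exact, lead * (1.0 + 1.0 / (12.0 * s)) / exact

for s in range(1, 11):
    one_term, two_term = stirling_ratios(float(s))
    print(f"{s:2d}   {one_term:.5f}   {two_term:.5f}")
```

The printed columns should be close to the tabulated entries; small differences in the last digit can arise from rounding.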





FIGURE 13.4 Accuracy of Stirling’s formula.


Table 13.1 Ratios of One- and Two-Term Stirling Series to Exact Values of $\Gamma(s+1)$

 s    $\dfrac{\sqrt{2\pi}\,s^{s+1/2}e^{-s}}{\Gamma(s+1)}$    $\dfrac{\sqrt{2\pi}\,s^{s+1/2}e^{-s}\left(1+\frac{1}{12s}\right)}{\Gamma(s+1)}$
 1    0.92213    0.99898
 2    0.95950    0.99949
 3    0.97270    0.99972
 4    0.97942    0.99983
 5    0.98349    0.99988
 6    0.98621    0.99992
 7    0.98817    0.99994
 8    0.98964    0.99995
 9    0.99078    0.99996
10    0.99170    0.99998


Exercises

13.4.1 Rewrite Stirling’s series to give $\Gamma(z+1)$ instead of $\ln\Gamma(z+1)$.

ANS. $\Gamma(z+1) = \sqrt{2\pi}\,z^{z+1/2}e^{-z}\left(1 + \dfrac{1}{12z} + \dfrac{1}{288z^2} - \dfrac{139}{51{,}840z^3} + \cdots\right)$.


13.4.2 Use Stirling’s formula to estimate 52!, the number of possible rearrangements of cards
in a standard deck of playing cards.


13.4.3 Show that the constants $C_1$ and $C_2$ in Stirling’s formula have the respective values zero
and $\tfrac12\ln 2\pi$ by using the logarithm of the Legendre duplication formula (see Fig. 13.4).


13.4.4 Without using Stirling’s series show that

(a) $\ln(n!) < \displaystyle\int_1^{n+1}\ln x\,dx$,  (b) $\ln(n!) > \displaystyle\int_1^{n}\ln x\,dx$;  n is an integer $\ge 2$.

Note that the arithmetic mean of these two integrals gives a good approximation for
Stirling’s series.


13.4.5 Test for convergence

$$\sum_{p=0}^{\infty}\left[\frac{\Gamma(p+\frac12)}{p!}\right]^2 \frac{2p+1}{2p+2} = \pi\sum_{p=0}^{\infty}\frac{(2p-1)!!\,(2p+1)!!}{(2p)!!\,(2p+2)!!}.$$

This series arises in an attempt to describe the magnetic field created by and enclosed
by a current loop.


13.4.6 Show that $\displaystyle\lim_{x\to\infty} x^{b-a}\,\frac{\Gamma(x+a+1)}{\Gamma(x+b+1)} = 1$.


13.4.7 Show that $\displaystyle\lim_{n\to\infty}\frac{(2n-1)!!}{(2n)!!}\,n^{1/2} = \pi^{-1/2}$.


13.4.8 A set of N distinguishable particles is assigned to states $\psi_i$, $i = 1, 2, \ldots, M$. If the
numbers of particles in the various states are $n_1, n_2, \ldots, n_M$ (with $M < N$), the number
of ways this can be done is

$$W = \frac{N!}{n_1!\,n_2!\cdots n_M!}.$$

The entropy associated with this assignment is $S = k\ln W$, where k is Boltzmann’s
constant. In the limit $N \to \infty$, with $n_i = p_iN$ (so $p_i$ is the fraction of the particles in
state i), find S as a function of N and the $p_i$.

(a) In the limit of large N, find the entropy associated with an arbitrary set of $n_i$. Is
the entropy an extensive function of the system size (i.e., is it proportional to N)?

(b) Find the set of $p_i$ that maximize S.

Hint. Remember that $\sum_i p_i = 1$ and that this is a constrained maximization (see
Section 22.3).

Note. These formulas correspond to classical, or Boltzmann, statistics.


13.5 RIEMANN ZETA FUNCTION


We are now in a position to broaden our earlier survey of $\zeta(z)$, the Riemann zeta function.
In so doing, we note an interesting degree of parallelism between some of the properties of
$\zeta(z)$ and corresponding properties of the gamma function.

We open this section by repeating the definition of $\zeta(z)$, which is valid when the series
converges:

$$\zeta(z) \equiv \sum_{n=1}^{\infty} n^{-z}. \tag{13.61}$$

The values of $\zeta(n)$ for integral n from 2 to 10 were listed in Table 1.1 on page 17.

We now want to consider the possibility of analytically continuing $\zeta(z)$ beyond the
range of convergence of Eq. (13.61). As a first step toward doing so, we prove the integral
representation that was given in Table 1.1:

$$\zeta(z) = \frac{1}{\Gamma(z)}\int_0^\infty \frac{t^{z-1}}{e^t - 1}\,dt. \tag{13.62}$$

Equation (13.62) has a range of validity that is limited by the behavior of its integrand at
small t; since the denominator then approaches t, the overall small-t dependence is $t^{z-2}$.
Writing $z = x + iy$ and $t^{z-2} = t^{x-2}e^{iy\ln t}$, we see that, like Eq. (13.61), Eq. (13.62) will
only converge when $\operatorname{Re} z > 1$.

We start from the right-hand side of Eq. (13.62), denoted $I$, by multiplying the numerator
and denominator of its integrand by $e^{-t}$ and expanding the denominator in powers of
$e^{-t}$, reaching

$$I = \frac{1}{\Gamma(z)}\int_0^\infty \frac{t^{z-1}e^{-t}}{1 - e^{-t}}\,dt = \frac{1}{\Gamma(z)}\sum_{m=1}^{\infty}\int_0^\infty t^{z-1}e^{-mt}\,dt.$$

We next change the variable of integration for the individual terms so that all terms contain
an identical factor $e^{-t}$:

$$I = \frac{1}{\Gamma(z)}\sum_{m=1}^{\infty}\int_0^\infty \left(\frac{t}{m}\right)^{z-1}e^{-t}\,\frac{dt}{m}
= \frac{1}{\Gamma(z)}\left(\sum_{m=1}^{\infty}\frac{1}{m^z}\right)\int_0^\infty t^{z-1}e^{-t}\,dt$$

$$\phantom{I} = \frac{\zeta(z)}{\Gamma(z)}\int_0^\infty t^{z-1}e^{-t}\,dt = \zeta(z). \tag{13.63}$$

In Eq. (13.63) we recognize the summation as a zeta function and the
integral as the Euler integral representation of $\Gamma(z)$, Eq. (13.5). It then cancels against the
initial factor $1/\Gamma(z)$, leaving the desired final result, Eq. (13.62). In passing, we note that
the only difference between the integral of Eq. (13.62) and the Euler integral for the gamma
function is that we now have a denominator $e^t - 1$ instead of simply $e^t$.

The next step toward the analytic continuation we seek is to introduce a contour integral
with the same integrand as Eq. (13.62), using the same open contour that was found useful
for the gamma function, shown in Fig. 13.2. Just as for the gamma function, we do not
wish to restrict z to integral values, so the integrand will in general have a branch point
at $t = 0$, and again we have placed the branch cut on the positive real axis. Restricting
consideration for now to z with $\operatorname{Re} z > 1$, we evaluate the contour integral, denoted $I$, as
the sum of its contributions from the sections of the contour, respectively, labeled A, B,
and D in Fig. 13.2. For $\operatorname{Re} z > 1$, the small circle D makes no contribution to the integral,
while

$$I_A = \frac{1}{\Gamma(z)}\int_\infty^0 \frac{t^{z-1}\,dt}{e^t - 1} = -\zeta(z),$$

$$I_B = \frac{1}{\Gamma(z)}\int_0^\infty \frac{t^{z-1}e^{2\pi i(z-1)}\,dt}{e^t - 1} = e^{2\pi i(z-1)}\zeta(z) = e^{2\pi iz}\zeta(z).$$

Combining the above, we get

$$I = \frac{1}{\Gamma(z)}\int_C \frac{t^{z-1}}{e^t - 1}\,dt = \left(e^{2\pi iz} - 1\right)\zeta(z). \tag{13.64}$$

Note that Eq. (13.64) is useful as a relation involving $\zeta(z)$ only if z is not an integer.

We now wish to deform the contour of Eq. (13.64) in a way that will remove the
restriction $\operatorname{Re} z > 1$, which we originally needed to obtain that equation. The deformation
corresponds to an analytic continuation of $\zeta(z)$ to a larger range of z, and will be
effective because the deformation can avoid the divergence in the neighborhood of $t = 0$.
When we consider possible deformations, we need to make the observation that, unlike
the gamma function, the integrand of Eq. (13.64) has simple poles at the points $t = 2n\pi i$,
$n = \pm 1, \pm 2, \ldots$, so that if we deform the contour in a way that encloses any of these poles,
we must allow for the change thereby produced in the value of the contour integral.

If we initially deform the contour by expanding the circle D to some finite radius less
than $2\pi$, we do not change the value of the integral $I$ but extend its range of validity to
negative z. If, for $z < 0$, we further expand D until it becomes an open circle of infinite
radius (but not through any of the poles), the value of the contour integral is reduced to
zero, with the change caused by the inclusion of the contribution from the poles that are
then encircled. We therefore have the interesting result that the original contour integral
had a value that was the negative of $2\pi i$ times the sum of the residues that were newly
enclosed (multiplied by the prefactor $1/\Gamma(z)$ carried by $I$). Thus,


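$$I = \left(e^{2\pi iz} - 1\right)\zeta(z) = -\frac{2\pi i}{\Gamma(z)}\sum_{n=1}^{\infty}\left(\text{residues of } t^{z-1}/(e^t - 1) \text{ at } t = \pm 2n\pi i\right).$$

At the pole $t = +2\pi ni$, the residue is $\left(2n\pi e^{\pi i/2}\right)^{z-1}$, while at $t = -2\pi ni$ it is
$\left(2n\pi e^{3\pi i/2}\right)^{z-1}$. Note that we must evaluate the residues taking cognizance of the branch
cut. Inserting these values and rearranging a bit,

$$\left(e^{2\pi iz} - 1\right)\zeta(z) = -\frac{2\pi i}{\Gamma(z)}\sum_{n=1}^{\infty}(2n\pi)^{z-1}\left(e^{i\pi(z-1)/2} + e^{3i\pi(z-1)/2}\right)$$

$$\phantom{\left(e^{2\pi iz} - 1\right)\zeta(z)} = \zeta(1-z)\,\frac{(2\pi)^z}{\Gamma(z)}\left(e^{3\pi iz/2} - e^{\pi iz/2}\right). \tag{13.65}$$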

Note that because $z < 0$, the summation over n converges and can be identified as $\zeta(1-z)$.
Equation (13.65) can be simplified, but we already see its essential feature, namely that it
provides a functional relation connecting $\zeta(z)$ and $\zeta(1-z)$, parallel to but more complicated
than the reflection formula for the gamma function, Eq. (13.23). The derivation of
Eq. (13.65) was carried out for $z < 0$, but now that we have obtained it, we can, appealing
to analytic continuation, assert its validity for all z such that its constituent factors are
nonsingular. This formula, in the simplified form we shall shortly obtain, was first found
by Riemann.

The simplification of Eq. (13.65) can be accomplished by recognizing, with the aid of
the gamma-function reflection formula, Eq. (13.23), that

$$\frac{e^{3\pi iz/2} - e^{\pi iz/2}}{e^{2\pi iz} - 1} = \frac{\sin(\pi z/2)}{\sin \pi z} = \frac{\Gamma(z)\,\Gamma(1-z)}{\Gamma(z/2)\,\Gamma(1 - z/2)},$$

so

$$\zeta(z) = \zeta(1-z)\,\frac{\pi^z 2^z\,\Gamma(1-z)}{\Gamma(z/2)\,\Gamma(1 - z/2)} = \zeta(1-z)\,\frac{\pi^{z-1/2}\,\Gamma\big((1-z)/2\big)}{\Gamma(z/2)}, \tag{13.66}$$

where the final member of Eq. (13.66) was obtained by using the duplication formula,
Eq. (13.27), with the value of z in the duplication formula set to the present $-z/2$. Equation
(13.66) can now be rearranged to the more symmetrical form

$$\Gamma\left(\frac z2\right)\pi^{-z/2}\,\zeta(z) = \Gamma\left(\frac{1-z}{2}\right)\pi^{-(1-z)/2}\,\zeta(1-z). \tag{13.67}$$


Equation (13.67), the zeta-function reflection formula, enables generation of $\zeta(z)$ in the
half-plane $\operatorname{Re} z < 0$ from values in the region $\operatorname{Re} z > 1$, where the series definition
converges.
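
To make the use of the reflection formula concrete, the following sketch generates $\zeta(z)$ at a few real negative arguments from values of $\zeta(1-z)$ in the region of convergence. It is an illustration under stated assumptions: the arguments are real, z is not a negative even integer (where $\Gamma(z/2)$ is singular), and the series of Eq. (13.61) is accelerated by an Euler-Maclaurin estimate of the neglected tail, a convenience that is not part of the derivation above. The function names are hypothetical.

```python
import math

def zeta_series(z, nmax=1000):
    """zeta(z) for real z > 1: partial sum of Eq. (13.61) plus an
    Euler-Maclaurin estimate of the tail beyond n = nmax (an added convenience)."""
    head = sum(n ** (-z) for n in range(1, nmax + 1))
    tail = nmax ** (1.0 - z) / (z - 1.0) - 0.5 * nmax ** (-z)
    return head + tail

def zeta_reflected(z):
    """zeta(z) for real z < 0, z not an even integer, from Eq. (13.67)."""
    w = 1.0 - z  # argument on the convergent side, w > 1
    right = math.gamma(w / 2.0) * math.pi ** (-w / 2.0) * zeta_series(w)
    return right / (math.gamma(z / 2.0) * math.pi ** (-z / 2.0))

for z in (-1.0, -3.0, -0.5):
    print(f"zeta({z:4.1f}) = {zeta_reflected(z):.10f}")
```

The values at $z = -1$ and $z = -3$ can be compared with the known results $-1/12$ and $1/120$.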

It is possible to show that $\zeta(z)$ has no zeros in the region where the series definition
converges, and, from Eq. (13.67), this implies that $\zeta(z)$ is also nonzero for all
z in the half-plane $\operatorname{Re} z < 0$ except at points where $\Gamma(z/2)$ is singular, namely $z =
-2, -4, \ldots, -2n, \ldots$. $\Gamma(z/2)$ is also singular at $z = 0$ but, as we shall see shortly, the
singularity of $\zeta(1)$ compensates the singularity of $\Gamma(0)$, with the result that $\zeta(0)$ is nonzero.

The zeros of $\zeta(z)$ at the negative even integers are called its trivial zeros, as they arise
from the singularities of the gamma function. Any other zeros of $\zeta(z)$ (and there are an
infinite number of them) must lie in the region $0 \le \operatorname{Re} z \le 1$, which has been called the
critical strip of the Riemann zeta function.

To obtain values of $\zeta(z)$ in the critical strip, we proceed by analytically continuing
toward $\operatorname{Re} z = 0$ the formula from Eq. (12.62) that defines the Dirichlet series $\eta(z)$ (clearly
valid for $\operatorname{Re} z > 1$),

$$\zeta(z) = \frac{\eta(z)}{1 - 2^{1-z}} = \frac{1}{1 - 2^{1-z}}\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^z}. \tag{13.68}$$

This alternating series converges for all $\operatorname{Re} z > 0$, thereby providing a formula for $\zeta(z)$
throughout the critical strip, but it is best used where the convergence is relatively rapid,
namely for $\operatorname{Re} z > \tfrac12$. Values of $\zeta(z)$ for $\operatorname{Re} z < \tfrac12$ may be more conveniently obtained
from those for $\operatorname{Re} z > \tfrac12$ using the reflection formula, Eq. (13.67).

Equation (13.68) may be used to verify that the singularity of $\zeta(z)$ at $z = 1$ is a simple
pole and to find its residue. We proceed as follows:

$$(\text{Residue at } z = 1) = \lim_{z\to 1}(z-1)\,\zeta(z) = \lim_{z\to 1}\left(\frac{z-1}{1 - 2^{1-z}}\right)\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n}
= \left(\frac{1}{\ln 2}\right)(\ln 2) = 1, \tag{13.69}$$

where we used l’Hôpital’s rule, recognized that $d\,2^{1-z}/dz = -2^{1-z}\ln 2$, and identified the
summation as that of Eq. (1.53). Returning now to Eq. (13.67), noting that

$$\lim_{z\to 0}\frac{\zeta(1-z)}{\Gamma(z/2)} = \frac{-\,\text{residue of } \zeta(s) \text{ at } s = 1}{2\,(\text{residue of } \Gamma(s) \text{ at } s = 0)} = -\frac12,$$

we obtain the nonzero result

$$\zeta(0) = \Gamma(1/2)\,\pi^{-1/2}\left(-\frac12\right) = -\frac12. \tag{13.70}$$
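
Equations (13.68) and (13.69) lend themselves to a simple numerical check for real arguments. The sketch below sums the alternating series and averages two consecutive partial sums, a standard acceleration device for alternating series that is an assumption of this example rather than part of the text. The function name `zeta_from_eta` and the choice of 100,000 terms are hypothetical.

```python
import math

def zeta_from_eta(s, nmax=100000):
    """zeta(s) for real s > 0, s != 1, from the alternating series of Eq. (13.68)."""
    partial = 0.0
    sign = 1.0
    for n in range(1, nmax + 1):
        partial += sign / n ** s
        sign = -sign
    # `sign` now carries the sign of term nmax + 1; averaging two consecutive
    # partial sums removes most of the oscillating truncation error.
    next_partial = partial + sign / (nmax + 1) ** s
    eta = 0.5 * (partial + next_partial)
    return eta / (1.0 - 2.0 ** (1.0 - s))

print("zeta(2)   =", zeta_from_eta(2.0))     # compare with pi**2 / 6
print("zeta(1/2) =", zeta_from_eta(0.5))     # a point inside the critical strip
s = 1.001
print("(s - 1) zeta(s) near s = 1:", (s - 1.0) * zeta_from_eta(s))
```

The last line should print a number close to 1, in keeping with the unit residue found in Eq. (13.69).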
In addition to the practical utility we have already noted for the Riemann zeta function,
it plays a major role in current developments in analytic number theory. A starting point
for such investigations is the celebrated Euler prime number product formula, which can
be developed by forming

$$\zeta(s)\left(1 - 2^{-s}\right) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots - \left(\frac{1}{2^s} + \frac{1}{4^s} + \frac{1}{6^s} + \cdots\right), \tag{13.71}$$

eliminating all the $n^{-s}$, where n is a multiple of 2. Then we write

$$\zeta(s)\left(1 - 2^{-s}\right)\left(1 - 3^{-s}\right) = 1 + \frac{1}{3^s} + \frac{1}{5^s} + \frac{1}{7^s} + \frac{1}{9^s} + \cdots
- \left(\frac{1}{3^s} + \frac{1}{9^s} + \frac{1}{15^s} + \cdots\right),$$

eliminating all the remaining terms in which n is a multiple of 3. Continuing, we have
$\zeta(s)(1 - 2^{-s})(1 - 3^{-s})(1 - 5^{-s})\cdots(1 - P^{-s})$, where P is a prime number, and all terms
$n^{-s}$, in which n is a multiple of any integer up through P, are canceled out. In the limit
$P \to \infty$, we reach

$$\zeta(s)\left(1 - 2^{-s}\right)\left(1 - 3^{-s}\right)\cdots\left(1 - P^{-s}\right) \to \zeta(s)\prod_{P(\text{prime})=2}^{\infty}\left(1 - P^{-s}\right) = 1.$$

Therefore

$$\zeta(s) = \left[\prod_{P(\text{prime})=2}^{\infty}\left(1 - P^{-s}\right)\right]^{-1}, \tag{13.72}$$

giving $\zeta(s)$ as an infinite product.⁴ Incidentally, the cancellation procedure in the above
derivation has a clear application in numerical computation. For example, Eq. (13.71) will
give $\zeta(s)(1 - 2^{-s})$ to the same accuracy as Eq. (13.61) gives $\zeta(s)$, but with only half as
many terms.
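
The Euler product can be examined numerically by truncating it at a finite prime. The following is a minimal sketch for real $s > 1$; the helper names and the cutoff of 10,000 are hypothetical choices, and the primes are generated with an elementary sieve of Eratosthenes.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            count = len(range(p * p, limit + 1, p))
            sieve[p * p::p] = bytearray(count)  # strike out multiples of p
    return [n for n in range(limit + 1) if sieve[n]]

def zeta_euler_product(s, limit=10000):
    """Truncated product of Eq. (13.72) over primes P <= limit, real s > 1."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** (-s)
    return product

print("truncated product:", zeta_euler_product(2.0))
print("pi^2 / 6         :", math.pi ** 2 / 6.0)
```

Because every omitted factor exceeds unity, the truncated product approaches $\zeta(s)$ from below as the cutoff is raised.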

The asymptotic distribution of prime numbers can be related to the poles of $\zeta'/\zeta$, and
in particular to the nontrivial zeros of the zeta function. Riemann conjectured that all the
nontrivial zeros were on the critical line $\operatorname{Re} z = \tfrac12$, and there are potentially important
results that can be proved if Riemann’s conjecture is correct. Numerical work has verified
that the first $300 \times 10^9$ nontrivial zeros of $\zeta(z)$ are simple and indeed fall on the critical
line. See J. Van de Lune, H. J. J. Te Riele, and D. T. Winter, “On the zeros of the Riemann
zeta function in the critical strip. IV,” Math. Comput. 47, 667 (1986).

Although many gifted mathematicians have attempted to establish what has come to
be known as the Riemann hypothesis, it has for about 150 years remained unproven
and is considered one of the premier unsolved problems in modern mathematics. Popular
accounts of this fascinating problem can be found in M. du Sautoy, The Music of
the Primes: Searching to Solve the Greatest Mystery in Mathematics, New York: HarperCollins
(2003); J. Derbyshire, Prime Obsession: Bernhard Riemann and the Greatest Unsolved
Problem in Mathematics, Washington, DC: Joseph Henry Press (2003); and K. Sabbagh,
The Riemann Hypothesis: The Greatest Unsolved Problem in Mathematics, New
York: Farrar, Straus and Giroux (2003).


⁴ For further discussion, the reader is referred to the works by Edwards, Ivic, Patterson, and Titchmarsh in Additional Readings.


Exercises

13.5.1 Show that the symmetrical functional relation

$$\Gamma\left(\frac z2\right)\pi^{-z/2}\,\zeta(z) = \Gamma\left(\frac{1-z}{2}\right)\pi^{-(1-z)/2}\,\zeta(1-z)$$

follows from the equation

$$\left(e^{2\pi iz} - 1\right)\zeta(z) = \zeta(1-z)\,\frac{(2\pi)^z}{\Gamma(z)}\left(e^{3\pi iz/2} - e^{\pi iz/2}\right).$$


13.5.2 Prove that

$$\int_0^\infty \frac{x^n e^x\,dx}{(e^x - 1)^2} = n!\,\zeta(n).$$

Assuming n to be real, show that each side of the equation diverges if $n = 1$. Hence
the preceding equation carries the condition $n > 1$. Integrals such as this appear in the
quantum theory of transport effects: thermal and electrical conductivity.


13.5.3 The Bloch-Grüneisen approximation for the resistance in a monovalent metal at absolute
temperature T is

$$\rho = C\,\frac{T^5}{\Theta^6}\int_0^{\Theta/T}\frac{x^5\,dx}{(e^x - 1)(1 - e^{-x})},$$

where $\Theta$ is the Debye temperature characteristic of the metal.

(a) For $T \to \infty$, show that

$$\rho \approx \frac{C}{4}\cdot\frac{T}{\Theta^2}.$$

(b) For $T \to 0$, show that

$$\rho \approx 5!\,\zeta(5)\,C\,\frac{T^5}{\Theta^6}.$$


13.5.4 Derive the following expansion of the Debye function for $n \ge 1$:

$$\int_0^x \frac{t^n\,dt}{e^t - 1} = x^n\left[\frac1n - \frac{x}{2(n+1)} + \sum_{k=1}^{\infty}\frac{B_{2k}\,x^{2k}}{(2k+n)(2k)!}\right], \qquad |x| < 2\pi.$$

The complete integral $(0, \infty)$ equals $n!\,\zeta(n+1)$ (Exercise 13.5.6).


13.5.5 The total energy radiated by a blackbody is given by

$$u = \frac{8\pi k^4T^4}{c^3h^3}\int_0^\infty \frac{x^3}{e^x - 1}\,dx.$$

Show that the integral in this expression is equal to $3!\,\zeta(4)$. The final result is the Stefan-
Boltzmann law.


13.5.6 As a generalization of the result in Exercise 13.5.5, show that

$$\int_0^\infty \frac{x^s\,dx}{e^x - 1} = s!\,\zeta(s+1), \qquad \operatorname{Re}(s) > 0.$$


13.5.7 Prove that

$$\int_0^\infty \frac{x^s\,dx}{e^x + 1} = s!\left(1 - 2^{-s}\right)\zeta(s+1), \qquad \operatorname{Re}(s) > 0.$$

Exercises 13.5.6 and 13.5.7 give the Mellin integral transform of $1/(e^x \pm 1)$; this transform
is defined in Eq. (20.9).


13.5.8 The neutrino energy density (Fermi distribution) in the early history of the universe is
given by

$$\rho_\nu = \frac{4\pi}{h^3}\int_0^\infty \frac{x^3}{\exp(x/kT) + 1}\,dx.$$

Show that

$$\rho_\nu = \frac{7\pi^5}{30h^3}\,(kT)^4.$$


13.5.9 Prove that

$$\psi^{(n)}(z) = (-1)^{n+1}\int_0^\infty \frac{t^n e^{-zt}}{1 - e^{-t}}\,dt, \qquad \operatorname{Re}(z) > 0.$$


13.5.10 Show that $\zeta(s)$ is analytic in the entire finite complex plane except at $s = 1$, where it
has a simple pole with a residue of +1.

Hint. The contour integral representation will be useful.


13.6 OTHER RELATED FUNCTIONS


Incomplete Gamma Functions 


Generalizing the Euler-integral definition of the gamma function, Eq. (13.5), we define
incomplete gamma functions by the variable-limit integrals

$$\gamma(a,x) = \int_0^x e^{-t}t^{a-1}\,dt, \qquad \operatorname{Re}(a) > 0,$$
$$\Gamma(a,x) = \int_x^\infty e^{-t}t^{a-1}\,dt. \tag{13.73}$$

Clearly, these two functions are related, for

$$\gamma(a,x) + \Gamma(a,x) = \Gamma(a). \tag{13.74}$$


The choice of employing $\gamma(a,x)$ or $\Gamma(a,x)$ is purely a matter of convenience. If the
parameter a is a positive integer, Eqs. (13.73) may be integrated completely to yield

$$\gamma(n,x) = (n-1)!\left(1 - e^{-x}\sum_{s=0}^{n-1}\frac{x^s}{s!}\right),$$
$$\Gamma(n,x) = (n-1)!\,e^{-x}\sum_{s=0}^{n-1}\frac{x^s}{s!}. \tag{13.75}$$

While the above expressions are valid only for positive integer n, the function $\Gamma(n,x)$ is
well defined (providing $x > 0$) for $n = 0$ and corresponds to an exponential integral (see
later subsection).

For nonintegral a, a power-series expansion of $\gamma(a,x)$ for small x and an asymptotic
expansion of $\Gamma(a,x)$ are developed in Exercises 1.3.3 and 13.6.4:

$$\gamma(a,x) = x^a\sum_{n=0}^{\infty}(-1)^n\frac{x^n}{n!\,(a+n)}, \qquad \text{small } x,$$

$$\Gamma(a,x) \sim x^{a-1}e^{-x}\sum_{n=0}^{\infty}\frac{\Gamma(a)}{\Gamma(a-n)}\cdot\frac{1}{x^n}
\sim x^{a-1}e^{-x}\sum_{n=0}^{\infty}(a-n)_n\,\frac{1}{x^n}, \qquad \text{large } x, \tag{13.76}$$

where $(a-n)_n$ is a Pochhammer symbol. The final expression in Eq. (13.76) makes it
clear how to obtain an asymptotic expansion for $\Gamma(0,x)$. Noting that $(-n)_n = (-1)^n n!$,
we have

$$\Gamma(0,x) \sim \frac{e^{-x}}{x}\sum_{n=0}^{\infty}(-1)^n\frac{n!}{x^n}. \tag{13.77}$$


These incomplete gamma functions may also be expressed quite elegantly in terms of 
confluent hypergeometric functions (compare Section 18.6). 


Incomplete Beta Function 


Just as there are incomplete gamma functions, there is also an incomplete beta function,
customarily defined for $0 \le x \le 1$, $p > 0$ (and, if $x = 1$, also $q > 0$) as

$$B_x(p,q) = \int_0^x t^{p-1}(1-t)^{q-1}\,dt. \tag{13.78}$$

Clearly, $B_{x=1}(p,q)$ becomes the regular (complete) beta function, Eq. (13.49). A power-
series expansion of $B_x(p,q)$ is the subject of Exercise 13.6.5. The relation to hypergeometric
functions appears in Section 18.5.

The incomplete beta function makes an appearance in probability theory in calculating
the probability of at most k successes in n independent trials.⁵

⁵ W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. New York: Wiley (1968), Section VI.10.


Exponential Integral


Although the incomplete gamma function $\Gamma(a,x)$ in its general form, Eq. (13.73), is only
infrequently encountered in physical problems, a special case is quite common and very
useful. We define the exponential integral by⁶

$$-\mathrm{Ei}(-x) \equiv \int_x^\infty \frac{e^{-t}}{t}\,dt \equiv E_1(x). \tag{13.79}$$

FIGURE 13.5 The exponential integral, $E_1(x) = -\mathrm{Ei}(-x)$.



For a graph of this function, see Fig. 13.5. To obtain a series expansion of $E_1(x)$ for
small x, we will need to proceed with caution, because the integral in Eq. (13.79) diverges
logarithmically as $x \to 0$. We start from

$$E_1(x) = \Gamma(0,x) = \lim_{a\to 0}\left[\Gamma(a) - \gamma(a,x)\right]. \tag{13.80}$$

Setting $a = 0$ in the convergent terms (those with $n \ge 1$) in the expansion of $\gamma(a,x)$ and
moving them outside the scope of the limiting process, we rearrange Eq. (13.80) to

$$E_1(x) = \lim_{a\to 0}\left[\frac{a\Gamma(a) - x^a}{a}\right] - \sum_{n=1}^{\infty}\frac{(-1)^n x^n}{n\cdot n!}. \tag{13.81}$$

Using l’Hôpital’s rule, Eq. (1.58), writing $a\Gamma(a) = \Gamma(a+1)$, and noting that $dx^a/da =
x^a\ln x$, the limit in Eq. (13.81) reduces to

$$\left[\frac{d}{da}\Gamma(a+1) - \frac{d}{da}x^a\right]_{a=0} = \Gamma(1)\,\psi(1) - \ln x = -\gamma - \ln x, \tag{13.82}$$

where $\gamma$ (without arguments) is the Euler-Mascheroni constant.⁷ From Eqs. (13.81) and
(13.82) we obtain the rapidly converging series

$$E_1(x) = -\gamma - \ln x - \sum_{n=1}^{\infty}\frac{(-1)^n x^n}{n\cdot n!}. \tag{13.83}$$

⁶ The appearance of the two minus signs in $-\mathrm{Ei}(-x)$ is a historical monstrosity. AMS-55, chapter 5, denotes this integral as
$E_1(x)$. See Additional Readings for the reference.

⁷ Having the notations $\gamma(a,x)$ and $\gamma$ in the same discussion and with different meanings may seem unfortunate, but these are
the traditional notations and should not lead to confusion if the reader is alert.

FIGURE 13.6 Sine and cosine integrals.


The asymptotic expansion for $E_1(x)$ is simply that given in Eq. (13.77) for $\Gamma(0,x)$. We
repeat it here:

$$E_1(x) \sim e^{-x}\left[\frac{1}{x} - \frac{1!}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots\right]. \tag{13.84}$$

Further special forms related to the exponential integral are the sine integral, cosine
integral (for both see Fig. 13.6), and the logarithmic integral, defined by⁸

$$\mathrm{si}(x) = -\int_x^\infty \frac{\sin t}{t}\,dt,$$
$$\mathrm{Ci}(x) = -\int_x^\infty \frac{\cos t}{t}\,dt, \tag{13.85}$$
$$\mathrm{li}(x) = \int_0^x \frac{dt}{\ln t} = \mathrm{Ei}(\ln x).$$

Viewed as functions of a complex variable, Ci(z) and li(z) are multivalued, with a branch
cut conventionally chosen to be along the negative real axis from the branch point at $z = 0$.
By transforming from real to imaginary argument, we can show that

$$\mathrm{si}(x) = \frac{1}{2i}\left[\mathrm{Ei}(ix) - \mathrm{Ei}(-ix)\right] = \frac{1}{2i}\left[E_1(ix) - E_1(-ix)\right], \tag{13.86}$$

whereas

$$\mathrm{Ci}(x) = \frac12\left[\mathrm{Ei}(ix) + \mathrm{Ei}(-ix)\right] = -\frac12\left[E_1(ix) + E_1(-ix)\right], \qquad |\arg x| < \frac{\pi}{2}. \tag{13.87}$$

Adding these two relations, we obtain

$$\mathrm{Ei}(ix) = \mathrm{Ci}(x) + i\,\mathrm{si}(x), \tag{13.88}$$

showing that the relation among these integrals is exactly analogous to that among $e^{ix}$,
$\cos x$, and $\sin x$. In terms of $E_1$,

$$E_1(ix) = -\mathrm{Ci}(x) + i\,\mathrm{si}(x). \tag{13.89}$$

⁸ Another sine integral is denoted $\mathrm{Si}(x) = \mathrm{si}(x) + \pi/2$.


Asymptotic expansions of Ci(x) and si(x) were developed in Section 12.6, with explicit 
formulas in Eqs. (12.93) and (12.94). Power-series expansions about the origin for Ci(x), 
si(x), and li(x) may be obtained from those for the exponential integral, $E_1(x)$, or by direct
integration, Exercise 13.6.13. The exponential, sine, and cosine integrals are tabulated in
AMS-55, chapter 5 (see Additional Readings for the reference), and can also be accessed
by symbolic software such as Mathematica, Maple, Mathcad, and Reduce. 


Error Function 


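The error function erf(z) and the complementary error function erfc(z) are defined by
the integrals

$$\operatorname{erf} z = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt, \qquad \operatorname{erfc} z = 1 - \operatorname{erf} z = \frac{2}{\sqrt{\pi}}\int_z^\infty e^{-t^2}\,dt. \tag{13.90}$$

The factors $2/\sqrt{\pi}$ cause these functions to be scaled so that $\operatorname{erf}\infty = 1$. For a plot of erf x,
see Fig. 13.7.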


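The power-series expansion of erf x follows directly from the expansion of the exponential
in the integrand:

$$\operatorname{erf} x = \frac{2}{\sqrt{\pi}}\sum_{n=0}^{\infty}\frac{(-1)^n x^{2n+1}}{(2n+1)\,n!}. \tag{13.91}$$

Its asymptotic expansion, the subject of Exercise 12.6.3, is

$$\operatorname{erf} x \approx 1 - \frac{e^{-x^2}}{\sqrt{\pi}\,x}\left(1 - \frac{1}{2x^2} + \frac{1\cdot 3}{2^2x^4} - \frac{1\cdot 3\cdot 5}{2^3x^6} + \cdots + (-1)^n\frac{(2n-1)!!}{2^nx^{2n}}\right). \tag{13.92}$$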
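
Both expansions of erf x can be compared with a library value. The following is a minimal sketch for real $x > 0$, using `math.erf` from Python's standard library as the reference; the function names and term counts are hypothetical choices.

```python
import math

def erf_series(x, nmax=80):
    """erf x from the power series of Eq. (13.91)."""
    total = 0.0
    term = x  # holds (-1)**n x**(2n+1) / n!
    for n in range(nmax):
        total += term / (2 * n + 1)
        term *= -x * x / (n + 1)
    return 2.0 / math.sqrt(math.pi) * total

def erf_asymptotic(x, nterms):
    """erf x from the first `nterms` terms of the asymptotic form, Eq. (13.92)."""
    total = 0.0
    term = 1.0  # holds (-1)**n (2n-1)!! / (2**n x**(2n))
    for n in range(nterms):
        total += term
        term *= -(2 * n + 1) / (2.0 * x * x)
    return 1.0 - math.exp(-x * x) / (math.sqrt(math.pi) * x) * total

for x in (0.5, 2.0):
    print(f"x = {x}: series {erf_series(x):.12f}   library {math.erf(x):.12f}")
print(f"x = 3.0: asymptotic {erf_asymptotic(3.0, 5):.10f}   library {math.erf(3.0):.10f}")
```

The power series is the natural choice for small x, while the asymptotic form becomes useful once x is large enough that its first few terms decrease rapidly.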

















FIGURE 13.7 Error function, erf x.

From the general form of the integrands and Eq. (13.6) we expect that erf z and erfc z may
be written as incomplete gamma functions with $a = \tfrac12$. The relations are

$$\operatorname{erf} z = \pi^{-1/2}\,\gamma\left(\tfrac12, z^2\right), \qquad \operatorname{erfc} z = \pi^{-1/2}\,\Gamma\left(\tfrac12, z^2\right). \tag{13.93}$$


Exercises

13.6.1 Show that $\displaystyle\gamma(a,x) = e^{-x}\sum_{n=0}^{\infty}\frac{(a-1)!}{(a+n)!}\,x^{a+n}$

(a) by repeatedly integrating by parts,
(b) by transforming it into Eq. (13.76).


13.6.2 Show that

(a) $\dfrac{d^m}{dx^m}\left[x^{-a}\gamma(a,x)\right] = (-1)^m x^{-a-m}\,\gamma(a+m,x)$,

(b) $\dfrac{d^m}{dx^m}\left[e^{x}\gamma(a,x)\right] = e^{x}\,\dfrac{\Gamma(a)}{\Gamma(a-m)}\,\gamma(a-m,x)$.


13.6.3 Show that $\gamma(a,x)$ and $\Gamma(a,x)$ satisfy the recurrence relations

(a) $\gamma(a+1,x) = a\,\gamma(a,x) - x^a e^{-x}$,
(b) $\Gamma(a+1,x) = a\,\Gamma(a,x) + x^a e^{-x}$.


13.6.4 Show that the asymptotic expansion (for large x) of the incomplete gamma function
$\Gamma(a,x)$ has the form

$$\Gamma(a,x) \sim x^{a-1}e^{-x}\sum_{n=0}^{\infty}\frac{\Gamma(a)}{\Gamma(a-n)}\cdot\frac{1}{x^n},$$

and that the above expression is equivalent to

$$\Gamma(a,x) \sim x^{a-1}e^{-x}\sum_{n=0}^{\infty}(a-n)_n\,\frac{1}{x^n}.$$


13.6.5 A series expansion of the incomplete beta function yields

$$B_x(p,q) = x^p\left\{\frac1p + \frac{1-q}{p+1}\,x + \frac{(1-q)(2-q)}{2!\,(p+2)}\,x^2 + \cdots + \frac{(1-q)(2-q)\cdots(n-q)}{n!\,(p+n)}\,x^n + \cdots\right\}.$$

Given that $0 \le x \le 1$, $p > 0$, and $q > 0$, test this series for convergence. What happens
at $x = 1$?


13.6.6 Using the definitions of the various functions, show that

(a) $\mathrm{si}(x) = \dfrac{1}{2i}\left[E_1(ix) - E_1(-ix)\right]$,

(b) $\mathrm{Ci}(x) = -\dfrac12\left[E_1(ix) + E_1(-ix)\right]$,

(c) $E_1(ix) = -\mathrm{Ci}(x) + i\,\mathrm{si}(x)$.


13.6.7 The potential produced by a 1s hydrogen electron is given by

$$V(r) = \frac{q}{4\pi\varepsilon_0 a_0}\left[\frac{1}{2r}\,\gamma(3,2r) + \Gamma(2,2r)\right].$$

(a) For $r \ll 1$, show that

$$V(r) = \frac{q}{4\pi\varepsilon_0 a_0}\left[1 - \frac23 r^2 + \cdots\right].$$

(b) For $r \gg 1$, show that

$$V(r) = \frac{q}{4\pi\varepsilon_0 a_0}\cdot\frac1r.$$

Here r is expressed in units of $a_0$, the Bohr radius.

Note. V(r) is illustrated in Fig. 13.8.




Point charge potential 
1/r 


Distributed 


charge 
potential 








FIGURE 13.8 Distributed charge potential produced by a 1s hydrogen electron, 
Exercise 13.6.7. 
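13.6.8 The potential produced by a 2p hydrogen electron can be shown to be

$$V(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\cdot\frac{q}{24a_0}\left[\frac{1}{r}\,\gamma(5, r) + \Gamma(4, r)\right]
- \frac{1}{4\pi\varepsilon_0}\cdot\frac{q}{120a_0}\left[\frac{1}{r^3}\,\gamma(7, r) + r^2\,\Gamma(2, r)\right]P_2(\cos\theta).$$

Here $r$ is expressed in units of $a_0$, the Bohr radius. $P_2(\cos\theta)$ is a Legendre polynomial
(Section 15.1).

(a) For $r \ll 1$, show that

$$V(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\cdot\frac{q}{a_0}\left[\frac{1}{4} - \frac{1}{120}\,r^2 P_2(\cos\theta) + \cdots\right].$$

(b) For $r \gg 1$, show that

$$V(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\cdot\frac{q}{a_0 r}\left[1 - \frac{6}{r^2}\,P_2(\cos\theta) + \cdots\right].$$

13.6.9 Prove that the exponential integral has the expansion

$$\int_x^\infty \frac{e^{-t}}{t}\,dt = -\gamma - \ln x - \sum_{n=1}^\infty \frac{(-1)^n x^n}{n\cdot n!},$$

where $\gamma$ is the Euler-Mascheroni constant.

13.6.10 Show that $E_1(z)$ may be written as

$$E_1(z) = e^{-z}\int_0^\infty \frac{e^{-zt}}{1+t}\,dt.$$

Show also that we must impose the condition $|\arg z| \le \pi/2$.

13.6.11 Related to the exponential integral by a simple change of variable is the function

$$E_n(x) = \int_1^\infty \frac{e^{-xt}}{t^n}\,dt.$$

Show that $E_n(x)$ satisfies the recurrence relation

$$E_{n+1}(x) = \frac{1}{n}\,e^{-x} - \frac{x}{n}\,E_n(x), \qquad n = 1, 2, 3, \ldots.$$

13.6.12 With $E_n(x)$ as defined in Exercise 13.6.11, show that for $n > 1$,
$E_n(0) = 1/(n-1)$.

13.6.13 Develop the following power-series expansions:

(a) $\mathrm{si}(x) = -\dfrac{\pi}{2} + \displaystyle\sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{(2n+1)(2n+1)!}$,

(b) $\mathrm{Ci}(x) = \gamma + \ln x + \displaystyle\sum_{n=1}^\infty \frac{(-1)^n x^{2n}}{2n\,(2n)!}$.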


13.6.14 An analysis of a center-fed linear antenna leads to the expression

$$\int_0^x \frac{1 - \cos t}{t}\,dt.$$

Show that this is equal to $\gamma + \ln x - \mathrm{Ci}(x)$.

13.6.15 Using the relation

$$\Gamma(a) = \gamma(a, x) + \Gamma(a, x),$$

show that if $\gamma(a, x)$ satisfies the relations of Exercise 13.6.2, then $\Gamma(a, x)$ must satisfy
the same relations.

13.6.16 For $x > 0$, show that

$$\int_x^\infty \frac{t^n\,dt}{e^t - 1} = \sum_{k=1}^\infty e^{-kx}\left[\frac{x^n}{k} + \frac{n x^{n-1}}{k^2} + \frac{n(n-1)x^{n-2}}{k^3} + \cdots + \frac{n!}{k^{n+1}}\right].$$
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14.1 


CHAPTER 14 


BESSEL FUNCTIONS 


Bessel functions appear in a wide variety of physical problems. In Section 9.4 we saw 
that separation of the Helmholtz, or wave, equation in circular cylindrical coordinates led 
to Bessel’s equation in the coordinate describing distance from the axis of the cylindrical 
system. In that same section, we also identified spherical Bessel functions (closely related 
to Bessel functions of half-integral order) in Helmholtz equations in spherical coordinates. 
In summarizing the forms of solutions to partial differential equations (PDEs) in these 
coordinate systems, we not only identified the original and spherical Bessel functions, 
but also those of imaginary argument (usually expressed as modified Bessel functions to 
avoid the explicit use of imaginary quantities). Since these PDEs can describe many types 
of problems ranging from stationary problems in quantum mechanics to those of spherical 
or cylindrical wave propagation, a good familiarity with Bessel functions is important to 
the practicing physicist. 

Often problems in physics involve integrals that can be identified as Bessel functions, 
even when the original problem did not explicitly involve cylindrical or spherical geom- 
etry. Moreover, Bessel and closely related functions form a rich area of mathematical 
analysis with many representations, many interesting and useful properties, and many 
interrelations. Some of the major interrelations are developed in the present chapter. 
In addition to the material presented here, we call attention to further relations in terms 
of confluent hypergeometric functions; see Section 18.6. 


14.1 BESSEL FUNCTIONS OF THE FIRST KIND, $J_\nu(x)$


Bessel functions of the first kind, normally labeled $J_\nu$, are those obtained by the Frobenius
method for solution of the Bessel ODE,

$$x^2 J_\nu'' + x J_\nu' + (x^2 - \nu^2)\,J_\nu = 0. \tag{14.1}$$

The term “first kind” reflects the fact that $J_\nu(x)$ includes the functions that, for nonnegative
integer $\nu$, are regular at $x = 0$. All solutions to the Bessel ordinary differential
equation (ODE) that are linearly independent of $J_\nu(x)$ are irregular at $x = 0$ for all $\nu$;
a specific choice for a second solution is denoted $Y_\nu(x)$ and is called a Bessel function of
the second kind.$^1$


Generating Function for Integral Order 


We start our detailed study of Bessel functions by introducing a generating function yielding
the $J_n$ for integer $n$ (of either sign). Because the $J_n$ are not polynomials, the generating
function cannot be found by the methods of Section 12.1, but we will be able to show that
the functions defined by the generating function are indeed the solutions of the Bessel ODE
obtained by the Frobenius method.

Our generating function formula, a Laurent series, is

$$g(x, t) = e^{(x/2)(t - 1/t)} = \sum_{n=-\infty}^{\infty} J_n(x)\,t^n. \tag{14.2}$$

Although the Bessel ODE is homogeneous and its solutions are of arbitrary scale,
Eq. (14.2) fixes a specific scale for $J_n(x)$. To relate Eq. (14.2) to the Frobenius solution,
Eq. (7.48), we manipulate the exponential as follows:


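$$g(x, t) = e^{xt/2}\cdot e^{-x/2t} = \sum_{r=0}^\infty \frac{1}{r!}\left(\frac{x}{2}\right)^r t^r\,\sum_{s=0}^\infty \frac{(-1)^s}{s!}\left(\frac{x}{2}\right)^s t^{-s}
= \sum_{r=0}^\infty\sum_{s=0}^\infty \frac{(-1)^s}{r!\,s!}\left(\frac{x}{2}\right)^{r+s} t^{r-s}.$$

We now change the summation index $r$ to $n = r - s$, yielding

$$g(x, t) = \sum_{n=-\infty}^\infty\left[\sum_s \frac{(-1)^s}{(n+s)!\,s!}\left(\frac{x}{2}\right)^{n+2s}\right]t^n, \tag{14.3}$$

where the $s$ summation starts at $\max(0, -n)$. For $n \ge 0$, the coefficient of $t^n$ is seen to be

$$J_n(x) = \sum_{s=0}^\infty \frac{(-1)^s}{s!\,(n+s)!}\left(\frac{x}{2}\right)^{n+2s}. \tag{14.4}$$

Comparing with Eq. (7.48), we confirm that for $n \ge 0$, $J_n$ as given by Eq. (14.4) is the
Frobenius solution, at the specific scale given here.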
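As a numerical sanity check on Eqs. (14.2) and (14.3), the following sketch sums the Laurent series with coefficients computed directly from Eq. (14.3) and compares the result with the closed-form exponential. The function names, the truncation limits (`terms`, `n_max`), and the sample point are illustrative assumptions, not part of the text.

```python
import math

def laurent_coefficient(n, x, terms=40):
    """Coefficient of t**n in Eq. (14.3); the s sum starts at max(0, -n)."""
    s0 = max(0, -n)
    total = 0.0
    for s in range(s0, s0 + terms):
        total += ((-1) ** s / (math.factorial(s) * math.factorial(n + s))
                  * (x / 2.0) ** (n + 2 * s))
    return total

def generating_function_series(x, t, n_max=30):
    """Truncated Laurent series: sum of coefficient * t**n for n = -n_max..n_max."""
    return sum(laurent_coefficient(n, x) * t ** n
               for n in range(-n_max, n_max + 1))

x, t = 1.5, 0.7                                   # arbitrary sample point (assumption)
closed_form = math.exp(0.5 * x * (t - 1.0 / t))   # exponential form in Eq. (14.2)
series_value = generating_function_series(x, t)
print(closed_form, series_value)
```

For moderate $x$ and $t$ the two numbers should agree to roughly machine precision; the truncation limits would need to grow for large $|x|$ or for $t$ far from unity.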

If now we replace $n$ by $-n$, the summation in Eq. (14.3) becomes

$$J_{-n}(x) = \sum_{s=n}^\infty \frac{(-1)^s}{s!\,(s-n)!}\left(\frac{x}{2}\right)^{-n+2s};$$

changing $s$ to $s + n$, we reach

$$J_{-n}(x) = \sum_{s=0}^\infty \frac{(-1)^{s+n}}{s!\,(s+n)!}\left(\frac{x}{2}\right)^{n+2s} = (-1)^n J_n(x) \quad (\text{integral } n), \tag{14.5}$$

confirming both that $J_{-n}(x)$ is a solution to the Bessel ODE and that it is linearly dependent
on $J_n$.

$^1$ We use the notation of AMS-55, also used by Watson in his definitive treatise (for both sources, see Additional Readings). The
$Y_\nu$ are sometimes also called Neumann functions; for that reason some workers write them as $N_\nu$. They were denoted $N_\nu$ in
previous editions of this book.

If we now consider $J_\nu$ with $\nu$ nonintegral, we get no information from the generating
function, but the Frobenius method then gives linearly independent solutions for both $+\nu$
and $-\nu$, which are both solutions of the Bessel ODE, Eq. (14.1), for the same value of $\nu$.
Looking at the details of the development of Eqs. (7.46) to (7.48), we see that the generalization
of Eq. (14.4) to noninteger $\nu$ is

$$J_\nu(x) = \sum_{s=0}^\infty \frac{(-1)^s}{s!\,\Gamma(\nu+s+1)}\left(\frac{x}{2}\right)^{\nu+2s}, \qquad (\nu \ne -1, -2, \ldots) \tag{14.6}$$

and that $J_\nu(x)$ as given in Eq. (14.6) is a solution to the Bessel ODE.

For $\nu \ge 0$ the series of Eq. (14.6) is convergent for all $x$, and for small $x$ is a practical
way to evaluate $J_\nu(x)$. Graphs of $J_0$, $J_1$, and $J_2$ are shown in Fig. 14.1. The Bessel
functions oscillate but are not periodic, except in the limit $x \to \infty$, with the amplitude of
the oscillation decreasing asymptotically as $x^{-1/2}$. This behavior is discussed further in
Section 14.6.
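The remark that the series is practical for small $x$ can be illustrated with a short sketch that sums Eq. (14.6) term by term, using the ratio of successive terms rather than recomputing factorials. The function name and the fixed number of terms are assumptions made for illustration; the code also assumes $x \ge 0$ and that $\nu$ is not a negative integer.

```python
import math

def bessel_j_series(nu, x, terms=60):
    """Partial sum of Eq. (14.6); assumes x >= 0 and nu not a negative integer."""
    half = x / 2.0
    term = half ** nu / math.gamma(nu + 1.0)   # s = 0 term
    total = term
    for s in range(terms):
        # Ratio of term s+1 to term s: -(x/2)^2 / ((s+1)(nu+s+1))
        term *= -half * half / ((s + 1) * (nu + s + 1))
        total += term
    return total

# Integer order: value of J_0 at x = 1
print(bessel_j_series(0, 1.0))

# Half-integer order: compare with the closed form J_{1/2}(x) = sqrt(2/(pi x)) sin x
x = 1.3
print(bessel_j_series(0.5, x), math.sqrt(2.0 / (math.pi * x)) * math.sin(x))
```

For large $x$ the alternating terms grow before they decay, so cancellation erodes accuracy; that is one reason asymptotic forms are preferred there.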





Recurrence Relations 


The Bessel functions $J_n(x)$ satisfy recurrence relations connecting functions of contiguous
$n$, as well as some connecting the derivative $J_n'$ to various $J_n$. Such recurrence relations
may all be obtained by operating on the series, Eq. (14.6), although this requires a bit
of clairvoyance (or a lot of trial and error). However, if the recurrence relations are already
known, their verification is straightforward; see Exercise 14.1.8. Our approach here will
be to obtain them from the generating function $g(x, t)$, using a process similar to that
illustrated in Example 12.1.2.











FIGURE 14.1 Bessel functions $J_0(x)$, $J_1(x)$, and $J_2(x)$.
We start by differentiating $g(x, t)$:

$$\frac{\partial}{\partial t}\,g(x, t) = \frac{x}{2}\left(1 + \frac{1}{t^2}\right)e^{(x/2)(t-1/t)} = \sum_{n=-\infty}^\infty n\,J_n(x)\,t^{n-1},$$

$$\frac{\partial}{\partial x}\,g(x, t) = \frac{1}{2}\left(t - \frac{1}{t}\right)e^{(x/2)(t-1/t)} = \sum_{n=-\infty}^\infty J_n'(x)\,t^n.$$

Inserting the right-hand side of Eq. (14.2) in place of the exponentials and equating the
coefficients of equal powers of $t$ (as illustrated in Example 12.1.2), we obtain the two
basic Bessel-function recurrence formulas:

$$J_{n-1}(x) + J_{n+1}(x) = \frac{2n}{x}\,J_n(x), \tag{14.7}$$

$$J_{n-1}(x) - J_{n+1}(x) = 2J_n'(x). \tag{14.8}$$


Because Eq. (14.7) is a three-term recurrence relation, its use to generate $J_n$ will require
two starting values. For example, given $J_0$ and $J_1$, then $J_2$ (and any other integral order $J_n$,
including those for $n < 0$) may be computed.
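A minimal sketch of this use of Eq. (14.7): starting values $J_0(x)$ and $J_1(x)$ are taken from the series, Eq. (14.4), and higher orders are generated by stepping the recurrence upward. The function names are hypothetical, and $x \ne 0$ is assumed. Upward recurrence is known to lose accuracy once $n$ exceeds $x$, so this is an illustration rather than a robust algorithm.

```python
import math

def bessel_j_series(n, x, terms=60):
    """J_n(x) for integer n >= 0 from the power series, Eq. (14.4)."""
    half = x / 2.0
    term = half ** n / math.factorial(n)
    total = term
    for s in range(terms):
        term *= -half * half / ((s + 1) * (n + s + 1))
        total += term
    return total

def bessel_j_upward(n_max, x):
    """Return [J_0(x), ..., J_{n_max}(x)] using Eq. (14.7) upward from J_0 and J_1."""
    values = [bessel_j_series(0, x), bessel_j_series(1, x)]
    for n in range(1, n_max):
        # Eq. (14.7) rearranged: J_{n+1} = (2n/x) J_n - J_{n-1}
        values.append(2.0 * n / x * values[n] - values[n - 1])
    return values[: n_max + 1]

x = 5.0
for n, value in enumerate(bessel_j_upward(5, x)):
    print(n, value, bessel_j_series(n, x))   # recurrence vs. direct series
```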

An important special case of Eq. (14.8) is

$$J_0'(x) = -J_1(x). \tag{14.9}$$


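Equations (14.7) and (14.8) can also be combined (Exercise 14.1.4) to form the useful
additional formulas

$$\frac{d}{dx}\left[x^n J_n(x)\right] = x^n J_{n-1}(x), \tag{14.10}$$

$$\frac{d}{dx}\left[x^{-n} J_n(x)\right] = -x^{-n} J_{n+1}(x), \tag{14.11}$$

$$J_n(x) = J_{n+1}'(x) + \frac{n+1}{x}\,J_{n+1}(x). \tag{14.12}$$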


Bessel’s Differential Equation 


Suppose we consider a set of functions $Z_\nu(x)$ that satisfies the basic recurrence relations,
Eqs. (14.7) and (14.8), but with $\nu$ not necessarily an integer and $Z_\nu$ not necessarily given by
the series in Eq. (14.6). It is our objective to show that any functions that satisfy these recurrence
relations must also be solutions to Bessel’s ODE. We start by forming (1) $x^2 Z_\nu''(x)$
from $x^2/2$ times the derivative of Eq. (14.8), (2) $x Z_\nu'(x)$ from Eq. (14.8) multiplied by
$x/2$, and (3) $\nu^2 Z_\nu(x)$ from Eq. (14.7) multiplied by $\nu x/2$. Putting these together we obtain

$$x^2 Z_\nu''(x) + x Z_\nu'(x) - \nu^2 Z_\nu(x)
= \frac{x^2}{2}\left[Z_{\nu-1}'(x) - Z_{\nu+1}'(x) - \frac{\nu-1}{x}\,Z_{\nu-1}(x) - \frac{\nu+1}{x}\,Z_{\nu+1}(x)\right]. \tag{14.13}$$
The terms within square brackets in Eq. (14.13) can now, by use of Eq. (14.12), be simplified
to $-2Z_\nu(x)$, so Eq. (14.13) can be rewritten

$$x^2 Z_\nu''(x) + x Z_\nu'(x) + (x^2 - \nu^2)\,Z_\nu(x) = 0, \tag{14.14}$$

which is Bessel’s ODE. Reiterating, we have shown that any functions $Z_\nu(x)$ that satisfy
the basic recurrence formulas, Eqs. (14.7) and (14.8), also satisfy Bessel’s equation; that
is, the $Z_\nu$ are Bessel functions. For later use, we note that if the argument of $Z_\nu$ is $k\rho$ rather
than $x$, Eq. (14.14) becomes

$$\rho^2\,\frac{d^2}{d\rho^2}\,Z_\nu(k\rho) + \rho\,\frac{d}{d\rho}\,Z_\nu(k\rho) + (k^2\rho^2 - \nu^2)\,Z_\nu(k\rho) = 0. \tag{14.15}$$
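As an independent check that the series of Eq. (14.6) satisfies Eq. (14.14), the sketch below applies the operator $x^2\,d^2/dx^2 + x\,d/dx + (x^2 - \nu^2)$ term by term to a partial sum. Each term is proportional to $x^{\nu+2s}$, so the derivatives are exact and no finite differences are needed. The function name and sample arguments are assumptions chosen for illustration; $x \ge 0$ is assumed.

```python
import math

def bessel_ode_residual(nu, x, terms=60):
    """Left side of Eq. (14.14) evaluated on a partial sum of Eq. (14.6)."""
    half = x / 2.0
    term = half ** nu / math.gamma(nu + 1.0)   # s = 0 term of the series
    residual = 0.0
    for s in range(terms):
        p = nu + 2 * s                         # this term is proportional to x**p
        # x^2 d2/dx2 gives p(p-1) * term; x d/dx gives p * term
        residual += (p * (p - 1) + p + x * x - nu * nu) * term
        term *= -half * half / ((s + 1) * (nu + s + 1))
    return residual

print(bessel_ode_residual(0.3, 2.0))   # expected to be at rounding-error level
print(bessel_ode_residual(2, 4.5))
```

The residual telescopes analytically, leaving only a contribution from the last retained term, so what is printed is essentially accumulated rounding error.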


Integral Representation 


It is of great value to have integral representations of Bessel functions. Starting from the
generating-function formula, we can apply the residue theorem to evaluate the contour
integral

$$\oint_C \frac{e^{(x/2)(t-1/t)}}{t^{n+1}}\,dt = \oint_C \sum_m J_m(x)\,t^{m-n-1}\,dt = 2\pi i\,J_n(x), \tag{14.16}$$

where the contour $C$ encircles the singularity at $t = 0$. The integral on the left-hand side
of Eq. (14.16) can now be brought to a convenient form by taking the contour to be the
unit circle and changing the integration variable by making the substitution $t = e^{i\theta}$. Then
$dt = ie^{i\theta}\,d\theta$, $e^{(x/2)(t-1/t)} = e^{ix\sin\theta}$, and we have

$$2\pi i\,J_n(x) = \int_0^{2\pi} \frac{e^{ix\sin\theta}}{e^{(n+1)i\theta}}\,ie^{i\theta}\,d\theta = \int_0^{2\pi} e^{i(x\sin\theta - n\theta)}\,i\,d\theta. \tag{14.17}$$

Assuming $x$ to be real and taking the imaginary parts of both sides of Eq. (14.17), we find

$$J_n(x) = \frac{1}{2\pi}\int_0^{2\pi}\cos(x\sin\theta - n\theta)\,d\theta = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta - n\theta)\,d\theta, \tag{14.18}$$

where the last equality only holds because we are assuming $n$ to be an integer. Though we
will not need it now, the real part of this equation also gives an interesting formula:

$$\int_0^{2\pi}\sin(x\sin\theta - n\theta)\,d\theta = 0. \tag{14.19}$$

An oft-occurring special case of Eq. (14.18) is

$$J_0(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{ix\cos\theta}\,d\theta = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta)\,d\theta. \tag{14.20}$$
Table 14.1. Zeros of the Bessel Functions and Their First Derivatives

Number
of zero     J_0(x)     J_1(x)     J_2(x)     J_3(x)     J_4(x)     J_5(x)
1           2.4048     3.8317     5.1356     6.3802     7.5883     8.7715
2           5.5201     7.0156     8.4172     9.7610    11.0647    12.3386
3           8.6537    10.1735    11.6198    13.0152    14.3725    15.7002
4          11.7915    13.3237    14.7960    16.2235    17.6160    18.9801
5          14.9309    16.4706    17.9598    19.4094    20.8269    22.2178

Number
of zero     J_0'(x)    J_1'(x)    J_2'(x)    J_3'(x)    J_4'(x)    J_5'(x)
1           3.8317     1.8412     3.0542     4.2012     5.3176     6.4156
2           7.0156     5.3314     6.7061     8.0152     9.2824    10.5199
3          10.1735     8.5363     9.9695    11.3459    12.6819    13.9872
4          13.3237    11.7060    13.1704    14.5858    15.9641    17.3128
5          16.4706    14.8636    16.3475    17.7887    19.1960    20.5755





Equation (14.18) is only one of many integral representations of $J_n$, and some of these
can be derived (using an appropriately modified contour) for $J_\nu$ of a nonintegral order.
This topic is explored in the subsection below entitled “Bessel Functions of Nonintegral
Order”.
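The integral representation, Eq. (14.18), is easy to evaluate numerically because the integrand is smooth and periodic, a case in which the simple trapezoid rule converges very rapidly. The sketch below is a minimal illustration; the function names and the number of sample points are assumptions.

```python
import math

def bessel_j_integral(n, x, points=200):
    """Eq. (14.18): (1/2pi) * integral over [0, 2pi] of cos(x sin(theta) - n theta)."""
    h = 2.0 * math.pi / points
    # Periodic trapezoid rule: h * sum of samples, then divide by 2*pi
    return sum(math.cos(x * math.sin(k * h) - n * k * h)
               for k in range(points)) / points

def sine_integral_check(n, x, points=200):
    """Eq. (14.19): the corresponding sine integral, which should vanish."""
    h = 2.0 * math.pi / points
    return h * sum(math.sin(x * math.sin(k * h) - n * k * h)
                   for k in range(points))

print(bessel_j_integral(0, 1.0), bessel_j_integral(1, 1.0))
print(sine_integral_check(2, 3.0))
```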


Zeros of Bessel Functions 


In many physical problems in which phenomena are described by Bessel functions, we are
interested in the points where these functions (which have oscillatory character) are zero.
For example, in a problem involving standing waves, these zeros identify the positions of
the nodes. And in boundary value problems, we may need to choose the argument of our
Bessel function to put a zero at an appropriate point.

There are no closed formulas for the zeros of Bessel functions; they must be found by
numerical methods. Because the need for them arises frequently, tables of the zeros are
available, both in compilations such as AMS-55 (see Additional Readings) and at a variety
of sources online.$^2$ Table 14.1 lists the first few zeros of $J_n(x)$ for integer $n$ from $n = 0$
through $n = 5$, giving also the positions of the zeros of $J_n'$.
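One simple numerical method for locating such zeros is to scan for sign changes and then refine each bracket by bisection. The sketch below does this with the power series of Eq. (14.4); the function names, scan step, and starting point are illustrative assumptions, and production codes would use more efficient root finders and more stable evaluations of $J_n$.

```python
import math

def bessel_j(n, x, terms=80):
    """J_n(x) for integer n >= 0 from the power series, Eq. (14.4)."""
    half = x / 2.0
    term = half ** n / math.factorial(n)
    total = term
    for s in range(terms):
        term *= -half * half / ((s + 1) * (n + s + 1))
        total += term
    return total

def bessel_zeros(n, count, x_start=0.5, step=0.1, x_max=25.0):
    """First `count` positive zeros of J_n: sign-change scan plus bisection."""
    zeros = []
    left, f_left = x_start, bessel_j(n, x_start)
    while len(zeros) < count and left < x_max:
        right = left + step
        f_right = bessel_j(n, right)
        if f_left * f_right < 0.0:            # a zero lies in (left, right)
            lo, hi, f_lo = left, right, f_left
            for _ in range(60):               # bisection refinement
                mid = 0.5 * (lo + hi)
                f_mid = bessel_j(n, mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        left, f_left = right, f_right
    return zeros

print([round(z, 4) for z in bessel_zeros(0, 5)])
print([round(z, 4) for z in bessel_zeros(1, 2)])
```

The printed values can be compared against the first column entries of Table 14.1.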


Example 14.1.1 FRAUNHOFER DIFFRACTION, CIRCULAR APERTURE 


In the theory of diffraction of radiation of wavelength $\lambda$, incident normal to a circular
aperture of radius $a$, we encounter the integral

$$\Phi \sim \int_0^a r\,dr\int_0^{2\pi} e^{ibr\cos\theta}\,d\theta, \tag{14.21}$$

where $\Phi$ is the amplitude of the diffracted wave and $(r, \theta)$ identifies points in the aperture.
The exponent $br\cos\theta$ is the phase of the radiation through $(r, \theta)$ that is diffracted to an
angle $\alpha$ from the incident direction, with

$$b = \frac{2\pi}{\lambda}\,\sin\alpha. \tag{14.22}$$

The geometry is illustrated in Fig. 14.2. Fraunhofer diffraction, for which the above are
the relevant formulas, applies in the limit that the outgoing radiation is detected at large
distances from the aperture.

FIGURE 14.2 Geometry for Fraunhofer diffraction, circular aperture.

$^2$ Additional roots of the Bessel functions and those of their first derivatives may be found in C. L. Beattie, Table of first 700 zeros
of Bessel functions, Bell Syst. Tech. J. 37, 689 (1958), and Bell Monogr. 3055. Roots may also be accessed in Mathematica,
Maple, and other symbolic software.

The behavior of the complex exponential will cause the amplitude to oscillate as $\alpha$ is
increased, creating (for each wavelength) a diffraction pattern. To understand the patterns
more fully, we need to evaluate the integral in Eq. (14.21). From Eq. (14.20) we may
immediately reduce Eq. (14.21) to

$$\Phi \sim 2\pi\int_0^a J_0(br)\,r\,dr, \tag{14.23}$$

which can be integrated in $r$ using Eq. (14.10):

$$\Phi \sim 2\pi\int_0^a \frac{1}{b^2}\,\frac{d}{dr}\left[(br)\,J_1(br)\right]dr = \frac{2\pi}{b^2}\left[br\,J_1(br)\right]_0^a = \frac{2\pi a}{b}\,J_1(ab), \tag{14.24}$$


where we have used the fact that $J_1(0) = 0$. The intensity of the light in the diffraction
pattern is proportional to $\Phi^2$ and, substituting for $b$ from Eq. (14.22),

$$\Phi^2 \sim \left\{\frac{J_1\left[(2\pi a/\lambda)\sin\alpha\right]}{\sin\alpha}\right\}^2. \tag{14.25}$$

FIGURE 14.3 Amplitude of Fraunhofer diffraction vs. deflection angle in radians (green light,
aperture of radius 0.5 cm).


For visible light and apertures of reasonable size, $2\pi a/\lambda$ is quite large: for green light
($\lambda = 5.5 \times 10^{-5}$ cm) and an aperture with $a = 0.5$ cm, $2\pi a/\lambda = 57120$, and these parameter
values lead to the pattern for $\Phi$ shown in Fig. 14.3. Note that the figure plots $\Phi$ (a plot
of $\Phi^2$ would make the oscillations too small to be observable on the same graph as the
maximum at $\alpha = 0$). We see that $\Phi$ exhibits a central maximum at $\alpha = 0$ of amplitude
$\sim$30,000, with subsidiary extrema that by $\alpha = 0.001$ radian have decreased in magnitude
to less than 1% of the central maximum. Remembering that the intensity is $\Phi^2$, we see that
the diffraction spreading of the incident light is exceedingly small. To make a quantitative
analysis of the diffraction pattern, we need to identify the positions of its minima. They
correspond to the zeros of $J_1$; for example, from Table 14.1 we find the first minimum
to be where $(2\pi a/\lambda)\sin\alpha = 3.8317$, or $\alpha \approx 14$ seconds of arc. If this analysis had been
known in the 17th century, the arguments against the wave theory of light would have
collapsed.

In the mid-20th century this same diffraction pattern appeared in the scattering of nuclear
particles by atomic nuclei, a striking demonstration of the wave properties of the nuclear
particles. ■
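The numbers quoted in this example can be reproduced with a few lines of arithmetic. The sketch below uses only the parameter values stated above and the first zero of $J_1$ from Table 14.1; the variable names are illustrative assumptions.

```python
import math

wavelength_cm = 5.5e-5       # green light, as in the example
aperture_radius_cm = 0.5     # aperture radius a, as in the example
first_zero_of_j1 = 3.8317    # first zero of J_1, from Table 14.1

scale = 2.0 * math.pi * aperture_radius_cm / wavelength_cm   # 2*pi*a/lambda
# J_1(z) ~ z/2 for small z, so J_1(scale*sin(alpha))/sin(alpha) -> scale/2 at alpha = 0
central_amplitude = 0.5 * scale
alpha_min = math.asin(first_zero_of_j1 / scale)              # first minimum, radians
alpha_min_arcsec = math.degrees(alpha_min) * 3600.0

print(round(scale), round(central_amplitude), round(alpha_min_arcsec, 1))
```

The central amplitude comes out somewhat below 30,000 and the first minimum close to 14 seconds of arc, consistent with the values quoted in the text.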


Further examples of the use of Bessel functions and their roots are provided by the 
following example and by the exercises of this section and Section 14.2. 


Example 14.1.2 CYLINDRICAL RESONANT CAVITY 


The propagation of electromagnetic waves in hollow metallic cylinders is important in 
many practical devices. If the cylinder has end surfaces, it is called a cavity. Resonant 
cavities play a crucial role in many particle accelerators. 
The resonant frequencies of a cavity are those of the oscillatory solutions to Maxwell’s
equations that correspond to standing wave patterns. By combining Maxwell’s equations,
we derived in Example 3.6.2 the vector Laplace equation for the electric field $\mathbf{E}$ in a region
free of electric charges and currents. Taking the $z$-axis along the axis of the cavity, our
concern here is the equation for $E_z$, which from Eq. (3.71) we found to have the form

$$\nabla^2 E_z = \frac{1}{c^2}\,\frac{\partial^2 E_z}{\partial t^2}, \tag{14.26}$$

which has standing-wave solutions $E_z(x, y, z, t) = E_z(x, y, z)\,f(t)$, where $f(t)$ has real
solutions $\sin\omega t$ and $\cos\omega t$, corresponding to sinusoidal oscillations at angular frequency $\omega$.
We are implicitly assuming that our solution has a nonzero component $E_z$, and we will also
set $B_z = 0$, so we intend to obtain solutions that are usually called the TM (for “transverse
magnetic”) modes of oscillation. Additional solutions, with $E_z = 0$ and $B_z$ nonzero, correspond
to TE (transverse electric) modes and are the subject of Exercise 14.1.25.

Thus, for the present problem, in which our cavity is that shown in Fig. 14.4, we seek
solutions to the spatial PDE:

$$\nabla^2 E_z + k^2 E_z = 0, \qquad k = \frac{\omega}{c}. \tag{14.27}$$

The aim of the present example is to find the values of $\omega$ for which Eq. (14.27) has solutions
consistent with the boundary conditions at the cavity walls. Assuming the metallic walls
to be perfect conductors, the boundary conditions are that the tangential components of the
electric field vanish there. Taking the cavity to have planar end caps at $z = 0$ and $z = h$, and
(in cylindrical coordinates $\rho$, $\varphi$) to be bounded by a curved surface at $\rho = a$, our boundary
conditions are $E_\rho = E_\varphi = 0$ on the end caps, and $E_\varphi = E_z = 0$ on the boundary at $\rho = a$.




















FIGURE 14.4 Resonant cavity. 
Once a solution (with $B_z = 0$) has been found for $E_z$, then the remaining components of
$\mathbf{B}$ and $\mathbf{E}$ have definite values. For further details, see J. D. Jackson, Electrodynamics in
Additional Readings.

Equation (14.27) can be solved by the method of separation of variables, with solutions
of the form given in Eq. (9.64):

$$E_z(\rho, \varphi, z) = P_{lm}(\rho)\,\Phi_m(\varphi)\,Z_l(z), \tag{14.28}$$

with $\Phi_m(\varphi) = e^{\pm im\varphi}$ or its equivalent in terms of sines and cosines, while $Z_l(z)$ and
$P_{lm}(\rho)$ are solutions of the ODEs

$$\frac{d^2 Z_l}{dz^2} = -l^2 Z_l, \tag{14.29}$$

$$\rho\,\frac{d}{d\rho}\left(\rho\,\frac{dP_{lm}}{d\rho}\right) + \left[(k^2 - l^2)\rho^2 - m^2\right]P_{lm} = 0. \tag{14.30}$$

Equation (14.29) corresponds to Eq. (9.58), but with a different choice of the sign for
the separation constant in anticipation of the fact that $Z_l$ will turn out to be oscillatory.
This change causes $n^2$ in Eq. (9.60) to become $k^2 - l^2$, and Eq. (14.30) is then seen to
correspond exactly with Eq. (9.63).

Recognizing now Eq. (14.30) as Bessel’s ODE and Eq. (14.29) as the ODE for a classical
harmonic oscillator, we find, before imposing boundary conditions,

$$E_z = J_m(n\rho)\,e^{\pm im\varphi}\left[A\sin lz + B\cos lz\right], \tag{14.31}$$

and the general solution will be an arbitrary linear combination of the above for different
values of $n$, $m$, and $l$. We have chosen the solution to Bessel’s ODE to be of the first kind
to maintain regularity at $\rho = 0$, since this $\rho$ value is inside the cavity. We have written the
$\varphi$ dependence of the solution as a complex exponential for notational convenience. The
physically relevant solutions will be arbitrary mixtures of the corresponding real quantities,
$\sin m\varphi$ and $\cos m\varphi$. Continuity and single-valuedness in $\varphi$ dictate that $m$ have integer
values.

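The condition that $E_z = 0$ on the curved boundary translates into the requirement
$J_m(na) = 0$. Letting $\alpha_{mj}$ stand for the $j$th positive zero of $J_m$, we find that

$$na = \alpha_{mj}, \quad \text{or} \quad k^2 - l^2 = \left(\frac{\alpha_{mj}}{a}\right)^2. \tag{14.32}$$

To complete the solution we need to identify the boundary condition on $Z$. Because
$\partial E_x/\partial x = \partial E_y/\partial y = 0$ on the end caps, we have from the Maxwell equation for $\nabla\cdot\mathbf{E}$:

$$\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} = 0 \;\longrightarrow\; \frac{\partial E_z}{\partial z} = 0, \tag{14.33}$$

so we have the requirement $Z'(0) = Z'(h) = 0$, and we must choose

$$Z = B\cos lz, \quad \text{with } l = \frac{p\pi}{h}, \quad p = 0, 1, 2, \ldots. \tag{14.34}$$

Combining Eqs. (14.32) and (14.34), we find

$$k^2 = \left(\frac{\alpha_{mj}}{a}\right)^2 + \left(\frac{p\pi}{h}\right)^2 = \frac{\omega^2}{c^2}, \tag{14.35}$$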
thereby providing an equation for the resonant frequencies:

$$\omega_{mjp} = c\,\sqrt{\frac{\alpha_{mj}^2}{a^2} + \frac{p^2\pi^2}{h^2}}, \qquad
\begin{cases} m = 0, 1, 2, \ldots, \\ j = 1, 2, 3, \ldots, \\ p = 0, 1, 2, \ldots. \end{cases} \tag{14.36}$$

Recapitulating, the functions we have found, labeled by the indices $m$, $j$, and $p$, are
the spatial parts of standing-wave solutions of TM character whose time dependence and
overall amplitude are of the form $Ce^{\pm i\omega_{mjp}t}$. ■
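Equation (14.36) can be evaluated directly from the tabulated zeros. The sketch below is an illustration only: the cavity dimensions (radius 0.1 m, length 0.2 m) are hypothetical values chosen for the example, the dictionary holds a few zeros $\alpha_{mj}$ copied from Table 14.1, and the function name is an assumption.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # m/s

# A few zeros alpha_mj of J_m, copied from Table 14.1 (m = 0, 1, 2; j = 1, 2, 3)
BESSEL_ZEROS = {
    0: [2.4048, 5.5201, 8.6537],
    1: [3.8317, 7.0156, 10.1735],
    2: [5.1356, 8.4172, 11.6198],
}

def tm_angular_frequency(m, j, p, radius, height):
    """Angular resonant frequency of TM mode (m, j, p) from Eq. (14.36), SI units."""
    alpha = BESSEL_ZEROS[m][j - 1]
    return SPEED_OF_LIGHT * math.sqrt((alpha / radius) ** 2
                                      + (p * math.pi / height) ** 2)

radius_m, height_m = 0.1, 0.2    # hypothetical cavity dimensions
for mode in [(0, 1, 0), (0, 1, 1), (1, 1, 0)]:
    omega = tm_angular_frequency(*mode, radius_m, height_m)
    print(mode, omega / (2.0 * math.pi) / 1e9, "GHz")
```

For these assumed dimensions the lowest TM mode, $(m, j, p) = (0, 1, 0)$, lies a little above 1 GHz, and its frequency is independent of the cavity length because $p = 0$.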





Bessel Functions of Nonintegral Order 


While $J_\nu$ of noninteger $\nu$ are not produced from a generating-function approach, they are
readily identified from the Taylor series expansion, and they are conventionally given a
scale consistent with that of the $J_n$ of integer $n$. They then satisfy the same recurrence
relations as those derived from the generating function.

If $\nu$ is not an integer, there is actually an important simplification. The functions $J_\nu$
and $J_{-\nu}$ are then independent solutions of the same ODE, and a relation of the form of
Eq. (14.5) does not exist. On the other hand, for $\nu = n$, an integer, we need another solution.
The development of this second solution and an investigation of its properties form the
subject of Section 14.3.


Schlaefli Integral 


It is useful to modify the integral representation, Eq. (14.16), so that it can be applied for
Bessel functions of nonintegral order. Our first step in doing so is to deform the circular
contour by stretching it to infinity on the negative real axis and opening the contour there,
as shown in Fig. 14.5. Our integral, written

$$F_\nu(x) = \frac{1}{2\pi i}\int_C \frac{e^{(x/2)(t-1/t)}}{t^{\nu+1}}\,dt, \tag{14.37}$$

now has a branch point at $t = 0$, and because we have opened the contour we can place the
branch cut along the negative real axis. We might anticipate that this procedure will not
affect our integral representation, as the integrand vanishes at $t = -\infty$ on both sides of the
cut. However, that remains to be proved.

FIGURE 14.5 Contour, Schlaefli integral for $J_\nu$.

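Our first step toward a proof that $F_\nu$ is actually $J_\nu$ is to verify that $F_\nu$ still satisfies
Bessel’s ODE. If we substitute $F_\nu$ and its $x$ derivatives into the ODE, we can, after some
manipulation, reach the expression

$$\frac{1}{2\pi i}\int_C \frac{d}{dt}\left\{\frac{e^{(x/2)(t-1/t)}}{t^\nu}\left[\nu + \frac{x}{2}\left(t + \frac{1}{t}\right)\right]\right\}dt, \tag{14.38}$$

and because the integration is within a region of analyticity of the integrand, the integral
reduces to

$$\frac{1}{2\pi i}\left\{\frac{e^{(x/2)(t-1/t)}}{t^\nu}\left[\nu + \frac{x}{2}\left(t + \frac{1}{t}\right)\right]\right\}_{\text{end}}
- \frac{1}{2\pi i}\left\{\frac{e^{(x/2)(t-1/t)}}{t^\nu}\left[\nu + \frac{x}{2}\left(t + \frac{1}{t}\right)\right]\right\}_{\text{start}}.$$

We therefore conclude that the ODE is satisfied if the above expression vanishes; in our
present situation each of the quantities in braces is zero for large negative $t$ and positive $x$,
confirming that $F_\nu$ satisfies Bessel’s ODE.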

We still need to show that $F_\nu$ is the solution designated $J_\nu$; to accomplish this we consider
its value for small $x > 0$. Deforming the contour to a large open circle and making a
change of variable to $u = e^{i\pi}xt/2$, we get (to lowest order in $x$)

$$F_\nu(x) \approx \frac{1}{2\pi i}\left(\frac{x}{2}\right)^\nu e^{\nu\pi i}\int_{C'}\frac{e^{-u}}{u^{\nu+1}}\,du. \tag{14.39}$$

Because of the change of variable, the contour $C'$ becomes that which we introduced
when developing a Schlaefli integral representation of the gamma function, and, using
Eq. (13.31), we reduce Eq. (14.39) to

$$F_\nu(x) \approx \left(\frac{x}{2}\right)^\nu\,\frac{\sin[(\nu+1)\pi]}{\pi}\,\Gamma(-\nu) = \frac{1}{\Gamma(\nu+1)}\left(\frac{x}{2}\right)^\nu, \tag{14.40}$$

where the last step used the reflection formula for the gamma function, Eq. (13.23). Since
this is the leading term of the expansion for $J_\nu$, our proof is complete.





Exercises 
14.1.1 From the product of the generating functions $g(x, t)\,g(x, -t)$, show that

$$1 = [J_0(x)]^2 + 2[J_1(x)]^2 + 2[J_2(x)]^2 + \cdots$$

and therefore that $|J_0(x)| \le 1$ and $|J_n(x)| \le 1/\sqrt{2}$, $n = 1, 2, 3, \ldots$.
Hint. Use uniqueness of power series (Section 1.2).

14.1.2 Using a generating function $g(x, t) = g(u + v, t) = g(u, t)\,g(v, t)$, show that

(a) $J_n(u + v) = \displaystyle\sum_{s=-\infty}^{\infty} J_s(u)\,J_{n-s}(v)$,

(b) $J_0(u + v) = J_0(u)\,J_0(v) + 2\displaystyle\sum_{s=1}^{\infty} J_s(u)\,J_{-s}(v)$.

These are addition theorems for the Bessel functions.
14.1.3 Using only the generating function

$$e^{(x/2)(t-1/t)} = \sum_{n=-\infty}^{\infty} J_n(x)\,t^n$$

and not the explicit series form of $J_n(x)$, show that $J_n(x)$ has odd or even parity according
to whether $n$ is odd or even, that is,

$$J_n(x) = (-1)^n J_n(-x).$$

14.1.4 Use the basic recurrence formulas, Eqs. (14.7) and (14.8), to prove the following
formulas:

(a) $\dfrac{d}{dx}\left[x^n J_n(x)\right] = x^n J_{n-1}(x)$,

(b) $\dfrac{d}{dx}\left[x^{-n} J_n(x)\right] = -x^{-n} J_{n+1}(x)$,

(c) $J_n(x) = J_{n+1}'(x) + \dfrac{n+1}{x}\,J_{n+1}(x)$.


14.1.5 Derive the Jacobi-Anger expansion

$$e^{i\rho\cos\varphi} = \sum_{m=-\infty}^{\infty} i^m J_m(\rho)\,e^{im\varphi}.$$

This is an expansion of a plane wave in a series of cylindrical waves.


14.1.6 Show that

(a) $\cos x = J_0(x) + 2\displaystyle\sum_{n=1}^{\infty}(-1)^n J_{2n}(x)$,

(b) $\sin x = 2\displaystyle\sum_{n=0}^{\infty}(-1)^n J_{2n+1}(x)$.

14.1.7 To help remove the generating function from the realm of magic, show that it can be
derived from the recurrence relation, Eq. (14.7).
Hint. (a) Assume a generating function of the form

$$g(x, t) = \sum_{m=-\infty}^{\infty} J_m(x)\,t^m.$$

(b) Multiply Eq. (14.7) by $t^n$ and sum over $n$.
(c) Rewrite the preceding result as

$$\left(t + \frac{1}{t}\right)g(x, t) = \frac{2t}{x}\,\frac{\partial g(x, t)}{\partial t}.$$

(d) Integrate and adjust the “constant” of integration (a function of $x$)
so that the coefficient of the zeroth power, $t^0$, is $J_0(x)$ as given by
Eq. (14.6).
14.1.8 Show, by direct differentiation, that

$$J_\nu(x) = \sum_{s=0}^{\infty}\frac{(-1)^s}{s!\,\Gamma(s+\nu+1)}\left(\frac{x}{2}\right)^{\nu+2s}$$

satisfies the two recurrence relations

$$J_{\nu-1}(x) + J_{\nu+1}(x) = \frac{2\nu}{x}\,J_\nu(x),$$

$$J_{\nu-1}(x) - J_{\nu+1}(x) = 2J_\nu'(x),$$

and Bessel’s differential equation

$$x^2 J_\nu''(x) + x J_\nu'(x) + (x^2 - \nu^2)\,J_\nu(x) = 0.$$


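14.1.9 Prove that

$$\frac{\sin x}{x} = \int_0^{\pi/2} J_0(x\cos\theta)\,\cos\theta\,d\theta, \qquad
\frac{1 - \cos x}{x} = \int_0^{\pi/2} J_1(x\cos\theta)\,d\theta.$$

Hint. The definite integral

$$\int_0^{\pi/2}\cos^{2s+1}\theta\,d\theta = \frac{2\cdot 4\cdot 6\cdots(2s)}{1\cdot 3\cdot 5\cdots(2s+1)}$$

may be useful.

14.1.10 Derive

$$J_n(x) = (-1)^n x^n\left(\frac{1}{x}\,\frac{d}{dx}\right)^n J_0(x).$$

Hint. Try mathematical induction (Section 1.4).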


14.1.11 Show that between any two consecutive zeros of $J_n(x)$ there is one and only one zero
of $J_{n+1}(x)$.

Hint. Equations (14.10) and (14.11) may be useful.

14.1.12 An analysis of antenna radiation patterns for a system with a circular aperture involves
the equation

$$g(u) = \int_0^1 f(r)\,J_0(ur)\,r\,dr.$$

If $f(r) = 1 - r^2$, show that

$$g(u) = \frac{2}{u^2}\,J_2(u).$$





14.1.13 The differential cross section in a nuclear scattering experiment is given by $d\sigma/d\Omega = |f(\theta)|^2$. An approximate treatment leads to

$$f(\theta) = \frac{-ik}{2\pi}\int_0^{2\pi}\int_0^R \exp[ik\rho\sin\theta\sin\varphi]\,\rho\,d\rho\,d\varphi.$$

Here $\theta$ is an angle through which the scattered particle is scattered. $R$ is the nuclear
radius. Show that

$$\frac{d\sigma}{d\Omega} = (\pi R^2)\,\frac{1}{\pi}\left[\frac{J_1(kR\sin\theta)}{\sin\theta}\right]^2.$$

14.1.14 A set of functions $C_n(x)$ satisfies the recurrence relations

$$C_{n-1}(x) - C_{n+1}(x) = \frac{2n}{x}\,C_n(x),$$

$$C_{n-1}(x) + C_{n+1}(x) = 2C_n'(x).$$

(a) What linear second-order ODE does the $C_n(x)$ satisfy?

(b) By a change of variable transform your ODE into Bessel’s equation. This suggests
that $C_n(x)$ may be expressed in terms of Bessel functions of transformed
argument.

14.1.15 (a) Show by direct differentiation and substitution that

$$J_\nu(x) = \frac{1}{2\pi i}\int_C e^{(x/2)(t-1/t)}\,t^{-\nu-1}\,dt$$

(this is the Schlaefli integral representation of $J_\nu$), and that the equivalent equation,

$$J_\nu(x) = \frac{1}{2\pi i}\left(\frac{x}{2}\right)^\nu\int_C e^{s - x^2/4s}\,s^{-\nu-1}\,ds,$$

both satisfy Bessel’s equation. $C$ is the contour shown in Fig. 14.5. The negative
real axis is the cut line.

Hint. This exercise is aimed at providing details of the discussion that starts at
Eq. (14.38).

(b) Show that the first integral (with $n$ an integer) may be transformed into

$$J_n(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{i(x\sin\theta - n\theta)}\,d\theta = \frac{i^{-n}}{2\pi}\int_0^{2\pi} e^{i(x\cos\theta + n\theta)}\,d\theta.$$

14.1.16 The contour $C$ in Exercise 14.1.15 is deformed to the path $-\infty$ to $-1$, unit circle $e^{-i\pi}$
to $e^{i\pi}$, and finally $-1$ to $-\infty$. Show that

$$J_\nu(x) = \frac{1}{\pi}\int_0^{\pi}\cos(\nu\theta - x\sin\theta)\,d\theta - \frac{\sin\nu\pi}{\pi}\int_0^{\infty} e^{-\nu\theta - x\sinh\theta}\,d\theta.$$

This is Bessel’s integral.

Hint. The negative values of the variable of integration $u$ must be represented in
a manner consistent with the presence of the branch cut, for example, by writing
$u = te^{\pm i\pi}$.


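14.1.17 (a) Show that

$$J_\nu(x) = \frac{2}{\pi^{1/2}\,\Gamma(\nu + \frac{1}{2})}\left(\frac{x}{2}\right)^\nu\int_0^{\pi/2}\cos(x\sin\theta)\,\cos^{2\nu}\theta\,d\theta,$$

where $\nu > -\frac{1}{2}$.

Hint. Here is a chance to use series expansion and term-by-term integration. The
formulas of Section 13.3 will prove useful.

(b) Transform the integral in part (a) into

$$J_\nu(x) = \frac{1}{\pi^{1/2}\,\Gamma(\nu + \frac{1}{2})}\left(\frac{x}{2}\right)^\nu\int_0^{\pi}\cos(x\cos\theta)\,\sin^{2\nu}\theta\,d\theta$$

$$\phantom{J_\nu(x)} = \frac{1}{\pi^{1/2}\,\Gamma(\nu + \frac{1}{2})}\left(\frac{x}{2}\right)^\nu\int_0^{\pi} e^{\pm ix\cos\theta}\,\sin^{2\nu}\theta\,d\theta$$

$$\phantom{J_\nu(x)} = \frac{1}{\pi^{1/2}\,\Gamma(\nu + \frac{1}{2})}\left(\frac{x}{2}\right)^\nu\int_{-1}^{1} e^{\pm ipx}\,(1 - p^2)^{\nu - 1/2}\,dp.$$

These are alternate integral representations of $J_\nu(x)$.

14.1.18 Given that $C$ is the contour in Fig. 14.5,

(a) From

$$J_\nu(x) = \frac{1}{2\pi i}\left(\frac{x}{2}\right)^\nu\int_C t^{-\nu-1}\,e^{t - x^2/4t}\,dt$$

derive the recurrence relation

$$J_\nu'(x) = \frac{\nu}{x}\,J_\nu(x) - J_{\nu+1}(x).$$

(b) From

$$J_\nu(x) = \frac{1}{2\pi i}\int_C t^{-\nu-1}\,e^{(x/2)(t-1/t)}\,dt$$

derive the recurrence relation

$$J_\nu'(x) = \frac{1}{2}\left[J_{\nu-1}(x) - J_{\nu+1}(x)\right].$$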





14.1.19 Show that the recurrence relation

$$J_n'(x) = \frac{1}{2}\left[J_{n-1}(x) - J_{n+1}(x)\right]$$

follows directly from differentiation of

$$J_n(x) = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta - x\sin\theta)\,d\theta.$$

14.1.20 Evaluate

$$\int_0^{\infty} e^{-ax}\,J_0(bx)\,dx, \qquad a, b > 0.$$

Actually the results hold for $a \ge 0$, $-\infty < b < \infty$. This is a Laplace transform of $J_0$.
Hint. Either an integral representation of $J_0$ or a series expansion will be helpful.

14.1.21 Using the symmetries of the trigonometric functions, confirm that for integer $n$,

$$\frac{1}{2\pi}\int_0^{2\pi}\cos(x\sin\theta - n\theta)\,d\theta = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta - n\theta)\,d\theta.$$

14.1.22 (a) Plot the intensity, $\Phi^2$ of Eq. (14.25), as a function of $(\sin\alpha/\lambda)$ along a diameter
of the circular diffraction pattern. Locate the first two minima.

(b) Estimate the fraction of the total light intensity that falls within the central
maximum.

Hint. $[J_1(x)]^2/x$ may be written as a derivative and the area integral of the intensity
integrated by inspection.

14.1.23 The fraction of light incident on a circular aperture (normal incidence) that is transmitted
is given by

$$T = 2\int_0^{2ka} J_2(x)\,\frac{dx}{x} - \frac{1}{2ka}\int_0^{2ka} J_2(x)\,dx.$$

Here $a$ is the radius of the aperture and $k$ is the wave number, $2\pi/\lambda$. Show that

(a) $T = 1 - \dfrac{1}{ka}\displaystyle\sum_{n=0}^{\infty} J_{2n+1}(2ka)$, \qquad (b) $T = 1 - \dfrac{1}{2ka}\displaystyle\int_0^{2ka} J_0(x)\,dx$.

14.1.24 The amplitude $U(\rho, \varphi, t)$ of a vibrating circular membrane of radius $a$ satisfies the wave
equation

$$\nabla^2 U \equiv \frac{\partial^2 U}{\partial\rho^2} + \frac{1}{\rho}\,\frac{\partial U}{\partial\rho} + \frac{1}{\rho^2}\,\frac{\partial^2 U}{\partial\varphi^2} = \frac{1}{v^2}\,\frac{\partial^2 U}{\partial t^2}.$$

Here $v$ is the phase velocity of the wave, determined by the properties of the membrane.

(a) Show that a physically relevant solution is

$$U(\rho, \varphi, t) = J_m(k\rho)\left(c_1 e^{im\varphi} + c_2 e^{-im\varphi}\right)\left(b_1 e^{i\omega t} + b_2 e^{-i\omega t}\right).$$

(b) From the Dirichlet boundary condition $J_m(ka) = 0$, find the allowable values of $k$.

14.1.25 Example 14.1.2 describes the TM modes of electromagnetic cavity oscillation. To
obtain the transverse electric (TE) modes, we set $E_z = 0$ and work from the $z$ component
of the magnetic induction $\mathbf{B}$:

$$\nabla^2 B_z + \alpha^2 B_z = 0$$

with boundary conditions

$$B_z(0) = B_z(l) = 0 \quad \text{and} \quad \left.\frac{\partial B_z}{\partial\rho}\right|_{\rho=a} = 0.$$

Show that the TE resonant frequencies are given by

$$\omega_{mnp} = c\,\sqrt{\frac{\beta_{mn}^2}{a^2} + \frac{p^2\pi^2}{l^2}}, \qquad p = 1, 2, 3, \ldots,$$

and identify the quantities $\beta_{mn}$.

14.1.26 A conducting cylinder can accommodate traveling electromagnetic waves; when used
for this purpose it is called a wave guide. The equations describing traveling waves are
the same as those of Example 14.1.2, but there is no boundary condition on $E_z$ at $z = 0$
or $z = h$ other than that its $z$ dependence be oscillatory. For each TM mode (values
of $m$ and $j$ of Example 14.1.2), there is a minimum frequency that can be transmitted
through a wave guide of radius $a$. Explain why this is so, and give a formula for the
cutoff frequencies.

14.1.27 Plot the three lowest TM and the three lowest TE angular resonant frequencies, $\omega_{mnp}$,
as a function of the ratio radius/length $(a/l)$ for $0 < a/l < 1.5$.

Hint. Try plotting $\omega^2$ (in units of $c^2/a^2$) vs. $(a/l)^2$. Why this choice?

14.1.28 Show that the integral

$$\int_0^a x^m J_n(x)\,dx, \qquad m \ge n \ge 0,$$

(a) is integrable for $m + n$ odd in terms of Bessel functions and powers of $x$, i.e., is
expressible as linear combinations of $a^p J_q(a)$;

(b) may be reduced for $m + n$ even to integrated terms plus $\int_0^a J_0(x)\,dx$.

14.1.29 Show that

$$\int_0^{\alpha_{0n}}\left(1 - \frac{y}{\alpha_{0n}}\right)J_0(y)\,y\,dy = \frac{1}{\alpha_{0n}}\int_0^{\alpha_{0n}} J_0(y)\,dy.$$

Here $\alpha_{0n}$ is the $n$th zero of $J_0(y)$. This relation is useful (see Exercise 14.2.9): The expression
on the right is easier and quicker to evaluate, and is much more accurate. Taking
the difference of two terms in the expression on the left leads to a large relative error.


14.2 ORTHOGONALITY 


To identify the orthogonality properties of Bessel functions, it is convenient to start by writing Bessel's ODE in a form that we can recognize as a Sturm-Liouville eigenvalue problem, the general properties of which were discussed in detail starting from Eq. (8.15). If we divide Eq. (14.15) through by $\rho^2$ and rearrange slightly, we have

\[ -\left(\frac{d^2}{d\rho^2} + \frac{1}{\rho}\frac{d}{d\rho} - \frac{\nu^2}{\rho^2}\right) Z_\nu(k\rho) = k^2 Z_\nu(k\rho), \tag{14.41} \]

showing that $Z_\nu(k\rho)$ is an eigenfunction of the operator

\[ \mathcal{L} = -\left(\frac{d^2}{d\rho^2} + \frac{1}{\rho}\frac{d}{d\rho} - \frac{\nu^2}{\rho^2}\right) \tag{14.42} \]

with eigenvalue $k^2$. Since we are most often interested in problems whose solutions in cylindrical coordinates $(\rho, \varphi, z)$ separate into products $P(\rho)\Phi(\varphi)Z(z)$ and which are for the region within a cylindrical boundary at some $\rho = a$, we usually have $\Phi(\varphi) = e^{im\varphi}$ with m an integer (thereby causing $\nu^2 \to m^2$), and find that $P(\rho) = J_m(k\rho)$. We choose P to be a Bessel function of the first kind because $\rho = 0$ is interior to our region and we want a solution that is nonsingular there.

From Sturm-Liouville theory, we find that the weight factor needed to make $\mathcal{L}$ of Eq. (14.42) self-adjoint (as an ODE) is $w(\rho) = \rho$, and the orthogonality integral for the two eigenfunctions $J_\nu(k\rho)$ and $J_\nu(k'\rho)$, a case of Eq. (8.20), is (whether or not $\nu$ is an integer)

\[ \frac{a\left[k' J_\nu(ka)\, J_\nu'(k'a) - k\, J_\nu'(ka)\, J_\nu(k'a)\right]}{k^2 - k'^2} = \int_0^a \rho\, J_\nu(k\rho)\, J_\nu(k'\rho)\, d\rho. \tag{14.43} \]

In writing Eq. (14.43) we have used the fact that the presence of a factor $\rho$ in the boundary terms causes there to be no contribution from the lower limit $\rho = 0$.³

Equation (14.43) shows us that the $J_\nu(k\rho)$ of different k will be orthogonal (with weight factor $\rho$) if we can cause the left-hand side of that equation to vanish. We may do so by choosing k and k' in such a way that $J_\nu(ka) = J_\nu(k'a) = 0$. In other words, we can require that k and k' be such that ka and k'a are zeros of $J_\nu$, and our Bessel functions will then satisfy Dirichlet boundary conditions.

If now we let $\alpha_{\nu i}$ denote the ith zero of $J_\nu$, the above analysis corresponds to the following orthogonality formula for the interval [0, a]:

\[ \int_0^a \rho\, J_\nu\!\left(\alpha_{\nu i}\frac{\rho}{a}\right) J_\nu\!\left(\alpha_{\nu j}\frac{\rho}{a}\right) d\rho = 0, \qquad i \ne j. \tag{14.44} \]

³ This will be true for all $\nu > -1$, as will become more evident when we discuss Bessel functions of the second kind.
FIGURE 14.6 Bessel functions $J_1(\alpha_{1n}\rho)$, n = 1, 2, 3 on range $0 \le \rho \le 1$.


Note that all members of our orthogonal set of Bessel functions have the same value of the index $\nu$, differing only in the scale of the argument of $J_\nu$. Successive members of the orthogonal set will have increasing numbers of oscillations in the interval (0, a). Note also that the weight factor, $\rho$, is just that which corresponds to unweighted orthogonality over the region within a circle of radius a. We show in Fig. 14.6 the first three Bessel functions of order $\nu = 1$ that are orthogonal within the unit circle.

An alternative to the foregoing analysis would be to ensure the vanishing of the boundary term of Eq. (14.43) at $\rho = a$ by choosing values of k corresponding to the Neumann boundary condition $J_\nu'(ka) = 0$. The functions obtained in this way would also form an orthogonal set.


Normalization 


Our orthogonal sets of Bessel functions are not normalized, and to use them in expansions we need their normalization integrals. These integrals may be developed by returning to Eq. (14.43), which is valid for all k and k', whether or not the boundary terms vanish. We take the limits of both sides of that equation as $k' \to k$, evaluating the limit on the left-hand side using l'Hôpital's rule, which here corresponds to taking the derivatives of numerator and denominator with respect to k':

\[ \int_0^a \rho \left[J_\nu(k\rho)\right]^2 d\rho = \lim_{k' \to k} \frac{a\left[ J_\nu(ka)\,\dfrac{d}{dk'}\Bigl(k' J_\nu'(k'a)\Bigr) - k\, J_\nu'(ka)\,\dfrac{d}{dk'} J_\nu(k'a) \right]}{\dfrac{d}{dk'}\left(k^2 - k'^2\right)}. \]
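We now simplify this equation for the case that $ka = \alpha_{\nu i}$, so we set $J_\nu(ka) = 0$. Only the second term of the numerator then survives; with $dJ_\nu(k'a)/dk' = a J_\nu'(k'a)$ and the denominator differentiating to $-2k'$, we reach

\[ \int_0^a \rho \left[J_\nu\!\left(\alpha_{\nu i}\frac{\rho}{a}\right)\right]^2 d\rho = \frac{a^2}{2}\left[J_\nu'(\alpha_{\nu i})\right]^2. \tag{14.45} \]

Now, because $\alpha_{\nu i}$ is a zero of $J_\nu$, Eq. (14.12) permits us to recognize that $J_\nu'(\alpha_{\nu i}) = -J_{\nu+1}(\alpha_{\nu i})$. We then obtain from Eq. (14.45) the desired result,

\[ \int_0^a \rho \left[J_\nu\!\left(\alpha_{\nu i}\frac{\rho}{a}\right)\right]^2 d\rho = \frac{a^2}{2}\left[J_{\nu+1}(\alpha_{\nu i})\right]^2. \tag{14.46} \]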
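The orthogonality and normalization formulas, Eqs. (14.44) and (14.46), are easy to check numerically. The following is a minimal sketch, not part of the original development: it assumes $\nu = 1$ and a = 1, sums the ascending power series for $J_\nu$, locates zeros by bisection, and integrates with Simpson's rule. The helper names (`bessel_j`, `bessel_zeros`, `simpson`, `overlap`) are illustrative choices of ours.

```python
import math

def bessel_j(nu, x, terms=60):
    """J_nu(x) summed from its ascending power series (adequate for moderate x)."""
    half = 0.5 * x
    return sum((-1) ** s * half ** (2 * s + nu)
               / (math.factorial(s) * math.gamma(s + nu + 1))
               for s in range(terms))

def bessel_zeros(nu, count, step=0.1):
    """First `count` positive zeros of J_nu: bracket on a grid, refine by bisection."""
    zeros, left = [], step
    while len(zeros) < count:
        right = left + step
        if bessel_j(nu, left) * bessel_j(nu, right) < 0:
            lo, hi = left, right
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if bessel_j(nu, lo) * bessel_j(nu, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        left = right
    return zeros

def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

NU, A = 1, 1.0                      # assumed order and radius
alpha = bessel_zeros(NU, 3)         # alpha_{11}, alpha_{12}, alpha_{13}

def overlap(i, j):
    """Weighted overlap integral of Eq. (14.44) / Eq. (14.46)."""
    return simpson(lambda rho: rho * bessel_j(NU, alpha[i] * rho / A)
                   * bessel_j(NU, alpha[j] * rho / A), 0.0, A)

off_diagonal = overlap(0, 1)        # should vanish, Eq. (14.44)
diagonal = overlap(0, 0)            # should match Eq. (14.46)
expected = 0.5 * A ** 2 * bessel_j(NU + 1, alpha[0]) ** 2
print(alpha[0], off_diagonal, diagonal, expected)
```

The power series loses accuracy for large arguments, so a production calculation would normally rely on a library routine instead; the sketch is meant only to make the two formulas concrete.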


Bessel Series 


If we assume that the set of Bessel functions $J_\nu(\alpha_{\nu j}\rho/a)$ for fixed $\nu$ and for j = 1, 2, 3, ... is complete, then any well-behaved but otherwise arbitrary function $f(\rho)$ may be expanded in a Bessel series

\[ f(\rho) = \sum_{j=1}^{\infty} c_{\nu j}\, J_\nu\!\left(\alpha_{\nu j}\frac{\rho}{a}\right), \qquad 0 \le \rho \le a, \quad \nu > -1. \tag{14.47} \]

The coefficients $c_{\nu j}$ are determined by the usual rules for orthogonal expansions. With the aid of Eq. (14.46) we have

\[ c_{\nu j} = \frac{2}{a^2\left[J_{\nu+1}(\alpha_{\nu j})\right]^2} \int_0^a f(\rho)\, J_\nu\!\left(\alpha_{\nu j}\frac{\rho}{a}\right) \rho\, d\rho. \tag{14.48} \]

As pointed out earlier, it is also possible to obtain an orthogonal set of Bessel functions of given order $\nu$ by imposing the Neumann boundary condition $J_\nu'(k\rho) = 0$ at $\rho = a$, corresponding to $k = \beta_{\nu j}/a$, where $\beta_{\nu j}$ is the jth zero of $J_\nu'$. These functions can also be used for orthogonal expansions. This approach is explored in Exercises 14.2.2 and 14.2.5.
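As an illustration of Eqs. (14.47) and (14.48), the sketch below expands a sample profile in a Bessel series with $\nu = 0$ and a = 1. The profile $f(\rho) = 1 - \rho^2$ is our own choice (an assumption, picked because it vanishes at $\rho = a$ and has the closed-form coefficients $8/[\alpha_{0j}^3 J_1(\alpha_{0j})]$); the helper names are likewise illustrative.

```python
import math

def bessel_j(nu, x, terms=60):
    """J_nu(x) from its ascending power series (adequate for moderate x)."""
    half = 0.5 * x
    return sum((-1) ** s * half ** (2 * s + nu)
               / (math.factorial(s) * math.gamma(s + nu + 1))
               for s in range(terms))

def bessel_zeros(nu, count, step=0.1):
    """First `count` positive zeros of J_nu by grid bracketing plus bisection."""
    zeros, left = [], step
    while len(zeros) < count:
        right = left + step
        if bessel_j(nu, left) * bessel_j(nu, right) < 0:
            lo, hi = left, right
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if bessel_j(nu, lo) * bessel_j(nu, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        left = right
    return zeros

def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f(rho):
    """Sample profile to expand (our assumption); it vanishes at rho = a = 1."""
    return 1.0 - rho ** 2

alpha = bessel_zeros(0, 6)          # first six zeros of J_0

# Coefficients from Eq. (14.48) with nu = 0 and a = 1.
coeffs = [2.0 / bessel_j(1, al) ** 2
          * simpson(lambda rho, al=al: f(rho) * bessel_j(0, al * rho) * rho, 0.0, 1.0)
          for al in alpha]

def partial_sum(rho):
    """Six-term truncation of the Bessel series, Eq. (14.47)."""
    return sum(c * bessel_j(0, al * rho) for c, al in zip(coeffs, alpha))

print(coeffs[0], partial_sum(0.5), f(0.5))
```

Because the chosen profile satisfies the same Dirichlet condition as the basis functions, the coefficients fall off rapidly and a handful of terms already reproduces $f(\rho)$ to a few parts in a thousand at interior points.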

The following example illustrates the usefulness of Bessel series. 


Example 14.2.1 ELECTROSTATIC POTENTIAL IN A HOLLOW CYLINDER 


We consider a hollow cylinder, which in cylindrical coordinates $(\rho, \varphi, z)$ is bounded by a curved surface at $\rho = a$ and end caps at z = 0 and z = h. The base (z = 0) and curved surface are assumed to be grounded, and therefore at potential $\psi = 0$, while the end cap at z = h has a known potential distribution $V(\rho, \varphi, h)$. Our problem is to determine the potential $V(\rho, \varphi, z)$ throughout the interior of the cylinder.

We proceed by finding separated-variable solutions to the Laplace equation in cylindrical coordinates, along the lines discussed in Section 9.4. Our first step is to identify product solutions, which, as in Eq. (9.64), must take the form⁴

\[ \psi_{lm}(\rho, \varphi, z) = P_{lm}(\rho)\, \Phi_m(\varphi)\, Z_l(z), \tag{14.49} \]

with $\Phi_m = e^{im\varphi}$, and

\[ \frac{d^2}{dz^2} Z_l(z) = l^2 Z_l(z), \tag{14.50} \]

\[ \rho^2 \frac{d^2}{d\rho^2} P_{lm} + \rho \frac{d}{d\rho} P_{lm} + \left(l^2\rho^2 - m^2\right) P_{lm} = 0. \tag{14.51} \]

The equation for $P_{lm}$ is Bessel's ODE, with solutions of relevance here $J_m(l\rho)$. To satisfy the boundary condition at $\rho = a$ we need to choose $l = \alpha_{mj}/a$, where j can be any positive integer and $\alpha_{mj}$ is the jth zero of $J_m$.

The equation for $Z_l$ has solutions $e^{\pm lz}$; to satisfy the boundary condition at z = 0 we need to take the linear combination of these solutions that is equivalent to $\sinh lz$. Combining these observations, we see that possible solutions to the Laplace equation that satisfy all the boundary conditions other than that at z = h can be written

\[ \psi_{mj} = c_{mj}\, J_m\!\left(\alpha_{mj}\frac{\rho}{a}\right) e^{im\varphi} \sinh\!\left(\alpha_{mj}\frac{z}{a}\right). \tag{14.52} \]

Since Laplace's equation is homogeneous, any linear combination of the $\psi_{mj}$ with arbitrary values of the $c_{mj}$ will be a solution, and our remaining task is to find the linear combination of such solutions that satisfies the boundary condition at z = h. Therefore,

\[ V(\rho, \varphi, z) = \sum_{m=-\infty}^{\infty} \sum_{j=1}^{\infty} \psi_{mj}, \tag{14.53} \]

with the boundary condition at z = h expressed as

\[ \sum_{m=-\infty}^{\infty} \sum_{j=1}^{\infty} c_{mj}\, J_m\!\left(\alpha_{mj}\frac{\rho}{a}\right) e^{im\varphi} \sinh\!\left(\alpha_{mj}\frac{h}{a}\right) = V(\rho, \varphi, h). \tag{14.54} \]

Our solution is both a trigonometric series and a Bessel series, each with orthogonality properties that can be used to determine the coefficients. From Eq. (14.48) and the formula

\[ \int_0^{2\pi} e^{-im\varphi} e^{im'\varphi}\, d\varphi = 2\pi\, \delta_{mm'}, \tag{14.55} \]

we find

\[ c_{mj} = \left[\pi a^2 \sinh\!\left(\alpha_{mj}\frac{h}{a}\right) J_{m+1}^2(\alpha_{mj})\right]^{-1} \int_0^{2\pi} d\varphi \int_0^a V(\rho, \varphi, h)\, J_m\!\left(\alpha_{mj}\frac{\rho}{a}\right) e^{-im\varphi}\, \rho\, d\rho. \tag{14.56} \]

⁴ Note that here $Z_l$ is a function of z arising from the separation of variables; the notation is not intended to identify it as a Bessel function.
These are definite integrals, that is, numbers. Substituting back into Eq. (14.52), the series in Eq. (14.53) is specified and the potential $V(\rho, \varphi, z)$ is determined. ■


Exercises 


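14.2.1 Show that

\[ (k^2 - k'^2) \int_0^a J_\nu(kx)\, J_\nu(k'x)\, x\, dx = a\left[k' J_\nu(ka)\, J_\nu'(k'a) - k\, J_\nu'(ka)\, J_\nu(k'a)\right], \]

where $J_\nu'(ka) = \left.\dfrac{d}{d(kx)} J_\nu(kx)\right|_{x=a}$, and that

\[ \int_0^a \left[J_\nu(kx)\right]^2 x\, dx = \frac{a^2}{2}\left\{\left[J_\nu'(ka)\right]^2 + \left(1 - \frac{\nu^2}{k^2a^2}\right)\left[J_\nu(ka)\right]^2\right\}, \qquad \nu > -1. \]

These two integrals are usually called the first and second Lommel integrals.

14.2.2 (a) If $\beta_{\nu m}$ is the mth zero of $(d/d\rho)J_\nu(\beta_{\nu m}\rho/a)$, show that the Bessel functions are orthogonal over the interval [0, a] with an orthogonality integral

\[ \int_0^a J_\nu\!\left(\beta_{\nu m}\frac{\rho}{a}\right) J_\nu\!\left(\beta_{\nu n}\frac{\rho}{a}\right) \rho\, d\rho = 0, \qquad m \ne n, \quad \nu > -1. \]

(b) Derive the corresponding normalization integral (m = n).

ANS. (b) $\dfrac{a^2}{2}\left(1 - \dfrac{\nu^2}{\beta_{\nu m}^2}\right)\left[J_\nu(\beta_{\nu m})\right]^2, \quad \nu > -1.$

14.2.3 Verify that the orthogonality equation, Eq. (14.44), and the normalization equation, Eq. (14.46), hold for $\nu > -1$.

Hint. Using power-series expansions, examine the behavior of Eq. (14.43) as $\rho \to 0$.

14.2.4 From Eq. (11.49), develop a proof that $J_\nu(z)$, $\nu > -1$ has no complex roots (with a nonzero imaginary part).

Hint. (a) Use the series form of $J_\nu(z)$ to exclude pure imaginary roots.
(b) Assume $\alpha_{\nu m}$ to be complex and take $\alpha_{\nu n}$ to be $\alpha_{\nu m}^*$.

14.2.5 (a) In the series expansion

\[ f(\rho) = \sum_{m=1}^{\infty} c_{\nu m}\, J_\nu\!\left(\alpha_{\nu m}\frac{\rho}{a}\right), \qquad 0 \le \rho \le a, \quad \nu > -1, \]

with $J_\nu(\alpha_{\nu m}) = 0$, show that the coefficients are given by

\[ c_{\nu m} = \frac{2}{a^2\left[J_{\nu+1}(\alpha_{\nu m})\right]^2} \int_0^a f(\rho)\, J_\nu\!\left(\alpha_{\nu m}\frac{\rho}{a}\right) \rho\, d\rho. \]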
(b) In the series expansion

\[ f(\rho) = \sum_{m=1}^{\infty} d_{\nu m}\, J_\nu\!\left(\beta_{\nu m}\frac{\rho}{a}\right), \qquad 0 \le \rho \le a, \quad \nu > -1, \]

with $(d/d\rho) J_\nu(\beta_{\nu m}\rho/a)\big|_{\rho=a} = 0$, show that the coefficients are given by

\[ d_{\nu m} = \frac{2}{a^2\left(1 - \nu^2/\beta_{\nu m}^2\right)\left[J_\nu(\beta_{\nu m})\right]^2} \int_0^a f(\rho)\, J_\nu\!\left(\beta_{\nu m}\frac{\rho}{a}\right) \rho\, d\rho. \]

14.2.6 A right circular cylinder has an electrostatic potential of $\psi(\rho, \varphi)$ on both ends. The potential on the curved cylindrical surface is zero. Find the potential at all interior points.

Hint. Choose your coordinate system and adjust your z dependence to exploit the symmetry of your potential.

14.2.7 A function f(x) is expressed as a Bessel series:

\[ f(x) = \sum_{n=1}^{\infty} a_n J_m(\alpha_{mn} x), \]

with $\alpha_{mn}$ the nth root of $J_m$. Prove the Parseval relation,

\[ \int_0^1 \left[f(x)\right]^2 x\, dx = \frac{1}{2} \sum_{n=1}^{\infty} a_n^2 \left[J_{m+1}(\alpha_{mn})\right]^2. \]

14.2.8 Prove that

\[ \sum_{n=1}^{\infty} (\alpha_{mn})^{-2} = \frac{1}{4(m+1)}. \]

Hint. Expand $x^m$ in a Bessel series and apply the Parseval relation.

14.2.9 A right circular cylinder of length l and radius a has on its end caps a potential

\[ \psi\!\left(z = \pm\frac{l}{2}\right) = 100\left(1 - \frac{\rho}{a}\right). \]

The potential on the curved surface (the side) is zero. Using the Bessel series from Exercise 14.2.6, calculate the electrostatic potential for $\rho/a$ = 0.0(0.2)1.0 and z/l = 0.0(0.1)0.5. Take a/l = 0.5.

Hint. From Exercise 14.1.29 you have

\[ \int_0^{\alpha_{0n}} \left(1 - \frac{y}{\alpha_{0n}}\right) J_0(y)\, y\, dy. \]

Show that this equals

\[ \frac{1}{\alpha_{0n}} \int_0^{\alpha_{0n}} J_0(y)\, dy. \]

Numerical evaluation of this latter form rather than the former is both faster and more accurate.

Note. For $\rho/a$ = 0.0 and z/l = 0.5 the convergence is slow, 20 terms giving only 98.4 rather than 100.

Check value. For $\rho/a$ = 0.4 and z/l = 0.3, $\psi$ = 24.558.


14.3 NEUMANN FUNCTIONS, BESSEL FUNCTIONS OF THE SECOND KIND


From the theory of ODEs, it is known that Bessel's equation has two independent solutions. Indeed, for nonintegral order $\nu$ we have already found two solutions and labeled them $J_\nu(x)$ and $J_{-\nu}(x)$ using the infinite series, Eq. (14.6). The trouble is that when $\nu$ is integral, Eq. (14.5) holds and we have but one independent solution. A second solution may be developed by the methods of Section 7.6. This yields a perfectly good second solution of Bessel's equation. However, that solution is not the standard form, which is called a Bessel function of the second kind or, alternatively, a Neumann function.


Definition and Series Form 


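The standard definition of the Neumann functions is the following linear combination of $J_\nu(x)$ and $J_{-\nu}(x)$:

\[ Y_\nu(x) = \frac{\cos\nu\pi\, J_\nu(x) - J_{-\nu}(x)}{\sin\nu\pi}. \tag{14.57} \]

For nonintegral $\nu$, $Y_\nu(x)$ clearly satisfies Bessel's equation, for it is a linear combination of known solutions, $J_\nu(x)$ and $J_{-\nu}(x)$. The behavior of $Y_\nu(x)$ for small x (and nonintegral $\nu$) can be determined from the power-series expansion of $J_{-\nu}$, Eq. (14.6); we may write, calling upon Eq. (13.23),

\[ Y_\nu(x) = -\frac{1}{\sin\nu\pi}\, \frac{1}{\Gamma(1-\nu)} \left(\frac{x}{2}\right)^{-\nu} + \cdots \]
\[ \phantom{Y_\nu(x)} = -\frac{\Gamma(\nu)\Gamma(1-\nu)}{\pi}\, \frac{1}{\Gamma(1-\nu)} \left(\frac{x}{2}\right)^{-\nu} + \cdots \]
\[ \phantom{Y_\nu(x)} = -\frac{\Gamma(\nu)}{\pi} \left(\frac{2}{x}\right)^{\nu} + \cdots. \tag{14.58} \]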
However, for integral $\nu$, Eq. (14.57) becomes indeterminate; in fact, $Y_n(x)$ for integral n is defined as

\[ Y_n(x) = \lim_{\nu \to n} Y_\nu(x). \tag{14.59} \]

To determine that the limit represented by Eq. (14.59) exists and is not identically zero (so that $Y_n(x)$ has a meaningful definition), we apply l'Hôpital's rule to Eq. (14.57), obtaining initially

\[ Y_n(x) = \frac{1}{\pi}\left[\frac{dJ_\nu}{d\nu} - (-1)^n \frac{dJ_{-\nu}}{d\nu}\right]_{\nu=n}. \tag{14.60} \]

Inserting the expansions of $J_\nu$ and $J_{-\nu}$ from Eq. (14.6), the differentiations of $(x/2)^{2s\pm\nu}$ combine to yield $(2/\pi) J_n(x) \ln(x/2)$, while the derivatives of $1/\Gamma(s \pm \nu + 1)$ yield terms containing $\psi(s \pm n + 1)/\Gamma(s \pm n + 1)$, where $\psi$ is the digamma function (Section 13.2). The final result, whose verification is the topic of Exercise 14.3.8, is

\[ Y_n(x) = \frac{2}{\pi} J_n(x) \ln\!\left(\frac{x}{2}\right) - \frac{1}{\pi} \sum_{k=0}^{n-1} \frac{(n-k-1)!}{k!} \left(\frac{x}{2}\right)^{2k-n} - \frac{1}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(n+k)!} \left[\psi(k+1) + \psi(n+k+1)\right] \left(\frac{x}{2}\right)^{2k+n}. \tag{14.61} \]

An explicit form for $\psi(n)$ for integer n is given in Eq. (13.40).

Equation (14.61) shows that for n > 0, the most divergent term for small x is in agreement with the result for noninteger $\nu$ given in Eq. (14.58). We also see that all solutions for integer n contain a logarithmic term with the regular function $J_n$ multiplying the logarithm. In our earlier study of ODEs, we found that a second solution will usually have a contribution of this type when the indicial equation causes the exponents of the power-series expansion to be integers. We may also conclude from Eq. (14.61) that $Y_n$ is linearly independent of $J_n$, confirming that we indeed have a second solution to Bessel's ODE.

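It is of some interest to obtain the expansion of $Y_0(x)$ in a more explicit form. Returning to Eq. (14.61), we note that its first summation is vacant, and we have the relatively simple expansion

\[ Y_0(x) = \frac{2}{\pi} J_0(x) \ln\!\left(\frac{x}{2}\right) - \frac{2}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,k!} \left[-\gamma + H_k\right] \left(\frac{x}{2}\right)^{2k} \]
\[ \phantom{Y_0(x)} = \frac{2}{\pi}\left[\gamma + \ln\!\left(\frac{x}{2}\right)\right] J_0(x) - \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{(-1)^k H_k}{k!\,k!} \left(\frac{x}{2}\right)^{2k}, \tag{14.62} \]

where $H_k$ is the harmonic number $\sum_{m=1}^{k} m^{-1}$ and $\gamma$ is the Euler-Mascheroni constant.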
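The expansion in Eq. (14.62) can be summed directly for moderate x. The sketch below is our own illustration rather than a recommended algorithm: it evaluates $J_0$ and $Y_0$ from their power series, with the function names `bessel_j0` and `neumann_y0` and the truncation at 40 terms chosen for convenience.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def bessel_j0(x, terms=40):
    """J_0(x) from its ascending power series."""
    half = 0.5 * x
    return sum((-1) ** k * half ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def neumann_y0(x, terms=40):
    """Y_0(x) from the second form of Eq. (14.62); requires x > 0."""
    half = 0.5 * x
    harmonic = 0.0                  # running harmonic number H_k
    series = 0.0
    for k in range(1, terms):
        harmonic += 1.0 / k
        series += (-1) ** k * harmonic * half ** (2 * k) / math.factorial(k) ** 2
    log_part = (EULER_GAMMA + math.log(half)) * bessel_j0(x, terms)
    return (2.0 / math.pi) * (log_part - series)

y0_at_1 = neumann_y0(1.0)
print(y0_at_1)                      # tabulated value of Y_0(1) is about 0.0882570
```

The logarithm makes the divergence at x = 0 explicit, while the alternating sums suffer cancellation as x grows, which is one reason the asymptotic forms of Section 14.6 are preferred for large arguments.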
FIGURE 14.7 Neumann functions $Y_0(x)$, $Y_1(x)$, and $Y_2(x)$.


The Neumann functions $Y_\nu(x)$ are irregular at x = 0, but with increasing x become oscillatory, as may be seen from the graphs of $Y_0$, $Y_1$, and $Y_2$ in Fig. 14.7. The definition of Eq. (14.57) was specifically chosen to cause the oscillatory behavior to be at the same scale as that of $J_\nu$ and displaced asymptotically in phase by $\pi/2$, similarly to the relative behavior of the sine and cosine. However, unlike the sine and cosine, $J_\nu$ and $Y_\nu$ only exhibit exact periodicity in the asymptotic limit. This point is covered in detail in Section 14.6. Figure 14.8 compares $J_0(x)$ and $Y_0(x)$ over a large range of x.


Integral Representations 


As with all the other Bessel functions, $Y_\nu(x)$ has integral representations. For $Y_0(x)$ we have

\[ Y_0(x) = -\frac{2}{\pi} \int_0^{\infty} \cos(x\cosh t)\, dt = -\frac{2}{\pi} \int_1^{\infty} \frac{\cos(xt)}{(t^2 - 1)^{1/2}}\, dt, \qquad x > 0. \tag{14.63} \]

See Exercise 14.3.7, which shows that the above integral is a solution to Bessel's ODE that is linearly independent of $J_0(x)$. Specific identification as $Y_0$ is the topic of Exercise 14.4.8.


Recurrence Relations 


Substituting Eq. (14.57) for $Y_\nu(x)$ (nonintegral $\nu$) into the recurrence relations for $J_\nu(x)$, Eqs. (14.7) and (14.8), we see immediately that $Y_\nu(x)$ satisfies these same recurrence relations. This actually constitutes a proof that $Y_\nu$ is a solution to the Bessel ODE. Note that the converse is not necessarily true. All solutions need not satisfy the same recurrence relations, as the relations depend on the scales assigned to the solutions of different $\nu$. An example of this sort of trouble appears in Section 14.5.
FIGURE 14.8 Oscillatory behavior of $J_0(x)$ (solid line) and $Y_0(x)$ (dashed line) for $1 \le x \le 30$.


Wronskian Formulas 


An ODE $p(x)y'' + q(x)y' + r(x)y = 0$ in self-adjoint form (so $q = p'$) was found in Exercise 7.6.1 to have the following Wronskian formula connecting its solutions u and v:

\[ u(x)v'(x) - u'(x)v(x) = \frac{A}{p(x)}. \tag{14.64} \]

To bring Bessel's equation to self-adjoint form, we need to write it as $xy'' + y' + (x - \nu^2/x)y = 0$, thereby showing that for our present purposes p(x) = x, and we therefore have for each noninteger $\nu$

\[ J_\nu J_{-\nu}' - J_\nu' J_{-\nu} = \frac{A_\nu}{x}. \tag{14.65} \]

Since $A_\nu$ is a constant but can be expected to depend on $\nu$, it may be identified for each $\nu$ at any convenient point, such as x = 0. From the power-series expansion, Eq. (14.6), we obtain the following limiting behaviors for small x:

\[ J_\nu \to \frac{1}{\Gamma(1+\nu)}\left(\frac{x}{2}\right)^{\nu}, \qquad J_\nu' \to \frac{\nu}{2\,\Gamma(1+\nu)}\left(\frac{x}{2}\right)^{\nu-1}, \]
\[ J_{-\nu} \to \frac{1}{\Gamma(1-\nu)}\left(\frac{x}{2}\right)^{-\nu}, \qquad J_{-\nu}' \to \frac{-\nu}{2\,\Gamma(1-\nu)}\left(\frac{x}{2}\right)^{-\nu-1}. \tag{14.66} \]

Substitution into Eq. (14.65) yields

\[ J_\nu(x) J_{-\nu}'(x) - J_\nu'(x) J_{-\nu}(x) = \frac{-2\nu}{x\,\Gamma(1+\nu)\Gamma(1-\nu)} = -\frac{2\sin\nu\pi}{\pi x}, \tag{14.67} \]
using Eq. (13.23). Although Eq. (14.67) was obtained for $x \to 0$, comparison with Eq. (14.65) shows that it must be true for all x, and that $A_\nu = -(2/\pi)\sin\nu\pi$. Note that $A_\nu$ vanishes for integral $\nu$, showing that the Wronskian of $J_n$ and $J_{-n}$ vanishes and that these Bessel functions are linearly dependent.

Using our recurrence relations, we may readily develop a large number of alternate forms, among which are

\[ J_\nu J_{-\nu+1} + J_{-\nu} J_{\nu-1} = \frac{2\sin\nu\pi}{\pi x}, \tag{14.68} \]
\[ J_\nu J_{-\nu-1} + J_{-\nu} J_{\nu+1} = -\frac{2\sin\nu\pi}{\pi x}, \tag{14.69} \]
\[ J_\nu Y_\nu' - J_\nu' Y_\nu = \frac{2}{\pi x}, \tag{14.70} \]
\[ J_\nu Y_{\nu+1} - J_{\nu+1} Y_\nu = -\frac{2}{\pi x}. \tag{14.71} \]

Many more will be found in the Additional Readings.

You will recall that in Chapter 7, Wronskians were of great value in two respects: (1) 
in establishing the linear independence or linear dependence of solutions of differential 
equations, and (2) in developing an integral form of a second solution. Here the specific 
forms of the Wronskians and Wronskian-derived combinations of Bessel functions are 
useful primarily in development of the general behavior of the various Bessel functions. 
Wronskians are also of great use in checking tables of Bessel functions. 
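The use of Wronskian relations as a check on computed values can be sketched as follows. The example is ours, not a prescribed procedure: it assumes a nonintegral order ($\nu = 0.3$) and a sample point x = 2.5, builds $J_{\pm\nu}$ from the power series and $Y_\nu$ from Eq. (14.57), and compares both sides of Eqs. (14.68) and (14.71).

```python
import math

def bessel_j(nu, x, terms=40):
    """J_nu(x) from its ascending power series; nu may be negative but not a negative integer."""
    half = 0.5 * x
    return sum((-1) ** s * half ** (2 * s + nu)
               / (math.factorial(s) * math.gamma(s + nu + 1))
               for s in range(terms))

def neumann_y(nu, x):
    """Y_nu(x) from the definition, Eq. (14.57); valid only for nonintegral nu."""
    return ((math.cos(nu * math.pi) * bessel_j(nu, x) - bessel_j(-nu, x))
            / math.sin(nu * math.pi))

nu, x = 0.3, 2.5                    # assumed sample order and argument

# Eq. (14.68): J_nu J_{-nu+1} + J_{-nu} J_{nu-1} = 2 sin(nu pi) / (pi x)
lhs_68 = bessel_j(nu, x) * bessel_j(1 - nu, x) + bessel_j(-nu, x) * bessel_j(nu - 1, x)
rhs_68 = 2 * math.sin(nu * math.pi) / (math.pi * x)

# Eq. (14.71): J_nu Y_{nu+1} - J_{nu+1} Y_nu = -2 / (pi x)
lhs_71 = bessel_j(nu, x) * neumann_y(nu + 1, x) - bessel_j(nu + 1, x) * neumann_y(nu, x)
rhs_71 = -2 / (math.pi * x)

print(lhs_68 - rhs_68, lhs_71 - rhs_71)   # both differences should be at rounding level
```

A discrepancy well above rounding error in either difference would signal an error in one of the tabulated or computed function values.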


Uses of Neumann Functions 
The Neumann functions $Y_\nu(x)$ are of importance for a number of reasons:


1. They are second, independent solutions of Bessel’s equation, thereby completing the 
general solution. 

2. They are needed for physical problems in which they are not excluded by a require- 
ment of regularity at x = 0. Specific examples include electromagnetic waves in coax- 
ial cables and quantum mechanical scattering theory. 

3. They lead directly to the two Hankel functions, whose definition and use, particularly 
in studies of wave propagation, are discussed in Section 14.4. 


We close with one example in which Neumann functions play a vital role. 


Example 14.3.1 COAXIAL WAVE GUIDES


We are interested in an electromagnetic wave confined between the concentric, conducting cylindrical surfaces $\rho = a$ and $\rho = b$. The equations governing the wave propagation are the same as those discussed in Example 14.1.2, but the boundary conditions are now different, and our interest is in solutions that are traveling waves (compare Exercise 14.1.26).

For wave propagation problems, it is convenient to write the solution in terms of complex exponentials, with the actual physical quantities involved ultimately identified as their real (or imaginary) parts. Thus, in place of Eq. (14.31) (the solution for standing waves in a cylindrical cavity), we now have for $E_z$ solutions in which the $\rho$ dependence must involve both $J_m$ and $Y_m$ (as the latter is not ruled out by a requirement for regularity at $\rho = 0$). Including the time dependence, we have for the TM (transverse magnetic) solutions the separated-variable forms

\[ E_z = \left[c_{mn} J_m(\gamma_{mn}\rho) + d_{mn} Y_m(\gamma_{mn}\rho)\right] e^{\pm im\varphi}\, e^{i(lz - \omega t)}, \tag{14.72} \]

with l now permitted to have any real value (there is no boundary condition on z). The index n identifies different possible values of $\gamma_{mn}$. As in Eq. (14.30), the relation between $\gamma_{mn}$, l, and $\omega$ is

\[ \frac{\omega^2}{c^2} = \gamma_{mn}^2 + l^2. \tag{14.73} \]

The most general TM traveling-wave solution will be an arbitrary linear combination of all functions of the form given by Eq. (14.72) with $\gamma_{mn}$, $c_{mn}$, and $d_{mn}$ chosen so that $E_z$ will vanish at $\rho = a$ and $\rho = b$. A main difference between this problem and that of Example 14.1.2 is that the condition on $E_z$ is not given by the zeros of the Bessel functions $J_m$, but by zeros of linear combinations of $J_m$ and $Y_m$. Specifically, we require that

\[ c_{mn} J_m(\gamma_{mn} a) + d_{mn} Y_m(\gamma_{mn} a) = 0, \tag{14.74} \]
\[ c_{mn} J_m(\gamma_{mn} b) + d_{mn} Y_m(\gamma_{mn} b) = 0. \tag{14.75} \]

These transcendental equations may be solved, for each relevant m, to yield an infinite set of solutions (indexed by n) for $\gamma_{mn}$ and the ratio $d_{mn}/c_{mn}$. An example of this process is in Exercise 14.3.10.

Returning now to the equation for $\omega$, we observe that the smallest value it can attain for the solution indexed by m and n is $c\gamma_{mn}$, showing that TM waves can only propagate if the angular frequency $\omega$ of the electromagnetic radiation is equal to or larger than this cutoff. In general, larger values of $\gamma_{mn}$ correspond to higher degrees of transverse oscillation, and modes with greater transverse oscillation will therefore have higher cutoff frequencies.

As for the circular wave guide (the subject of Exercise 14.1.26), there will also be TE modes of propagation, also with mode-dependent cutoffs. However, the coaxial guide can also support traveling waves in TEM (transverse electric and magnetic) modes. These modes, not possible for a circular wave guide, do not exhibit a cutoff, are the confined equivalent of plane waves, and correspond to the flow of current (in opposite directions) on the coaxial conductors. ■
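To make the solution of Eqs. (14.74) and (14.75) concrete, the sketch below treats the case m = 0, a = 1, b = 2 that is posed in Exercise 14.3.10. A nontrivial pair $(c_{0n}, d_{0n})$ exists only when the determinant $g(\gamma) = J_0(\gamma)Y_0(2\gamma) - J_0(2\gamma)Y_0(\gamma)$ vanishes, so we scan g for sign changes and refine each bracket by bisection. The series-based `bessel_j0` and `neumann_y0` and the scan parameters are our own assumptions; a library Bessel routine would be the usual choice in practice.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def bessel_j0(x, terms=60):
    """J_0(x) from its ascending power series (adequate up to x of about 20)."""
    half = 0.5 * x
    return sum((-1) ** k * half ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def neumann_y0(x, terms=60):
    """Y_0(x) from the series of Eq. (14.62); requires x > 0."""
    half = 0.5 * x
    harmonic, series = 0.0, 0.0
    for k in range(1, terms):
        harmonic += 1.0 / k
        series += (-1) ** k * harmonic * half ** (2 * k) / math.factorial(k) ** 2
    return (2.0 / math.pi) * ((EULER_GAMMA + math.log(half)) * bessel_j0(x, terms) - series)

def g(x):
    """Determinant of Eqs. (14.74)-(14.75) for m = 0, a = 1, b = 2."""
    return bessel_j0(x) * neumann_y0(2 * x) - bessel_j0(2 * x) * neumann_y0(x)

def bisect(func, lo, hi, iterations=50):
    """Refine a sign-change bracket [lo, hi] by repeated halving."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

roots = []
x, step = 0.5, 0.1
while x < 10.0 and len(roots) < 3:
    if g(x) * g(x + step) < 0:
        roots.append(bisect(g, x, x + step))
    x += step

print([round(r, 4) for r in roots])   # compare with the answer quoted in Exercise 14.3.10
```

Each root is a value of $\gamma_{0n}$; the corresponding cutoff frequency is $c\gamma_{0n}$, and the ratio $d_{0n}/c_{0n}$ follows from either boundary equation as $-J_0(\gamma_{0n})/Y_0(\gamma_{0n})$.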


Exercises

14.3.1 Prove that the Neumann functions $Y_n$ (with n an integer) satisfy the recurrence relations

\[ Y_{n-1}(x) + Y_{n+1}(x) = \frac{2n}{x} Y_n(x), \]
\[ Y_{n-1}(x) - Y_{n+1}(x) = 2Y_n'(x). \]

Hint. These relations may be proved by differentiating the recurrence relations for $J_\nu$ or by using the limit form of $Y_\nu$ but not dividing everything by zero.

14.3.2 Show that for integer n

\[ Y_{-n}(x) = (-1)^n Y_n(x). \]

14.3.3 Show that

\[ Y_0'(x) = -Y_1(x). \]

14.3.4 If X and Z are any two solutions of Bessel's equation, show that

\[ X_\nu(x) Z_\nu'(x) - X_\nu'(x) Z_\nu(x) = \frac{A_\nu}{x}, \]

in which $A_\nu$ may depend on $\nu$ but is independent of x. This is a special case of Exercise 7.6.11.

14.3.5 Verify the Wronskian formulas

\[ J_\nu(x) J_{-\nu+1}(x) + J_{-\nu}(x) J_{\nu-1}(x) = \frac{2\sin\nu\pi}{\pi x}, \]
\[ J_\nu(x) Y_\nu'(x) - J_\nu'(x) Y_\nu(x) = \frac{2}{\pi x}. \]

14.3.6 As an alternative to letting x approach zero in the evaluation of the Wronskian constant, we may invoke the uniqueness of power-series expansions. The coefficient of $x^{-1}$ in the series expansion of $u_\nu(x)v_\nu'(x) - u_\nu'(x)v_\nu(x)$ is then $A_\nu$. Show by series expansion that the coefficients of $x^0$ and $x^1$ of $J_\nu(x)J_{-\nu}'(x) - J_\nu'(x)J_{-\nu}(x)$ are each zero.

14.3.7 (a) By differentiating and substituting into Bessel's ODE for $\nu = 0$, show that $\int_0^{\infty} \cos(x\cosh t)\, dt$ is a solution.

Hint. Rearrange the final integral to $\int_0^{\infty} \dfrac{d}{dt}\left[x \sin(x\cosh t)\sinh t\right] dt$.

(b) Show that $Y_0(x) = -\dfrac{2}{\pi}\int_0^{\infty} \cos(x\cosh t)\, dt$ is linearly independent of $J_0(x)$.

14.3.8 Verify the expansion formula for $Y_n(x)$ given in Eq. (14.61).

Hint. Start from Eq. (14.60) and perform the indicated differentiations on the power-series expansions of $J_\nu$ and $J_{-\nu}$. The digamma functions $\psi$ arise from the differentiation of the gamma function. You will need the identity (not derived in this book) $\lim_{z \to -n} \psi(z)/\Gamma(z) = (-1)^{n-1} n!$, where n is a positive integer.

14.3.9 If Bessel's ODE (with solution $J_\nu$) is differentiated with respect to $\nu$, one obtains

\[ x^2 \frac{d^2}{dx^2}\left(\frac{\partial J_\nu}{\partial\nu}\right) + x \frac{d}{dx}\left(\frac{\partial J_\nu}{\partial\nu}\right) + (x^2 - \nu^2)\frac{\partial J_\nu}{\partial\nu} = 2\nu J_\nu. \]

Use the above equation to show that $Y_n(x)$ is a solution to Bessel's ODE.

Hint. Equation (14.60) will be useful.

14.3.10 For the case m = 0, a = 1, and b = 2, the coaxial wave-guide TM boundary conditions become $f(\lambda) = 0$, with

\[ f(x) = \frac{J_0(2x)}{Y_0(2x)} - \frac{J_0(x)}{Y_0(x)}. \]

FIGURE 14.9 The function f(x) of Exercise 14.3.10.

This function is plotted in Fig. 14.9.

(a) Calculate f(x) for x = 0.0(0.1)10.0 and plot f(x) vs. x to find the approximate location of the roots.

(b) Call a root-finding program to determine the first three roots to higher precision.

ANS. (b) 3.1230, 6.2734, 9.4182.

Note. The higher roots can be expected to appear at intervals whose length approaches $\pi$. Why? AMS-55 (see Additional Readings) gives an approximate formula for the roots. The function $g(x) = J_0(x)Y_0(2x) - J_0(2x)Y_0(x)$ is much better behaved than the f(x) previously discussed.


14.4 HANKEL FUNCTIONS 


Hankel functions are solutions of Bessel's ODE with asymptotic properties that make them particularly useful in problems involving the propagation of spherical or cylindrical waves. Since the functions $J_\nu$ and $Y_\nu$ form the complete solution of this ODE, the Hankel functions cannot be anything completely new; they must be linear combinations of the solutions we have already found. We introduce them here via straightforward algebraic definitions; later in this section we identify integral representations that some authors have used as a starting point.
Definitions


Starting from the Bessel functions of the first and second kinds, namely $J_\nu(x)$ and $Y_\nu(x)$, we define the two Hankel functions $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$ (sometimes, but nowadays infrequently, referred to as Bessel functions of the third kind) as follows:

\[ H_\nu^{(1)}(x) = J_\nu(x) + iY_\nu(x), \tag{14.76} \]
\[ H_\nu^{(2)}(x) = J_\nu(x) - iY_\nu(x). \tag{14.77} \]

This is exactly analogous to taking

\[ e^{\pm i\theta} = \cos\theta \pm i\sin\theta. \tag{14.78} \]

For real arguments, $H_\nu^{(1)}$ and $H_\nu^{(2)}$ are complex conjugates. The extent of the analogy will be seen even better when their asymptotic forms are considered. Indeed, it is their asymptotic behavior that makes the Hankel functions useful. This behavior is discussed in Section 14.6, and in that section we provide an illustrative example in which the asymptotic properties play a key role.

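Series expansion of $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$ may be obtained by combining Eqs. (14.6) and (14.62). Often only the first term is of interest; it is given by

\[ H_0^{(1)}(x) \approx i\frac{2}{\pi}\ln x + 1 + i\frac{2}{\pi}(\gamma - \ln 2) + \cdots, \tag{14.79} \]
\[ H_\nu^{(1)}(x) \approx -i\frac{\Gamma(\nu)}{\pi}\left(\frac{2}{x}\right)^{\nu} + \cdots, \qquad \nu > 0, \tag{14.80} \]
\[ H_0^{(2)}(x) \approx -i\frac{2}{\pi}\ln x + 1 - i\frac{2}{\pi}(\gamma - \ln 2) + \cdots, \tag{14.81} \]
\[ H_\nu^{(2)}(x) \approx i\frac{\Gamma(\nu)}{\pi}\left(\frac{2}{x}\right)^{\nu} + \cdots, \qquad \nu > 0. \tag{14.82} \]

In these equations $\gamma$ is the Euler-Mascheroni constant, defined in Eq. (1.13).

Since the Hankel functions are linear combinations (with constant coefficients) of $J_\nu$ and $Y_\nu$, they satisfy the same recurrence relations, Eqs. (14.7) and (14.8). For both $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$,

\[ H_{\nu-1}(x) + H_{\nu+1}(x) = \frac{2\nu}{x} H_\nu(x), \tag{14.83} \]
\[ H_{\nu-1}(x) - H_{\nu+1}(x) = 2H_\nu'(x). \tag{14.84} \]

A variety of Wronskian formulas can be developed, including:

\[ H_\nu^{(2)} H_{\nu+1}^{(1)} - H_\nu^{(1)} H_{\nu+1}^{(2)} = \frac{4}{i\pi x}, \tag{14.85} \]
\[ J_{\nu-1} H_\nu^{(1)} - J_\nu H_{\nu-1}^{(1)} = \frac{2}{i\pi x}, \tag{14.86} \]
\[ J_\nu H_{\nu-1}^{(2)} - J_{\nu-1} H_\nu^{(2)} = \frac{2}{i\pi x}. \tag{14.87} \]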
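The algebraic definitions make the Hankel functions straightforward to form once $J_\nu$ and $Y_\nu$ are available. The following sketch, which assumes a nonintegral order ($\nu = 0.3$) and a sample real argument x = 2.5 of our own choosing, builds $H_\nu^{(1)}$ and $H_\nu^{(2)}$ as Python complex numbers from Eqs. (14.76) and (14.77) and checks the conjugate relation and the Wronskian formula of Eq. (14.85). The function names are illustrative.

```python
import math

def bessel_j(nu, x, terms=40):
    """J_nu(x) from its ascending power series; nu must not be a negative integer."""
    half = 0.5 * x
    return sum((-1) ** s * half ** (2 * s + nu)
               / (math.factorial(s) * math.gamma(s + nu + 1))
               for s in range(terms))

def neumann_y(nu, x):
    """Y_nu(x) from Eq. (14.57); valid only for nonintegral nu."""
    return ((math.cos(nu * math.pi) * bessel_j(nu, x) - bessel_j(-nu, x))
            / math.sin(nu * math.pi))

def hankel1(nu, x):
    """H_nu^(1)(x) = J_nu(x) + i Y_nu(x), Eq. (14.76)."""
    return complex(bessel_j(nu, x), neumann_y(nu, x))

def hankel2(nu, x):
    """H_nu^(2)(x) = J_nu(x) - i Y_nu(x), Eq. (14.77)."""
    return complex(bessel_j(nu, x), -neumann_y(nu, x))

nu, x = 0.3, 2.5                    # assumed sample order and argument

# Eq. (14.85): H^(2)_nu H^(1)_{nu+1} - H^(1)_nu H^(2)_{nu+1} = 4 / (i pi x)
lhs_85 = hankel2(nu, x) * hankel1(nu + 1, x) - hankel1(nu, x) * hankel2(nu + 1, x)
rhs_85 = 4 / (1j * math.pi * x)

print(abs(lhs_85 - rhs_85))         # should be at rounding level
```

For real x the two functions differ only in the sign of the imaginary part, which is the computational counterpart of the analogy with $e^{\pm i\theta}$ noted above.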
Contour Integral Representation of the Hankel Functions


The integral representation (Schlaefli integral) for $J_\nu(x)$ was introduced in Section 14.1, where we established that

\[ J_\nu(x) = \frac{1}{2\pi i} \int_C e^{(x/2)(t - 1/t)}\, \frac{dt}{t^{\nu+1}}, \tag{14.88} \]

with C the contour shown in Fig. 14.5. Recall that when $\nu$ is nonintegral, the integrand has a branch point at t = 0 and the contour had to avoid a cut line that was drawn along the negative real axis. In developing the Schlaefli integral for general $\nu$, we began by showing that Bessel's ODE was satisfied for any open contour for which an expression of the form

\[ \frac{e^{(x/2)(t - 1/t)}}{t^{\nu}} \left[\nu + \frac{x}{2}\left(t + \frac{1}{t}\right)\right] \tag{14.89} \]

vanished at both endpoints of the contour.

We now make further use of those observations by noting that the expression in Eq. (14.89) not only vanishes at $t = -\infty$ on the real axis both below and above the cut, but that it also vanishes at t = 0 when that point is approached from positive t.

We therefore consider the contour shown in Fig. 14.10, calling attention to the fact that the upper half of the contour (from t = 0+ to $t = \infty e^{i\pi}$), labeled $C_1$, meets the conditions necessary to yield a solution to Bessel's ODE, and that the remaining (lower) half of the contour, labeled $C_2$, also yields a solution. What remains to be determined is the identification of these solutions: We will show that they are the Hankel functions. For x > 0, we assert that

\[ H_\nu^{(1)}(x) = \frac{1}{\pi i} \int_{C_1} e^{(x/2)(t - 1/t)}\, \frac{dt}{t^{\nu+1}}, \tag{14.90} \]
\[ H_\nu^{(2)}(x) = \frac{1}{\pi i} \int_{C_2} e^{(x/2)(t - 1/t)}\, \frac{dt}{t^{\nu+1}}. \tag{14.91} \]

FIGURE 14.10 Hankel function contours.
These expressions are particularly convenient because they may be handled by the method of steepest descents (Section 12.7). $H_\nu^{(1)}(x)$ has a saddle point at t = +i, whereas $H_\nu^{(2)}(x)$ has a saddle point at t = −i.

There remains the problem of relating Eqs. (14.90) and (14.91) to our earlier definition of the Hankel functions, Eqs. (14.76) and (14.77). Since the contours of Eqs. (14.90) and (14.91) combine to produce a contour yielding $J_\nu$, Eq. (14.88), we have, from the integral representations,

\[ J_\nu(x) = \frac{1}{2}\left[H_\nu^{(1)}(x) + H_\nu^{(2)}(x)\right]. \tag{14.92} \]

If we can show (also from the integral representations) that

\[ Y_\nu(x) = \frac{1}{2i}\left[H_\nu^{(1)}(x) - H_\nu^{(2)}(x)\right], \tag{14.93} \]

we will be able to recover the original definitions of the $H_\nu^{(i)}$.

We therefore rewrite Eq. (14.90) by replacing the integration variable t by $e^{i\pi}/s$, so the integrand of that equation becomes $-e^{(x/2)(s - 1/s)} e^{-i\nu\pi} s^{\nu-1}$. After the substitution the contour (in s) is found to be the same as $C_1$, but traversed in the opposite direction (thereby compensating the initial minus sign in the transformed integrand). The result, with details left as Exercise 14.4.3, is that the contour integral representation of $H_\nu^{(1)}$ is consistent with the identification

\[ H_{-\nu}^{(1)}(x) = e^{i\nu\pi} H_\nu^{(1)}(x). \tag{14.94} \]

Similar processing of Eq. (14.91), with $t = e^{-i\pi}/s$, leads to

\[ H_{-\nu}^{(2)}(x) = e^{-i\nu\pi} H_\nu^{(2)}(x). \tag{14.95} \]

We now combine Eqs. (14.94) and (14.95) to reach

\[ J_{-\nu}(x) = \frac{1}{2}\left[e^{i\nu\pi} H_\nu^{(1)}(x) + e^{-i\nu\pi} H_\nu^{(2)}(x)\right], \tag{14.96} \]

where again the $H_\nu^{(i)}$ refer to the contour integral representations. Substituting Eqs. (14.92) and (14.96) into the defining equation for $Y_\nu$, Eq. (14.57), we confirm that $Y_\nu$ is described properly when the $H_\nu^{(i)}$ stand for their contour integral representations. This completes the proof that Eqs. (14.90) and (14.91) are consistent with the original definitions of the Hankel functions.

The reader may wonder why so much stress is placed on the development of integral representations. There are several reasons. The first is simply aesthetic appeal. Second, the integral representations facilitate manipulations, analysis, and the development of relations among the various special functions. We have already seen an example of this in the development of Eqs. (14.94) to (14.96). And, probably most important of all, integral representations are extremely useful in developing asymptotic expansions. Such expansions can often be obtained using the method of steepest descents (Section 12.7), or by methods involving expansion in negative powers of the expansion variable, as in Section 12.6.
In conclusion, the Hankel functions are introduced here for the following reasons:

• As analogs of $e^{\pm ix}$ they are useful for describing traveling waves. These applications are best studied when the asymptotic properties of the functions are in hand, and therefore are postponed to Section 14.6.

• They offer an alternate (contour integral) and rather elegant definition of Bessel functions.

• We will see in Section 14.5 that they offer a route to the definition of the quantities known as modified Bessel functions, and that in Section 14.6 they are useful for the development of the asymptotic properties of Bessel functions.


Exercises


14.4.1 Verify the Wronskian formulas

(a) $J_\nu(x) H_\nu^{(1)\prime}(x) - J_\nu'(x) H_\nu^{(1)}(x) = \dfrac{2i}{\pi x}$,

(b) $J_\nu(x) H_\nu^{(2)\prime}(x) - J_\nu'(x) H_\nu^{(2)}(x) = -\dfrac{2i}{\pi x}$,

(c) $Y_\nu(x) H_\nu^{(1)\prime}(x) - Y_\nu'(x) H_\nu^{(1)}(x) = -\dfrac{2}{\pi x}$,

(d) $Y_\nu(x) H_\nu^{(2)\prime}(x) - Y_\nu'(x) H_\nu^{(2)}(x) = -\dfrac{2}{\pi x}$,

(e) $H_\nu^{(1)}(x) H_\nu^{(2)\prime}(x) - H_\nu^{(1)\prime}(x) H_\nu^{(2)}(x) = -\dfrac{4i}{\pi x}$,

(f) $H_\nu^{(2)}(x) H_{\nu+1}^{(1)}(x) - H_\nu^{(1)}(x) H_{\nu+1}^{(2)}(x) = \dfrac{4}{i\pi x}$,

(g) $J_{\nu-1}(x) H_\nu^{(1)}(x) - J_\nu(x) H_{\nu-1}^{(1)}(x) = \dfrac{2}{i\pi x}$.

14.4.2 Show that the integral forms

(a) $\dfrac{1}{i\pi} \displaystyle\int_{0\ (C_1)}^{\infty e^{i\pi}} e^{(x/2)(t - 1/t)}\, \dfrac{dt}{t^{\nu+1}} = H_\nu^{(1)}(x)$,

(b) $\dfrac{1}{i\pi} \displaystyle\int_{\infty e^{-i\pi}\ (C_2)}^{0} e^{(x/2)(t - 1/t)}\, \dfrac{dt}{t^{\nu+1}} = H_\nu^{(2)}(x)$

satisfy Bessel's ODE. The contours $C_1$ and $C_2$ are shown in Fig. 14.10.

14.4.3 Show that the substitution $t = e^{i\pi}/s$ into Eq. (14.90) for $H_\nu^{(1)}(x)$ not only produces the integrand for the similar integral representation of $H_{-\nu}^{(1)}(x)$ but that the contour in s is identical to the original contour in t.

14.4.4 Using the integrals and contours given in Exercise 14.4.2, show that

\[ \frac{1}{2i}\left[H_\nu^{(1)}(x) - H_\nu^{(2)}(x)\right] = Y_\nu(x). \]
FIGURE 14.11 Hankel function contours for Exercise 14.4.5.

14.4.5 Show that the integrals in Exercise 14.4.2 may be transformed to yield

(a) $H_\nu^{(1)}(x) = \dfrac{1}{\pi i}\displaystyle\int_{C_3} e^{x\sinh\gamma - \nu\gamma}\,d\gamma$,

(b) $H_\nu^{(2)}(x) = \dfrac{1}{\pi i}\displaystyle\int_{C_4} e^{x\sinh\gamma - \nu\gamma}\,d\gamma$,

where $C_3$ and $C_4$ are the contours in Fig. 14.11.

14.4.6 (a) Transform $H_0^{(1)}(x)$, Eq. (14.90), into

$$H_0^{(1)}(x) = \frac{1}{i\pi}\int_C e^{ix\cosh s}\,ds,$$

where the contour $C$ runs from $-\infty - i\pi/2$ through the origin of the $s$-plane to
$\infty + i\pi/2$.

(b) Justify rewriting $H_0^{(1)}(x)$ as

$$H_0^{(1)}(x) = \frac{2}{i\pi}\int_0^{\infty + i\pi/2} e^{ix\cosh s}\,ds.$$

(c) Verify that this integral representation actually satisfies Bessel's differential equation.
(The $i\pi/2$ in the upper limit is not essential. It serves as a convergence factor.
We can replace it by $ia\pi/2$ and take the limit $a \to 0$.)

14.4.7 From

$$H_0^{(1)}(x) = \frac{2}{i\pi}\int_0^{\infty} e^{ix\cosh s}\,ds$$

show that

(a) $J_0(x) = \dfrac{2}{\pi}\displaystyle\int_0^{\infty}\sin(x\cosh s)\,ds$, (b) $J_0(x) = \dfrac{2}{\pi}\displaystyle\int_1^{\infty}\dfrac{\sin(xt)}{\sqrt{t^2-1}}\,dt$.

This last result is a Fourier sine transform.

14.4.8 From $H_0^{(1)}(x) = \dfrac{2}{i\pi}\displaystyle\int_0^{\infty} e^{ix\cosh s}\,ds$ (see Exercises 14.4.5 and 14.4.6), show that

(a) $Y_0(x) = -\dfrac{2}{\pi}\displaystyle\int_0^{\infty}\cos(x\cosh s)\,ds$,

(b) $Y_0(x) = -\dfrac{2}{\pi}\displaystyle\int_1^{\infty}\dfrac{\cos(xt)}{\sqrt{t^2-1}}\,dt$.

These are the integral representations in Eq. (14.63). This last result is a Fourier cosine
transform.


14.5 MODIFIED BESSEL FUNCTIONS, $I_\nu(x)$ AND $K_\nu(x)$


The Laplace and Helmholtz equations, when separated in circular cylindrical coordinates,
may lead to Bessel's ODE in the coordinate $\rho$ that describes distance from the cylindrical
axis. When that is the case, the behavior of the solutions as a function of $\rho$ is inherently
oscillatory; as we have already seen, the Bessel functions $J_\nu(k\rho)$, and also $Y_\nu(k\rho)$, have
for any value of $\nu$ an infinite number of zeros, and this property may be useful in
satisfying boundary conditions. However, as already shown in Section 9.4, the connection
constants arising when the variables are separated may have a sign opposite to that
required to yield Bessel's ODE, and the equation in the $\rho$ coordinate then assumes the
form

$$\rho^2\frac{d^2}{d\rho^2}P_\nu(k\rho) + \rho\frac{d}{d\rho}P_\nu(k\rho) - (k^2\rho^2 + \nu^2)\,P_\nu(k\rho) = 0. \tag{14.97}$$

Equation (14.97), known as the modified Bessel equation, differs from the Bessel ODE
only in the sign of the quantity $k^2$, but this small change is sufficient to alter the nature
of the solutions. As we shall shortly discuss in more detail, the solutions to Eq. (14.97),
called modified Bessel functions, are not oscillatory and have behavior that is exponential
(rather than trigonometric) in character.

Fortunately, the knowledge we have developed regarding the Bessel ODE can be put
to good use for the modified Bessel equation, since the substitution $k \to ik$ converts the
conventional Bessel ODE to its modified form, and shows that if $P_\nu(k\rho)$ is a solution to
the Bessel ODE, then $P_\nu(ik\rho)$ must be a solution to the modified Bessel equation. One
way of stating this fact is to note that the solutions of Eq. (14.97) are Bessel functions of
imaginary argument.

Series Solution


Since any solution of Bessel's ODE can be converted into a solution of the modified ODE
by insertion of $i$ into its argument, let's start by looking at the series expansion

$$J_\nu(ix) = \sum_{s=0}^{\infty}\frac{(-1)^s}{s!\,\Gamma(s+\nu+1)}\left(\frac{ix}{2}\right)^{\nu+2s} = i^{\nu}\sum_{s=0}^{\infty}\frac{1}{s!\,\Gamma(s+\nu+1)}\left(\frac{x}{2}\right)^{\nu+2s}. \tag{14.98}$$

Since all the terms of the summation have the same sign, it is evident that $J_\nu(ix)$ cannot
exhibit oscillatory behavior. It is convenient to choose the solutions of the modified Bessel
equation in a way that causes them to be real, and we accordingly define the modified
Bessel functions of the first kind, denoted $I_\nu(x)$, as

$$I_\nu(x) = i^{-\nu}J_\nu(ix) = e^{-i\nu\pi/2}J_\nu(xe^{i\pi/2}) = \sum_{s=0}^{\infty}\frac{1}{s!\,\Gamma(s+\nu+1)}\left(\frac{x}{2}\right)^{\nu+2s}. \tag{14.99}$$
s= 


Like $J_\nu$ for $\nu \ge 0$, $I_\nu$ is finite at the origin, with a power-series expansion that is convergent
for all $x$. At small $x$, its limiting behavior will be of the form

$$I_\nu(x) = \frac{x^\nu}{2^\nu\,\Gamma(\nu+1)} + \cdots. \tag{14.100}$$

From the relation between $J_\nu$ and $J_{-\nu}$, we may also conclude that $I_\nu$ and $I_{-\nu}$ are linearly
independent unless $\nu$ is an integer $n$; taking cognizance of the factor $i^{-\nu}$ in the definition
of $I_\nu$, the linear dependence takes the form

$$I_n(x) = I_{-n}(x). \tag{14.101}$$

Graphs of $I_0$ and $I_1$ are shown in Fig. 14.12.
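The series of Eq. (14.99) converges for all $x$ and is easy to sum numerically. The following Python sketch is an illustration only; the names `bessel_i` and `small_x_limit` and the fixed truncation at 60 terms are our own assumptions (adequate for moderate $x > 0$), not part of the formal development. Terms whose gamma-function argument is zero or a negative integer are dropped, since $1/\Gamma$ vanishes there; this is what makes $I_{-n} = I_n$ for integer $n$.

```python
import math


def bessel_i(nu, x, terms=60):
    """Sum the series of Eq. (14.99) for I_nu(x), x > 0 (truncated at `terms`)."""
    total = 0.0
    for s in range(terms):
        g = s + nu + 1.0
        # 1/Gamma(g) = 0 at g = 0, -1, -2, ..., so those terms contribute nothing.
        if g <= 0 and g == int(g):
            continue
        total += (x / 2.0) ** (nu + 2 * s) / (math.factorial(s) * math.gamma(g))
    return total


def small_x_limit(nu, x):
    """Leading small-x behavior, Eq. (14.100)."""
    return x ** nu / (2.0 ** nu * math.gamma(nu + 1.0))
```

For example, `bessel_i(-2, 1.3)` and `bessel_i(2, 1.3)` agree to rounding error, illustrating Eq. (14.101), and for small $x$ the ratio `bessel_i(nu, x) / small_x_limit(nu, x)` approaches unity.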


Recurrence Relations for $I_\nu$


The recurrence relations satisfied by $I_\nu(x)$ may be developed from the series expansions,
but it is perhaps easier to work from the existing recurrence relations for $J_\nu(x)$. Our starting
point is Eq. (14.7), written for $ix$:

$$J_{\nu-1}(ix) + J_{\nu+1}(ix) = \frac{2\nu}{ix}\,J_\nu(ix). \tag{14.102}$$

We change $J$ to $I$, related according to Eq. (14.99) by

$$J_\nu(ix) = i^{\nu}I_\nu(x), \tag{14.103}$$

thereby obtaining

$$i^{\nu-1}I_{\nu-1}(x) + i^{\nu+1}I_{\nu+1}(x) = \frac{2\nu}{ix}\,i^{\nu}I_\nu(x),$$

which simplifies to

$$I_{\nu-1}(x) - I_{\nu+1}(x) = \frac{2\nu}{x}\,I_\nu(x). \tag{14.104}$$

FIGURE 14.12 Modified Bessel functions.

In a similar fashion, Eq. (14.8) transforms into

$$I_{\nu-1}(x) + I_{\nu+1}(x) = 2I_\nu'(x). \tag{14.105}$$

The above analysis is also the topic of Exercise 14.1.14.
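A quick numerical check of Eqs. (14.104) and (14.105) can be reassuring. The sketch below is an assumption-laden illustration: `bessel_i` is our own truncated sum of the series in Eq. (14.99), and the derivative in Eq. (14.105) is approximated by a central difference with a step `h` chosen by us. Both residuals should be zero up to rounding and differencing error.

```python
import math


def bessel_i(nu, x, terms=60):
    """Truncated series of Eq. (14.99) for I_nu(x), x > 0."""
    total = 0.0
    for s in range(terms):
        g = s + nu + 1.0
        if g <= 0 and g == int(g):  # 1/Gamma vanishes at its poles
            continue
        total += (x / 2.0) ** (nu + 2 * s) / (math.factorial(s) * math.gamma(g))
    return total


def recurrence_residuals(nu, x, h=1e-5):
    """Return (left side minus right side) for Eqs. (14.104) and (14.105)."""
    lower, upper = bessel_i(nu - 1, x), bessel_i(nu + 1, x)
    r104 = lower - upper - (2.0 * nu / x) * bessel_i(nu, x)
    # Central-difference estimate of I_nu'(x); the step h is an illustrative choice.
    deriv = (bessel_i(nu, x + h) - bessel_i(nu, x - h)) / (2.0 * h)
    r105 = lower + upper - 2.0 * deriv
    return r104, r105
```

Calling `recurrence_residuals(0.3, 1.7)` or `recurrence_residuals(2, 0.9)` returns two numbers that are small compared with the individual terms.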


Second Solution $K_\nu$


As already pointed out, we have but one independent solution when $\nu$ is an integer, exactly
as for the Bessel functions $J_\nu$. The choice of a second, independent solution of Eq. (14.97)
is essentially a matter of convenience. The second solution given here is selected on the
basis of its asymptotic behavior, which we examine in the next section. The confusion of
choice and notation for this solution is perhaps greater than anywhere else in this field.⁵
There is also no universal nomenclature; the $K_\nu$ are sometimes referred to as Whittaker
functions. Following AMS-55 (see Additional Readings for reference), we here define a
second solution in terms of the Hankel function $H_\nu^{(1)}(x)$ as

$$K_\nu(x) \equiv \frac{\pi}{2}\,i^{\nu+1}H_\nu^{(1)}(ix) = \frac{\pi}{2}\,i^{\nu+1}\left[J_\nu(ix) + iY_\nu(ix)\right]. \tag{14.106}$$

⁵Discussion and comparison of notations will be found in Math. Tables Aids Comput. 1: 207-308 (1944) and in AMS-55
(see Additional Readings).

The factor $i^{\nu+1}$ makes $K_\nu(x)$ real when $x$ is real.⁶ Using Eqs. (14.57) and (14.99), we may
transform Eq. (14.106) to⁷

$$K_\nu(x) = \frac{\pi}{2}\,\frac{I_{-\nu}(x) - I_\nu(x)}{\sin\nu\pi}, \tag{14.107}$$

somewhat analogous to Eq. (14.57) for $Y_\nu(x)$. The choice of Eq. (14.106) as a definition
is somewhat unfortunate in that the function $K_\nu(x)$ does not satisfy the same recurrence
relations as $I_\nu(x)$. The recurrence formulas for the $K_\nu$ are

$$K_{\nu-1}(x) - K_{\nu+1}(x) = -\frac{2\nu}{x}\,K_\nu(x), \tag{14.108}$$

$$K_{\nu-1}(x) + K_{\nu+1}(x) = -2K_\nu'(x). \tag{14.109}$$

To avoid this discrepancy in the recurrence relations, some authors⁸ have included an
additional factor of $\cos\nu\pi$ in the definition of $K_\nu$. This would permit $K_\nu$ to satisfy the
same recurrence relations as $I_\nu$ (see Exercise 14.5.8), but it has the disadvantage of making
$K_\nu = 0$ for $\nu = \frac{1}{2}, \frac{3}{2}, \frac{5}{2}, \ldots$.

The series expansion of $K_\nu(x)$ follows directly from the series form of $H_\nu^{(1)}(ix)$, providing
that we choose the branch of $\ln ix$ appropriately (see Exercise 14.5.9). Using
Eqs. (14.79) and (14.80), the lowest-order terms are then found to be

$$K_0(x) = -\ln x - \gamma + \ln 2 + \cdots, \tag{14.110}$$

$$K_\nu(x) = 2^{\nu-1}\,\Gamma(\nu)\,x^{-\nu} + \cdots. \tag{14.111}$$

Because the modified Bessel function $I_\nu$ is related to the Bessel function $J_\nu$, much as sinh
is related to sine, the modified Bessel functions $I_\nu$ and $K_\nu$ are sometimes referred to as
hyperbolic Bessel functions. $K_0$ and $K_1$ are shown in Fig. 14.12.
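For nonintegral $\nu$, Eq. (14.107) gives a direct way to evaluate $K_\nu$ from the series for $I_{\pm\nu}$. The Python sketch below is illustrative only: `bessel_i` is our own truncated sum of Eq. (14.99), `bessel_k` implements Eq. (14.107) and is restricted (by our assumption) to nonintegral $\nu$ and moderate $x$, because the difference $I_{-\nu} - I_\nu$ loses accuracy to cancellation as $x$ grows.

```python
import math


def bessel_i(nu, x, terms=60):
    """Truncated series of Eq. (14.99) for I_nu(x), x > 0."""
    total = 0.0
    for s in range(terms):
        g = s + nu + 1.0
        if g <= 0 and g == int(g):  # 1/Gamma vanishes at its poles
            continue
        total += (x / 2.0) ** (nu + 2 * s) / (math.factorial(s) * math.gamma(g))
    return total


def bessel_k(nu, x):
    """K_nu(x) from Eq. (14.107); nu must not be an integer."""
    return 0.5 * math.pi * (bessel_i(-nu, x) - bessel_i(nu, x)) / math.sin(nu * math.pi)


def k_small_x(nu, x):
    """Leading small-x term of Eq. (14.111), nu > 0."""
    return 2.0 ** (nu - 1.0) * math.gamma(nu) * x ** (-nu)
```

With these definitions one can confirm that `bessel_k(0.75, 1e-3)` is close to `k_small_x(0.75, 1e-3)`, and that the recurrence of Eq. (14.108) holds to rounding error, for example at $\nu = 0.3$, $x = 1.2$.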


Integral Representations 


$I_0(x)$ and $K_0(x)$ have the integral representations

$$I_0(x) = \frac{1}{\pi}\int_0^{\pi}\cosh(x\cos\theta)\,d\theta, \tag{14.112}$$

$$K_0(x) = \int_0^{\infty}\cos(x\sinh t)\,dt = \int_0^{\infty}\frac{\cos(xt)\,dt}{(t^2+1)^{1/2}}, \qquad x > 0. \tag{14.113}$$

Equation (14.112) may be derived from Eq. (14.20) for $J_0(x)$ or may be taken as a special
case of Exercise 14.5.14. The integral representation of $K_0$, Eq. (14.113), is derived in
Section 14.6. A variety of other forms of integral representations (including $\nu \ne 0$) appear
in the exercises. These integral representations are useful in developing asymptotic forms
(Section 14.6) and in connection with Fourier transforms (Chapter 19).

⁶If $\nu$ is not an integer, $K_\nu(z)$ has a branch point at $z = 0$ due to the presence of a fractional power; if $\nu = n$, an integer, $K_n(z)$
has a branch point at $z = 0$ due to the term $\ln z$. We normally identify $K_\nu(z)$ as the branch that is real for real $z$.

⁷For integral index $n$ we take the limit as $\nu \to n$.

⁸For example, Whittaker and Watson (see Additional Readings).
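Equation (14.112) can be compared numerically with the series of Eq. (14.99) at $\nu = 0$. The following sketch is only an illustration: the function names, the use of composite Simpson quadrature, and the choice of 200 subintervals are our own assumptions. Because the integrand is smooth and its odd derivatives vanish at both endpoints, even a modest number of subintervals gives agreement close to machine precision.

```python
import math


def i0_series(x, terms=60):
    """I_0(x) from Eq. (14.99) with nu = 0: sum of (x/2)^(2s) / (s!)^2."""
    total, term = 0.0, 1.0
    for s in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((s + 1) ** 2)
    return total


def i0_integral(x, n=200):
    """I_0(x) from Eq. (14.112) by composite Simpson's rule (n must be even)."""
    h = math.pi / n
    total = math.cosh(x) + math.cosh(-x)  # endpoints theta = 0 and theta = pi
    for j in range(1, n):
        weight = 4 if j % 2 else 2
        total += weight * math.cosh(x * math.cos(j * h))
    return total * h / (3.0 * math.pi)
```

For instance, `i0_integral(2.5)` and `i0_series(2.5)` should agree to many significant figures.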

Example 14.5.1 A GREEN'S FUNCTION


We wish to develop an expansion for the fundamental Green's function for the Laplace
equation in cylindrical coordinates $(\rho, \varphi, z)$. The defining equation is

$$\left[\frac{\partial^2}{\partial\rho_1^2} + \frac{1}{\rho_1}\frac{\partial}{\partial\rho_1} + \frac{1}{\rho_1^2}\frac{\partial^2}{\partial\varphi_1^2} + \frac{\partial^2}{\partial z_1^2}\right]G(\mathbf{r}_1, \mathbf{r}_2) = \delta(\rho_1 - \rho_2)\,\frac{1}{\rho_1^2}\,\delta(\varphi_1 - \varphi_2)\,\delta(z_1 - z_2). \tag{14.114}$$

We now write the Dirac delta function for the $\varphi$ coordinate in the form corresponding to
Eq. (5.27):

$$\delta(\varphi_1 - \varphi_2) = \frac{1}{2\pi}\sum_{m=-\infty}^{\infty} e^{im(\varphi_1 - \varphi_2)}.$$

For the $z$ coordinate, we use the continuum limit of the above formula, or, equivalently,
the large-$n$ limit of Eq. (1.155),

$$\delta(z_1 - z_2) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik(z_1 - z_2)}\,dk = \frac{1}{\pi}\int_0^{\infty}\cos k(z_1 - z_2)\,dk.$$

We use the last form of the above equation so that $k$ will never be negative.
We now expand $G(\mathbf{r}_1, \mathbf{r}_2)$ as

$$G(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{2\pi^2}\sum_m\int_0^{\infty} dk\,g_m(k, \rho_1, \rho_2)\,e^{im(\varphi_1 - \varphi_2)}\cos k(z_1 - z_2). \tag{14.115}$$

For $\varphi_1$ and $\varphi_2$, this is simply an expansion in orthogonal functions; the dependence on
$z_1$, $z_2$, and $k$ is actually an integral transform that will be more completely justified in
Chapter 20. For our present purposes, what is significant is that we can apply the
orthogonality properties of the expansion to find that Eq. (14.114) will be satisfied if (for all
relevant values of $k$ and $m$)

$$\left[\frac{d^2}{d\rho_1^2} + \frac{1}{\rho_1}\frac{d}{d\rho_1} - \frac{m^2}{\rho_1^2} - k^2\right]g_m(k, \rho_1, \rho_2) = \delta(\rho_1 - \rho_2). \tag{14.116}$$

We now have a one-dimensional (1-D) Green's function problem for which the homogeneous
equation can be identified as the modified Bessel equation, with solutions $I_m(k\rho)$
and $K_m(k\rho)$. Keeping in mind that $I_m$ is regular at the origin, that $K_m$ is regular at infinity,
and that the Green's function we seek must be regular at both these limits, we write our
1-D axial Green's function in the more explicit form

$$g_m(k\rho_1, k\rho_2) = -I_m(k\rho_<)\,K_m(k\rho_>), \tag{14.117}$$

where $\rho_<$ and $\rho_>$ are, respectively, the smaller and larger of $\rho_1$ and $\rho_2$. The coefficient in
the above equation, $-1$, is evaluated according to Eq. (10.19), from

$$\Big(p(k\rho)\left[K_m'(k\rho)\,I_m(k\rho) - I_m'(k\rho)\,K_m(k\rho)\right]\Big)^{-1}.$$

The coefficient $p$ is from the differential equation, and has here the value $k\rho$; the form
involving modified Bessel functions is their Wronskian, and has the value $-1/k\rho$; that is
the topic of Exercise 14.5.11.
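The value of the Wronskian, and hence the coefficient $-1$, can be checked numerically for a nonintegral index. The sketch below is an illustration under our own assumptions: `bessel_i` sums the series of Eq. (14.99), `bessel_k` uses Eq. (14.107) (so $\nu$ must not be an integer), and the derivatives are obtained from the recurrence relations (14.105) and (14.109) rather than by differentiation.

```python
import math


def bessel_i(nu, x, terms=60):
    """Truncated series of Eq. (14.99) for I_nu(x), x > 0."""
    total = 0.0
    for s in range(terms):
        g = s + nu + 1.0
        if g <= 0 and g == int(g):  # 1/Gamma vanishes at its poles
            continue
        total += (x / 2.0) ** (nu + 2 * s) / (math.factorial(s) * math.gamma(g))
    return total


def bessel_k(nu, x):
    """K_nu(x) from Eq. (14.107); nu must not be an integer."""
    return 0.5 * math.pi * (bessel_i(-nu, x) - bessel_i(nu, x)) / math.sin(nu * math.pi)


def wronskian_ik(nu, x):
    """I_nu K_nu' - I_nu' K_nu, with derivatives from Eqs. (14.105) and (14.109)."""
    i_prime = 0.5 * (bessel_i(nu - 1, x) + bessel_i(nu + 1, x))
    k_prime = -0.5 * (bessel_k(nu - 1, x) + bessel_k(nu + 1, x))
    return bessel_i(nu, x) * k_prime - i_prime * bessel_k(nu, x)
```

With $x$ standing for $k\rho$, the product `x * wronskian_ik(nu, x)` should be $-1$ to rounding error, so its inverse reproduces the coefficient in Eq. (14.117).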

Given our explicit formula for $g_m$, Eq. (14.115) assumes the final form

$$G(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{2\pi^2}\sum_m\int_0^{\infty} dk\,g_m(k\rho_1, k\rho_2)\,e^{im(\varphi_1 - \varphi_2)}\cos k(z_1 - z_2). \tag{14.118}$$

This is the form quoted in Section 10.2.
Summary 


To put the modified Bessel functions $I_\nu(x)$ and $K_\nu(x)$ in proper perspective, note that we
have introduced them here because:


• These functions are solutions of the frequently encountered modified Bessel equation,
which arises in a variety of physically important problems,


• $K_\nu(x)$ will be found useful in determining the asymptotic behavior of all the Bessel
and modified Bessel functions (Section 14.6), and


• $I_\nu(x)$ and $K_\nu(x)$ arise in our discussion of Green's functions (Example 14.5.1).


Exercises

14.5.1 Show that

$$e^{(x/2)(t+1/t)} = \sum_{n=-\infty}^{\infty} I_n(x)\,t^n,$$

thus generating modified Bessel functions, $I_n(x)$.

14.5.2 Verify the following identities

(a) $1 = I_0(x) + 2\sum_{n=1}^{\infty}(-1)^n I_{2n}(x)$,
(b) $e^{x} = I_0(x) + 2\sum_{n=1}^{\infty} I_n(x)$,
(c) $e^{-x} = I_0(x) + 2\sum_{n=1}^{\infty}(-1)^n I_n(x)$,
(d) $\cosh x = I_0(x) + 2\sum_{n=1}^{\infty} I_{2n}(x)$,
(e) $\sinh x = 2\sum_{n=1}^{\infty} I_{2n-1}(x)$.

14.5.3 (a) From the generating function of Exercise 14.5.1 show that

$$I_n(x) = \frac{1}{2\pi i}\oint e^{(x/2)(t+1/t)}\,\frac{dt}{t^{n+1}}.$$

(b) For $n = \nu$, not an integer, show that the preceding integral representation may be
generalized to

$$I_\nu(x) = \frac{1}{2\pi i}\int_C e^{(x/2)(t+1/t)}\,\frac{dt}{t^{\nu+1}}.$$

The contour $C$ is the same as that for $J_\nu(x)$ (Fig. 14.5).

14.5.4 For $\nu > -\frac{1}{2}$ show that $I_\nu(z)$ may be represented by

$$I_\nu(z) = \frac{1}{\pi^{1/2}\,\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_0^{\pi} e^{\pm z\cos\theta}\sin^{2\nu}\theta\,d\theta$$

$$= \frac{1}{\pi^{1/2}\,\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_{-1}^{1} e^{\pm zp}(1-p^2)^{\nu-1/2}\,dp$$

$$= \frac{2}{\pi^{1/2}\,\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_0^{\pi/2}\cosh(z\cos\theta)\sin^{2\nu}\theta\,d\theta.$$

14.5.5 The cylindrical cavity depicted in Fig. 14.4 has radius $a$ and height $h$. For this exercise,
the end caps $z = 0$ and $h$ are at zero potential, while the cylindrical wall $\rho = a$ has a
potential of functional form $V = V(\varphi, z)$.

(a) Show that the electrostatic potential $\Phi(\rho, \varphi, z)$ has the functional form

$$\Phi(\rho, \varphi, z) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} I_m(k_n\rho)\,(a_{mn}\sin m\varphi + b_{mn}\cos m\varphi)\sin k_n z,$$

where $k_n = n\pi/h$.

(b) Show that the coefficients $a_{mn}$ and $b_{mn}$ are given by

$$\begin{Bmatrix} a_{mn} \\ b_{mn} \end{Bmatrix} = \frac{2}{\pi h\,I_m(k_n a)}\int_0^{2\pi}\!\!\int_0^{h} V(\varphi, z)\begin{Bmatrix}\sin m\varphi \\ \cos m\varphi\end{Bmatrix}\sin k_n z\,dz\,d\varphi.$$

Hint. Expand $V(\varphi, z)$ as a double series and use the orthogonality of the trigonometric
functions.

14.5.6 Verify that $K_\nu(x)$ as defined in Eq. (14.106) is equivalent to

$$K_\nu(x) = \frac{\pi}{2}\,\frac{I_{-\nu}(x) - I_\nu(x)}{\sin\nu\pi}$$

and from this show that

$$K_\nu(x) = K_{-\nu}(x).$$

14.5.7 Show that $K_\nu(x)$ satisfies the following recurrence relations:

$$K_{\nu-1}(x) - K_{\nu+1}(x) = -\frac{2\nu}{x}\,K_\nu(x),$$

$$K_{\nu-1}(x) + K_{\nu+1}(x) = -2K_\nu'(x).$$

Note. These differ from the recurrence relations for $I_\nu$.

14.5.8 If $\mathcal{K}_\nu = e^{\nu\pi i}K_\nu$, show that $\mathcal{K}_\nu$ satisfies the same recurrence relations as $I_\nu$.

14.5.9 Show that when $K_0$ is evaluated from its series expansion about $x = 0$, the formula
given as Eq. (14.110) only follows if a specific branch of its logarithmic term is chosen.

14.5.10 For $\nu > -\frac{1}{2}$ show that $K_\nu(z)$ may be represented by

$$K_\nu(z) = \frac{\pi^{1/2}}{\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_0^{\infty} e^{-z\cosh t}\sinh^{2\nu}t\,dt, \qquad -\frac{\pi}{2} < \arg z < \frac{\pi}{2},$$

$$= \frac{\pi^{1/2}}{\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_1^{\infty} e^{-zp}(p^2-1)^{\nu-1/2}\,dp.$$

14.5.11 Show that $I_\nu(x)$ and $K_\nu(x)$ satisfy the Wronskian relation

$$I_\nu(x)K_\nu'(x) - I_\nu'(x)K_\nu(x) = -\frac{1}{x}.$$

14.5.12 Verify that the coefficient in the axial Green's function of Eq. (14.117) is $-1$.

14.5.13 If $r = (x^2 + y^2)^{1/2}$, prove that

$$\frac{1}{r} = \frac{2}{\pi}\int_0^{\infty}\cos(xt)\,K_0(yt)\,dt.$$

This is a Fourier cosine transform of $K_0$.

14.5.14 Derive the integral representation

$$I_n(x) = \frac{1}{\pi}\int_0^{\pi} e^{x\cos\theta}\cos(n\theta)\,d\theta.$$

Hint. Start with the corresponding integral representation of $J_n(x)$. Equation (14.112)
is a special case of this representation.

14.5.15 Show that

$$K_0(z) = \int_0^{\infty} e^{-z\cosh t}\,dt$$

satisfies the modified Bessel equation. How can you establish that this form is linearly
independent of $I_0(z)$?

14.5.16 The cylindrical cavity of Exercise 14.5.5 has along the cylinder walls the potential

$$V(z) = \begin{cases} 100\,\dfrac{z}{h}, & 0 \le \dfrac{z}{h} \le \dfrac{1}{2}, \\[2mm] 100\left(1 - \dfrac{z}{h}\right), & \dfrac{1}{2} \le \dfrac{z}{h} \le 1. \end{cases}$$

With the radius-height ratio $a/h = 0.5$, calculate the potential for $z/h = 0.1(0.1)0.5$
and $\rho/a = 0.0(0.2)1.0$.

Check value. For $z/h = 0.3$ and $\rho/a = 0.8$, $V = 26.396$.


14.6 ASYMPTOTIC EXPANSIONS


Frequently in physical problems there is a need to know how a given Bessel or modified 
Bessel function behaves for large values of the argument, that is, its asymptotic behavior. 
This is one occasion when computers are not very helpful. One possible approach is to 
develop a power-series solution of the differential equation, but now using negative pow- 
ers. This is Stokes’ method, illustrated in Exercise 14.6.10. The limitation is that starting 
from some positive value of the argument (for convergence of the series), we do not know 
what mixture of solutions or multiple of a given solution we have. The problem is to relate 
the asymptotic series (useful for large values of the variable) to the power-series or related 
definition (useful for small values of the variable). This relationship can be established
in various ways, one of which is to introduce a suitable integral representation whose
asymptotic behavior can be studied by application of the method of steepest descents,
Section 12.7.

We start this process with a study of the Hankel functions, for which a contour integral 
representation was introduced in Section 14.4. 


Asymptotic Forms of Hankel Functions 


In Section 14.4 it was shown that the Hankel functions, which satisfy Bessel's equation,
may be defined by the contour integrals

$$H_\nu^{(1)}(t) = \frac{1}{\pi i}\int_{C_1} e^{(t/2)(z-1/z)}\,\frac{dz}{z^{\nu+1}}, \tag{14.119}$$

$$H_\nu^{(2)}(t) = \frac{1}{\pi i}\int_{C_2} e^{(t/2)(z-1/z)}\,\frac{dz}{z^{\nu+1}}, \tag{14.120}$$

where $C_1$ and $C_2$ are the contours shown in Fig. 14.10. We desire formulas based on these
representations for the asymptotic behavior of the Hankel functions at large positive $t$.
The direct and exact evaluation of these integrals appears to be nearly impossible, but
the situation does have features permitting us to use the method of steepest descents to
make an asymptotic evaluation. Referring to the exposition of that method in Section 12.7,
we have the approximate evaluation

$$\int_C g(z)\,e^{w(z,t)}\,dz \approx g(z_0)\,e^{w(z_0,t)}\,e^{i\theta}\sqrt{\frac{2\pi}{|w''(z_0,t)|}}, \tag{14.121}$$

where the contour $C$ passes through a saddle point at $z = z_0$ and

$$\theta = -\frac{\arg\big(w''(z_0,t)\big)}{2} + \left(\frac{\pi}{2}\ \text{or}\ \frac{3\pi}{2}\right) \tag{14.122}$$

is a phase arising from the direction of passage through the saddle point.

We regard the common integrand of Eqs. (14.119) and (14.120) as possessing a slowly
varying factor $g(z) = z^{-\nu-1}$ and an exponential $e^{w}$ with $w = (t/2)(z - z^{-1})$, and seek
saddle points by finding the zeros of

$$w'(z) = \frac{t}{2}\left(1 + \frac{1}{z^2}\right). \tag{14.123}$$

Solving the above equation, we identify the two saddle points $z_0 = +i$ and $z_0 = -i$.

Limiting attention to $H_\nu^{(1)}(t)$, we see that we can deform the contour $C_1$ so that it passes
through the saddle point at $z_0 = i$; there is neither the need nor the possibility to deform
this contour to pass through $z_0 = -i$. Thus, at the saddle point, we have

$$w(+i) = it, \qquad w''(+i) = -\frac{t}{z^3}\bigg|_{z_0 = i} = -it. \tag{14.124}$$

The argument of $w''(z_0)$ is $-\pi/2$, so the possible values of the phase $\theta$ (the direction of
descent from the saddle point) are $3\pi/4$ and $7\pi/4$. We must choose $\theta = 3\pi/4$ since we
cannot get into position to cross the saddle point in the direction $\theta = 7\pi/4$ (equivalently $-\pi/4$) without
first crossing a region where the integrand is larger in absolute value than its value at the
saddle point.

We now have all the information needed to use Eq. (14.121) to estimate the integral.
The result is

$$H_\nu^{(1)}(t) \approx \frac{1}{\pi i}\,e^{(i\pi/2)(-\nu-1)}\,e^{3i\pi/4}\,e^{it}\sqrt{\frac{2\pi}{t}} = \sqrt{\frac{2}{\pi t}}\;e^{i(t - \nu\pi/2 - \pi/4)}. \tag{14.125}$$

This is the leading term of the asymptotic expansion of the Hankel function $H_\nu^{(1)}(t)$ for
large $t$. The other Hankel function can be treated similarly, but using the saddle point at
$z = -i$, with result

$$H_\nu^{(2)}(t) \approx \sqrt{\frac{2}{\pi t}}\;e^{-i(t - \nu\pi/2 - \pi/4)}. \tag{14.126}$$
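The bookkeeping of phases that leads from Eq. (14.121) to Eq. (14.125) is easy to get wrong by hand, so a numerical cross-check may be useful. The Python sketch below is an illustration with names of our own choosing: `hankel1_saddle` assembles the right side of Eq. (14.121) at $z_0 = i$, taking $\theta$ from Eq. (14.122) with the $\pi/2$ option (which yields $3\pi/4$), and `hankel1_leading` is the closed form of Eq. (14.125). It checks only the algebra of the saddle-point estimate, not the accuracy of the asymptotic form itself.

```python
import cmath
import math


def hankel1_saddle(nu, t):
    """Right side of Eq. (14.121) at z0 = i, divided by pi*i as in Eq. (14.119)."""
    z0 = 1j
    g = cmath.exp(-(nu + 1.0) * cmath.log(z0))   # g(z0) = z0^(-nu-1), principal branch
    w = 0.5 * t * (z0 - 1.0 / z0)                # w(z0) = i t
    w2 = -t / z0 ** 3                            # w''(z0) = -i t
    theta = -cmath.phase(w2) / 2.0 + math.pi / 2.0   # Eq. (14.122): gives 3*pi/4
    estimate = g * cmath.exp(w) * cmath.exp(1j * theta) * math.sqrt(2.0 * math.pi / abs(w2))
    return estimate / (math.pi * 1j)


def hankel1_leading(nu, t):
    """Closed form of Eq. (14.125)."""
    return math.sqrt(2.0 / (math.pi * t)) * cmath.exp(1j * (t - nu * math.pi / 2.0 - math.pi / 4.0))
```

The two functions agree to rounding error for any real $\nu$ and positive $t$, for example $\nu = 0.3$, $t = 12$.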


Equations (14.125) and (14.126) permit us to obtain the leading terms in the asymptotic
behavior of all the Bessel and modified Bessel functions. In particular, inserting the
asymptotic form for $H_\nu^{(1)}(ix)$ into Eq. (14.106), which defines $K_\nu(x)$, we find

$$K_\nu(x) \approx \sqrt{\frac{\pi}{2x}}\;e^{-x}. \tag{14.127}$$

Another solution to the modified Bessel equation can be obtained from $H_\nu^{(2)}(ix)$; its
asymptotic behavior will be proportional to $e^{+x}$. Combining the present observations with
Eqs. (14.100), (14.110), and (14.111), we can conclude that:


1. The modified Bessel function $K_\nu(x)$ will be irregular at $x = 0$ as given by Eqs. (14.110)
or (14.111), and will decay exponentially at large $x$;

2. The modified Bessel function $I_\nu(x)$ will (for $\nu \ge 0$) be finite at the origin, as given by
Eq. (14.100), and will increase exponentially at large $x$.


Rather than developing additional asymptotic forms from Eq. (14.127), we find it more
interesting to obtain more complete asymptotic expansions by use of a particular integral
representation of $K_\nu$.


Expansion of an Integral Representation for $K_\nu$


Here we start from the integral representation

$$K_\nu(z) = \frac{\pi^{1/2}}{\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_1^{\infty} e^{-zx}(x^2-1)^{\nu-1/2}\,dx, \qquad \nu > -\frac{1}{2}. \tag{14.128}$$

For the present let us take $z$ to be real, although Eq. (14.128) may be established for
$-\pi/2 < \arg z < \pi/2$ (i.e., for $\Re e(z) > 0$).

Before using Eq. (14.128) we need to verify that (1) the form claimed to be $K_\nu(z)$
satisfies the modified Bessel equation, (2) that it has the small-$z$ behavior required for
$K_\nu$, and (3) that it has the required exponentially decaying asymptotic value. These three
features suffice to establish the validity of Eq. (14.128).

The fact that Eq. (14.128) is a solution of the modified Bessel equation may be verified
by direct substitution into Eq. (14.97). After some manipulation, we obtain

$$z^{\nu+1}\int_1^{\infty}\frac{d}{dx}\left[e^{-zx}(x^2-1)^{\nu+1/2}\right]dx = 0,$$

which transforms the combined integrand into the derivative of a function that vanishes at
both endpoints.





We next consider how Eq. (14.128) behaves for small $z$. We proceed by substituting
$x = 1 + t/z$:

$$\frac{\pi^{1/2}}{\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu}\int_1^{\infty} e^{-zx}(x^2-1)^{\nu-1/2}\,dx$$

$$= \frac{\pi^{1/2}}{\Gamma(\nu+\frac{1}{2})}\left(\frac{z}{2}\right)^{\nu} e^{-z}\int_0^{\infty} e^{-t}\left(\frac{t^2}{z^2} + \frac{2t}{z}\right)^{\nu-1/2}\frac{dt}{z}$$

$$= \frac{\pi^{1/2}}{\Gamma(\nu+\frac{1}{2})}\,\frac{e^{-z}}{2^{\nu}z^{\nu}}\int_0^{\infty} e^{-t}\,t^{2\nu-1}\left(1 + \frac{2z}{t}\right)^{\nu-1/2}dt. \tag{14.129}$$

This substitution has changed the limits of integration to a more convenient range and has
isolated the negative exponential dependence $e^{-z}$. The integral in Eq. (14.129) may now
(for $\nu > 0$) be evaluated for $z = 0$ to yield $\Gamma(2\nu)$. Then, using the duplication formula,
Eq. (13.27), we have

$$\lim_{z\to 0} K_\nu(z) = \frac{\Gamma(\nu)\,2^{\nu-1}}{z^{\nu}}, \qquad \nu > 0. \tag{14.130}$$

Equation (14.130) agrees with Eq. (14.111), showing that Eq. (14.128) has the proper
small-$z$ behavior to represent $K_\nu$. Note that for $\nu = 0$, Eq. (14.128) diverges logarithmically
at $z = 0$ and the verification of its scale requires a different approach, which is the
topic of Exercise 14.6.4.

Finally, to complete the identification of Eq. (14.128) with $K_\nu$, we need to verify that it
decays exponentially at large $z$. That feature will be a by-product of our main interest here,
which is to develop an asymptotic series for $K_\nu(z)$. We do so by rewriting Eq. (14.129) as

$$K_\nu(z) = \sqrt{\frac{\pi}{2z}}\,\frac{e^{-z}}{\Gamma(\nu+\frac{1}{2})}\int_0^{\infty} e^{-t}\,t^{\nu-1/2}\left(1 + \frac{t}{2z}\right)^{\nu-1/2}dt. \tag{14.131}$$

We next expand $(1 + t/2z)^{\nu-1/2}$ by the binomial theorem and interchange the summation
and integration (valid for the asymptotic series we plan to obtain), reaching

$$K_\nu(z) = \sqrt{\frac{\pi}{2z}}\,\frac{e^{-z}}{\Gamma(\nu+\frac{1}{2})}\sum_{r=0}^{\infty}\binom{\nu-\frac{1}{2}}{r}(2z)^{-r}\int_0^{\infty} e^{-t}\,t^{\nu+r-1/2}\,dt$$

$$= \sqrt{\frac{\pi}{2z}}\;e^{-z}\sum_{r=0}^{\infty}\frac{\Gamma(\nu+r+\frac{1}{2})}{r!\,\Gamma(\nu-r+\frac{1}{2})}\,(2z)^{-r}. \tag{14.132}$$

Equation (14.132) can now be rearranged to

$$K_\nu(z) \sim \sqrt{\frac{\pi}{2z}}\;e^{-z}\left[1 + \frac{(4\nu^2-1^2)}{1!\,8z} + \frac{(4\nu^2-1^2)(4\nu^2-3^2)}{2!\,(8z)^2} + \cdots\right]. \tag{14.133}$$

Equation (14.133) yields the anticipated exponential dependence, confirming that
Eq. (14.128) actually represents $K_\nu$.

Although the integral of Eq. (14.128), integrating along the real axis, was convergent only
for $-\pi/2 < \arg z < \pi/2$, Eq. (14.133) may be extended to $-3\pi/2 < \arg z < 3\pi/2$. Considered
as an infinite series, Eq. (14.133) is actually divergent. However, this series is asymptotic,
in the sense that for large enough $z$, $K_\nu(z)$ may be approximated to any fixed degree of
accuracy with a small number of terms. Compare Section 12.6 for a definition and discussion
of asymptotic series. The asymptotic character arises because our binomial expansion
was valid only for $t < 2z$ but we integrated $t$ out to infinity. The exponential decrease of the
integrand has prevented a disaster, but the series is only asymptotic and not convergent. By
Table 7.1, $z = \infty$ is an essential singularity of the Bessel (and modified Bessel) equations.
Fuchs' theorem does not guarantee a convergent series and we did not get one.
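The asymptotic (but divergent) character of Eq. (14.133) can be seen numerically. The sketch below is an illustration under stated assumptions: the function names are ours; the reference values come from the representation $K_\nu(z) = \int_0^\infty e^{-z\cosh t}\cosh(\nu t)\,dt$ of Exercise 14.6.3, evaluated by a simple trapezoidal rule whose step `h` and cutoff `t_max` are our own choices for $z \gtrsim 1$. Successive terms of the series are generated from the ratio $(4\nu^2 - (2r-1)^2)/(8rz)$ implied by Eq. (14.133).

```python
import math


def k_series_terms(nu, z, nterms):
    """First `nterms` terms of the bracket in Eq. (14.133)."""
    mu = 4.0 * nu * nu
    terms = [1.0]
    for r in range(1, nterms):
        terms.append(terms[-1] * (mu - (2 * r - 1) ** 2) / (r * 8.0 * z))
    return terms


def k_asymptotic(nu, z, nterms):
    """Partial sum of Eq. (14.133)."""
    return math.sqrt(math.pi / (2.0 * z)) * math.exp(-z) * sum(k_series_terms(nu, z, nterms))


def k_integral(nu, z, h=0.01, t_max=10.0):
    """Trapezoidal estimate of the integral in Exercise 14.6.3 (real z > 0)."""
    n = int(round(t_max / h))
    total = 0.5 * math.exp(-z)  # half weight at t = 0; the far endpoint is negligible
    for j in range(1, n + 1):
        t = j * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return total * h
```

At $z = 5$ and $\nu = 0$ the terms shrink until about $r \approx 2z$ and then grow without bound, yet a partial sum of eight terms already agrees with `k_integral` to better than one part in $10^4$. For $\nu = \frac{1}{2}$ every term after the first vanishes, so the leading term is exact.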

It is convenient to rewrite Eq. (14.133) as

$$K_\nu(z) = \sqrt{\frac{\pi}{2z}}\;e^{-z}\left[P_\nu(iz) + iQ_\nu(iz)\right], \tag{14.134}$$

where

$$P_\nu(z) \sim 1 - \frac{(\mu-1)(\mu-9)}{2!\,(8z)^2} + \frac{(\mu-1)(\mu-9)(\mu-25)(\mu-49)}{4!\,(8z)^4} - \cdots, \tag{14.135}$$

$$Q_\nu(z) \sim \frac{\mu-1}{1!\,(8z)} - \frac{(\mu-1)(\mu-9)(\mu-25)}{3!\,(8z)^3} + \cdots, \tag{14.136}$$

and $\mu = 4\nu^2$. It should be noted that although $P_\nu(z)$ of Eq. (14.135) and $Q_\nu(z)$ of
Eq. (14.136) have alternating signs, the series for $P_\nu(iz)$ and $Q_\nu(iz)$ in Eq. (14.134) have
all positive signs. Finally, note that for $z$ large, $P_\nu$ dominates.


Additional Asymptotic Forms 


We started our detailed study of asymptotic behavior with K,, because, with its properties 
in hand, we can deduce the asymptotic expansions of the other members of the family of 
Bessel-related functions. 


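1. Rearranging the definition of $K_\nu$ to

$$H_\nu^{(1)}(z) = \frac{2}{\pi}\,e^{-i(\nu+1)\pi/2}\,K_\nu(-iz), \tag{14.137}$$

we have

$$H_\nu^{(1)}(z) = \sqrt{\frac{2}{\pi z}}\,\exp\left\{i\left[z - \left(\nu+\frac{1}{2}\right)\frac{\pi}{2}\right]\right\}\left[P_\nu(z) + iQ_\nu(z)\right], \tag{14.138}$$

which although originally derived for real values of $-iz$, can be analytically continued
into the larger range $-\pi < \arg z < 2\pi$.

2. The second Hankel function is just (for real arguments) the complex conjugate of the
first, and therefore

$$H_\nu^{(2)}(z) = \sqrt{\frac{2}{\pi z}}\,\exp\left\{-i\left[z - \left(\nu+\frac{1}{2}\right)\frac{\pi}{2}\right]\right\}\left[P_\nu(z) - iQ_\nu(z)\right], \tag{14.139}$$

valid for $-2\pi < \arg z < \pi$.

3. Since $J_\nu(z)$ is the real part of $H_\nu^{(1)}(z)$ for real $z$,

$$J_\nu(z) = \sqrt{\frac{2}{\pi z}}\left\{P_\nu(z)\cos\left[z - \left(\nu+\frac{1}{2}\right)\frac{\pi}{2}\right] - Q_\nu(z)\sin\left[z - \left(\nu+\frac{1}{2}\right)\frac{\pi}{2}\right]\right\}, \tag{14.140}$$

valid for $-\pi < \arg z < \pi$.

4. The Neumann function is the imaginary part of $H_\nu^{(1)}(z)$ for real $z$, or

$$Y_\nu(z) = \sqrt{\frac{2}{\pi z}}\left\{P_\nu(z)\sin\left[z - \left(\nu+\frac{1}{2}\right)\frac{\pi}{2}\right] + Q_\nu(z)\cos\left[z - \left(\nu+\frac{1}{2}\right)\frac{\pi}{2}\right]\right\}, \tag{14.141}$$

also valid for $-\pi < \arg z < \pi$.

5. Finally, the modified Bessel function $I_\nu(z)$ is given by

$$I_\nu(z) = i^{-\nu}J_\nu(iz), \tag{14.142}$$

so

$$I_\nu(z) = \frac{e^{z}}{\sqrt{2\pi z}}\left[P_\nu(iz) - iQ_\nu(iz)\right], \tag{14.143}$$

valid for $-\pi/2 < \arg z < \pi/2$.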


Properties of the Asymptotic Forms 


Having derived the asymptotic forms of the various Bessel functions, it is opportune to
note their essential characteristics. Remembering that in the limit of large $z$, $P_\nu$ approaches
unity while $Q_\nu \sim 1/z$, we see that at large $z$, all the Bessel functions have leading terms
with a $1/z^{1/2}$ dependence, multiplied by either a real or complex exponential. The modified
functions $K_\nu$ and $I_\nu$, respectively, contain decreasing and increasing exponentials, while
the ordinary Bessel functions $J_\nu$ and $Y_\nu$ have leading terms with sinusoidal oscillation
(damped by the $z^{-1/2}$ factor). When multiplied by a time factor $e^{\pm i\omega t}$, the Hankel functions
can describe incoming and outgoing traveling waves.

Looking at the oscillatory functions $J_\nu$, $Y_\nu$, $H_\nu^{(i)}$ in more detail, we see that exact sinusoidal
behavior is only reached in the limit of large $z$, as for finite $z$ the terms involving $Q_\nu$
will to some extent alter the periodicity. The reader may wish to compare the positions of
the zeros of $J_n$ in Table 14.1 with those predicted by its leading term, namely the zeros of

$$\cos\left[z - \left(n+\frac{1}{2}\right)\frac{\pi}{2}\right].$$

We see that $J_n$ behaves asymptotically like a phase-shifted cosine function, with the phase
shift a function of $n$. The asymptotic form of $Y_n$ will be that of a sine function, with (for
the same $n$) the same phase shift. This causes the zeros of $J_n$ and $Y_n$ for large $z$ to alternate,
as we saw for $J_0$ and $Y_0$ in Fig. 14.8.

The asymptotic behavior of the two solutions to a problem described by ordinary or
modified Bessel functions may be sufficient to eliminate immediately one of these functions
as a solution for a physical problem. This observation may enable us to use the behavior
at $z = \infty$ as well as that at $z = 0$ to restrict the functional forms we need to consider.

FIGURE 14.13 Asymptotic approximation of $J_0(x)$.

Finally, we note that the asymptotic series $P_\nu(z)$ and $Q_\nu(z)$, Eqs. (14.135) and (14.136),
terminate for $\nu = \pm 1/2, \pm 3/2, \ldots$ and become polynomials (in negative powers of $z$). For
these special values of $\nu$ the asymptotic approximations become exact solutions.

It is of some interest to consider the accuracy of the asymptotic forms, taking for example
just the first term

$$J_n(x) \approx \sqrt{\frac{2}{\pi x}}\,\cos\left[x - \left(n+\frac{1}{2}\right)\frac{\pi}{2}\right]. \tag{14.144}$$

Clearly, the condition for Eq. (14.144) to be accurate is that the sine term of Eq. (14.140)
be negligible; that is,

$$8x \gg 4n^2 - 1. \tag{14.145}$$

In Fig. 14.13 we plot $J_0(x)$ and the leading term of its asymptotic approximation. The
agreement is nearly quantitative for $x > 5$. However, for $n$ or $\nu > 1$ the asymptotic region
may be far out.
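The comparison plotted in Fig. 14.13 can be reproduced in a few lines. The sketch below is illustrative: `j0_series` sums the standard power series $J_0(x) = \sum_s (-1)^s (x/2)^{2s}/(s!)^2$ (our choice of reference, reliable in double precision only for moderate $x$, say $x \lesssim 20$), and `j0_leading` is Eq. (14.144) with $n = 0$.

```python
import math


def j0_series(x, terms=80):
    """Power series for J_0(x); adequate in double precision for moderate x."""
    total, term = 0.0, 1.0
    for s in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((s + 1) ** 2)
    return total


def j0_leading(x):
    """Leading asymptotic term, Eq. (14.144) with n = 0."""
    return math.sqrt(2.0 / (math.pi * x)) * math.cos(x - math.pi / 4.0)


def max_abs_error(xs):
    """Largest |J_0 - leading term| over the sample points xs."""
    return max(abs(j0_series(x) - j0_leading(x)) for x in xs)
```

Evaluating `max_abs_error` on a grid between $x = 5$ and $x = 10$ gives a discrepancy of order $10^{-2}$ or less, small compared with the oscillation amplitude of roughly $0.3$, in line with the statement that the agreement is nearly quantitative for $x > 5$.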

Another use of the asymptotic formulas is to establish the constants in Wronskian for- 
mulas, where we know the Wronskian of any two Bessel functions of argument x has a 
1/x functional dependence but with a premultiplying constant that depends on the Bessel 
functions involved. 


Example 14.6.1 CYLINDRICAL TRAVELING WAVES


As an illustration of a problem in which we have chosen a specific Bessel function because
of its asymptotic properties, consider a two-dimensional (2-D) wave problem similar to the
vibrating circular membrane of Exercise 14.1.24. Now imagine that the waves are generated
at $r = 0$ and move outward to infinity. We replace our standing waves by traveling
ones. The differential equation remains the same, but the boundary conditions change. We
now demand that for large $r$ the wave behave like

$$e^{i(kr - \omega t)}, \tag{14.146}$$

to describe an outgoing wave with wavelength $2\pi/k$. We assume, for simplicity, that there
is no azimuthal dependence, so we have circular symmetry, implying $m = 0$. The Bessel
function of order zero with this asymptotic dependence is $H_0^{(1)}(kr)$, as can be seen from
Eq. (14.138). This boundary condition at infinity then determines our wave solution as

$$U(r, t) = H_0^{(1)}(kr)\,e^{-i\omega t}. \tag{14.147}$$

This solution diverges as $r \to 0$, which is the behavior to be expected with a source at the
origin.
Exercises

14.6.1 Determine the asymptotic dependence of the modified Bessel functions $I_\nu(x)$, given

$$I_\nu(x) = \frac{1}{2\pi i}\int_C e^{(x/2)(t+1/t)}\,\frac{dt}{t^{\nu+1}}.$$

The contour starts and ends at $t = -\infty$, encircling the origin in a positive sense. There
are two saddle points. Only the one at $z = +1$ contributes significantly to the asymptotic
form.

14.6.2 Determine the asymptotic dependence of the modified Bessel function of the second
kind, $K_\nu(x)$, by using

$$K_\nu(x) = \frac{1}{2}\int_0^{\infty} e^{(-x/2)(s+1/s)}\,\frac{ds}{s^{1-\nu}}.$$

14.6.3 Verify that the integral representations

$$I_n(z) = \frac{1}{\pi}\int_0^{\pi} e^{z\cos t}\cos(nt)\,dt,$$

$$K_\nu(z) = \int_0^{\infty} e^{-z\cosh t}\cosh(\nu t)\,dt, \qquad \Re e(z) > 0,$$

satisfy the modified Bessel equation by direct substitution into that equation. How can
you check the normalization?

14.6.4 (a) Show that when $K_\nu$ is defined by Eq. (14.128),

$$\frac{dK_0(z)}{dz} = -K_1(z).$$

(b) Show that the indefinite integral of $-K_1(z)$ as defined by Eq. (14.128) has in
the limit of small $z$ the value $-\ln z + C$, and therefore, by comparison with
Eq. (14.110), that $K_0$ as defined by Eq. (14.128) has the correct normalization.

14.6.5 Verify that Eq. (14.132) can be rearranged to the form given as Eq. (14.133).

14.6.6 (a) Show that

$$y(z) = z^{\nu}\int e^{-zt}(t^2-1)^{\nu-1/2}\,dt$$

satisfies the modified Bessel equation, provided the contour is chosen so that

$$e^{-zt}(t^2-1)^{\nu+1/2}$$

has the same value at the initial and final points of the contour.

(b) Verify that the contours shown in Fig. 14.14 are suitable for this problem.

FIGURE 14.14 Modified Bessel function contours.

14.6.7 Use the asymptotic expansions to verify the following Wronskian formulas:

(a) $J_\nu(x)J_{-\nu-1}(x) + J_{-\nu}(x)J_{\nu+1}(x) = -2\sin\nu\pi/\pi x$,
(b) $J_\nu(x)Y_{\nu+1}(x) - J_{\nu+1}(x)Y_\nu(x) = -2/\pi x$,
(c) $J_\nu(x)H_{\nu-1}^{(2)}(x) - J_{\nu-1}(x)H_\nu^{(2)}(x) = 2/i\pi x$,
(d) $I_\nu(x)K_\nu'(x) - I_\nu'(x)K_\nu(x) = -1/x$,
(e) $I_\nu(x)K_{\nu+1}(x) + I_{\nu+1}(x)K_\nu(x) = 1/x$.

14.6.8 Verify that the Green's function for the 2-D Helmholtz equation (operator $\nabla^2 + k^2$)
with outgoing-wave boundary conditions is

$$G(\boldsymbol{\rho}_1, \boldsymbol{\rho}_2) = -\frac{i}{4}\,H_0^{(1)}(k|\boldsymbol{\rho}_1 - \boldsymbol{\rho}_2|).$$

Hint. $H_0^{(1)}(k\rho)$ is known to be an outgoing-wave solution to the homogeneous
Helmholtz equation.

14.6.9 From the asymptotic form of $K_\nu(z)$, Eq. (14.134), derive the asymptotic form of
$H_\nu^{(1)}(z)$, Eq. (14.138). Note particularly the phase, $(\nu + \frac{1}{2})\pi/2$.

14.6.10 Apply Stokes' method for obtaining an asymptotic expansion for the Hankel function
$H_\nu^{(1)}$ as follows:


14.6.11 


14.6.12 


14.6.13 


14.6 Asymptotic Expansions 697 


(a) Replace the Bessel function in Bessel’s equation by x~!/? y(x) and show that y(x) 


satisfies 





” v= = 
y (1 — )oo=0 


(b) Develop a power-series solution with negative powers of x starting with the 
assumed form 
[o,@) 


y(x) =e ax”. 


n=0 


Obtain the recurrence relation giving a,+; in terms of a,. Check your result 
against the asymptotic series, Eq. (14.138). 


(c) From Eq. (14.125), determine the initial coefficient, ao. 


Using the method of steepest descents, evaluate the second Hankel function given by 


1 dz 
(2) = (t/2)(z—-1/z)_“*_ 
H} m=— fe aT? 


ANS. H® (t) ~ | Sg ttas ne) 
a 


(a) In applying the method of steepest descents to the Hankel function HY (t), show 
that w(z, t), which appears in Eq. (14.121), satisfies 


Rel w(z, t)] < Re[w(zo, t)] =0 


for z on the contour C, (Fig. 14.10) but away from the point z = zp =i. 


C2 


with contour C2 as shown in Fig. 14.10. 


(b) For general values of z = re!®, show that 


ee ee 
Nelw(z,t)]>0 for O<r<1l, 2 1 
—m<0< = 
2 
and 
IT 4 
Ne[lw(z,t)]<O for r>1, oe 


Your demonstration verifies that the distribution of the sign of w is as shown 
schematically in Fig. 14.15. 

(c) Explain why the contour C; (Fig. 14.10) cannot be deformed to go through both 
saddle points, and why it may not go through the saddle point at —i if it is to end 
at z = —oo with argument +7. 


Calculate the first 15 partial sums of Po(x) and Qo(x), Eqs. (14.135) and (14.136). 
Let x vary from 4 to 10 in unit steps. Determine the number of terms to be retained 
for maximum accuracy and the accuracy achieved as a function of x. Specifically, how 
small may x be without raising the error above 3 x 10~°? 


ANS. Xmin = 6. 
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FIGURE 14.15 Sign of w(z, ft), occurring in Eq. (14.121), for integral representation of 
Hankel functions. 


SPHERICAL BESSEL FUNCTIONS 


In Section 9.4 we discussed the separation of the Helmholtz equation in spherical coordi- 
nates. We showed there that in the oft-occurring case that the boundary conditions of the 
problem have spherical symmetry, the radial equation has the form given in Eq. (9.80), 
namely, 
$$r^2 \frac{d^2R}{dr^2} + 2r \frac{dR}{dr} + \left[k^2 r^2 - l(l+1)\right] R = 0. \tag{14.148}$$

We remind the reader that the parameter $k$ is that from the original Helmholtz equation,
while $l(l + 1)$ is the separation constant associated with solutions of the angular equations
identified by the index $l$ (which is required by the boundary conditions to be an integer).

In Section 9.4 we went on to discuss the fact that the substitution

$$R(kr) = \frac{Z(kr)}{(kr)^{1/2}} \tag{14.149}$$

permits us to rewrite Eq. (14.148) as

$$r^2 \frac{d^2Z}{dr^2} + r \frac{dZ}{dr} + \left[k^2 r^2 - \left(l + \tfrac{1}{2}\right)^2\right] Z = 0, \tag{14.150}$$

which we identified in Eq. (9.84) as Bessel's equation of order $l + \frac{1}{2}$.

We can now identify the general solution $Z(kr)$ as a linear combination of $J_{l+1/2}(kr)$
and $Y_{l+1/2}(kr)$, which in turn means that we can write $R(kr)$ in terms of these Bessel
functions of half-integral order, illustrated (for $J_{l+1/2}$) by

$$R(kr) = \frac{C}{\sqrt{kr}}\, J_{l+1/2}(kr).$$

Since the $R(kr)$ describe radial functions in spherical coordinates, they are termed spherical
Bessel functions. Note also that since Eq. (14.148) is homogeneous, we are free to
define our spherical Bessel functions at any scale; the scale ordinarily used is that
introduced in the next subsection.


Definitions 
We define our spherical Bessel functions by the following equations. It is not ordinarily
useful to introduce spherical Bessel functions with indices that are not integers, so we
assume the index $n$ to be integral (but not necessarily nonnegative).

$$j_n(x) = \sqrt{\frac{\pi}{2x}}\, J_{n+1/2}(x),$$

$$y_n(x) = \sqrt{\frac{\pi}{2x}}\, Y_{n+1/2}(x),$$

$$h_n^{(1)}(x) = \sqrt{\frac{\pi}{2x}}\, H_{n+1/2}^{(1)}(x) = j_n(x) + i y_n(x), \tag{14.151}$$

$$h_n^{(2)}(x) = \sqrt{\frac{\pi}{2x}}\, H_{n+1/2}^{(2)}(x) = j_n(x) - i y_n(x).$$

Referring to the definition of $Y_{n+1/2}$, we see that

$$Y_{n+1/2}(x) = \frac{\cos\left[(n + \tfrac{1}{2})\pi\right] J_{n+1/2}(x) - J_{-n-1/2}(x)}{\sin\left[(n + \tfrac{1}{2})\pi\right]} = (-1)^{n+1} J_{-n-1/2}(x),$$

which means that

$$y_n(x) = (-1)^{n+1} j_{-n-1}(x). \tag{14.152}$$


These spherical Bessel functions (Figs. 14.16 and 14.17) can be expressed in series form.
Using Eq. (14.6), we have initially

$$j_n(x) = \sqrt{\frac{\pi}{2x}} \sum_{s=0}^{\infty} \frac{(-1)^s}{s!\, \Gamma(s + n + \frac{3}{2})} \left(\frac{x}{2}\right)^{2s+n+1/2}. \tag{14.153}$$

Writing

$$\Gamma(s + n + \tfrac{3}{2}) = \Gamma(n + \tfrac{3}{2})\, (n + \tfrac{3}{2})_s, \tag{14.154}$$

where $(\ldots)_s$ is a Pochhammer symbol, defined in Eq. (1.72), we can bring Eq. (14.153) to
the form

$$j_n(x) = \sqrt{\frac{\pi}{2x}} \left(\frac{x}{2}\right)^{n+1/2} \frac{1}{\Gamma(n + \frac{3}{2})} \sum_{s=0}^{\infty} \frac{(-1)^s}{s!\, (n + \frac{3}{2})_s} \left(\frac{x}{2}\right)^{2s}$$

$$\phantom{j_n(x)} = \frac{x^n}{(2n+1)!!} \sum_{s=0}^{\infty} \frac{(-1)^s}{s!\, (n + \frac{3}{2})_s} \left(\frac{x}{2}\right)^{2s}. \tag{14.155}$$

FIGURE 14.17 Spherical Neumann functions.

We reached the last line of Eq. (14.155) by writing $\Gamma(n + \frac{3}{2})$ using the double factorial
notation (compare with Exercise 13.1.14).
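The series in Eq. (14.155) converges quickly for moderate $x$ and is easy to evaluate directly. The following is a minimal sketch, not a production routine; the helper names `pochhammer`, `double_factorial`, and `spherical_jn_series`, and the default of 30 terms, are illustrative assumptions rather than anything defined in the text.

```python
import math


def pochhammer(a, s):
    """Rising factorial (a)_s = a (a+1) ... (a+s-1), with (a)_0 = 1."""
    result = 1.0
    for k in range(s):
        result *= a + k
    return result


def double_factorial(m):
    """m!! for an integer m >= -1, with (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def spherical_jn_series(n, x, terms=30):
    """Partial sum of the power series for j_n(x), Eq. (14.155).

    The number of terms is an arbitrary choice that is adequate only
    for moderate |x|; it is not a convergence criterion.
    """
    total = 0.0
    for s in range(terms):
        coeff = (-1) ** s / (math.factorial(s) * pochhammer(n + 1.5, s))
        total += coeff * (x / 2) ** (2 * s)
    return x ** n / double_factorial(2 * n + 1) * total
```

For $n = 0$ the partial sum can be compared against $\sin x / x$, which is a convenient check on the normalization factor $1/(2n+1)!!$.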

If we now develop a series expansion for $y_n(x)$ by the same method that was used for
$j_n(x)$, but starting from Eq. (14.152), we get

$$y_n(x) = -\frac{(2n-1)!!}{x^{n+1}} \sum_{s=0}^{\infty} \frac{(-1)^s}{s!\, (\frac{1}{2} - n)_s} \left(\frac{x}{2}\right)^{2s}. \tag{14.156}$$

The spherical Bessel functions are oscillatory, as can be seen from the graphs in
Figs. 14.16 and 14.17. Note that $j_n(x)$ are regular at $x = 0$, with limiting behavior there
proportional to $x^n$. The $y_n$ are all irregular at $x = 0$, approaching that point as $x^{-n-1}$.

The infinite series in Eqs. (14.155) and (14.156) can be evaluated in closed form (but
with increasing difficulty as $n$ increases). For the special case $n = 0$, we can substitute into
Eq. (14.155) $s! = 2^{-s} (2s)!!$ and $(3/2)_s = 2^{-s} (2s + 1)!!$, reaching

$$j_0(x) = \sum_{s=0}^{\infty} \frac{(-1)^s\, 2^{2s}}{(2s)!!\, (2s+1)!!} \left(\frac{x}{2}\right)^{2s} = \sum_{s=0}^{\infty} \frac{(-1)^s}{(2s+1)!}\, x^{2s} = \frac{\sin x}{x}. \tag{14.157}$$

A similar treatment of the expansion for $y_0$ yields

$$y_0(x) = -\frac{\cos x}{x}. \tag{14.158}$$

From the definition of the spherical Hankel functions, Eq. (14.151), we also have

$$h_0^{(1)}(x) = \frac{1}{x} (\sin x - i\cos x) = -\frac{i}{x}\, e^{ix}, \tag{14.159}$$

$$h_0^{(2)}(x) = \frac{1}{x} (\sin x + i\cos x) = \frac{i}{x}\, e^{-ix}. \tag{14.160}$$


Since we anticipate the availability of recurrence formulas for the spherical Bessel functions,
and since $y_0$ is just $-j_{-1}$, we expect all the $j_n$ and $y_n$ to be linear combinations of
sines and cosines. In fact, the recurrence formulas are good ways of getting these functions
for small $n$. However, we identify here an alternate approach, which depends on the fact,
noted in Section 14.6, that the asymptotic expansion for the Hankel functions actually
terminates when the order is a half-integer, thereby yielding exact, closed expressions. We
start from

$$h_n^{(1)}(x) = \sqrt{\frac{\pi}{2x}}\, H_{n+1/2}^{(1)}(x) = (-i)^{n+1}\, \frac{e^{ix}}{x} \left[P_{n+1/2}(x) + i\, Q_{n+1/2}(x)\right], \tag{14.161}$$

where $P_\nu$ and $Q_\nu$ are given by Eqs. (14.135) and (14.136). Now, $P_{n+1/2}$ and $Q_{n+1/2}$ are
polynomials, and we can bring Eq. (14.161) to the form

$$h_n^{(1)}(x) = (-i)^{n+1}\, \frac{e^{ix}}{x} \sum_{s=0}^{n} \frac{i^s}{s!\, (8x)^s}\, \frac{(2n+2s)!!}{(2n-2s)!!}$$

$$\phantom{h_n^{(1)}(x)} = (-i)^{n+1}\, \frac{e^{ix}}{x} \sum_{s=0}^{n} \frac{i^s}{s!\, (2x)^s}\, \frac{(n+s)!}{(n-s)!}. \tag{14.162}$$

For real $x$, $j_n(x)$ is the real part of this, $y_n(x)$ the imaginary part, and $h_n^{(2)}(x)$ the complex
conjugate. Specifically,

$$h_1^{(1)}(x) = e^{ix} \left(-\frac{1}{x} - \frac{i}{x^2}\right), \tag{14.163}$$

$$h_2^{(1)}(x) = e^{ix} \left(\frac{i}{x} - \frac{3}{x^2} - \frac{3i}{x^3}\right), \tag{14.164}$$

$$j_1(x) = \frac{\sin x}{x^2} - \frac{\cos x}{x}, \tag{14.165}$$

$$j_2(x) = \left(\frac{3}{x^3} - \frac{1}{x}\right) \sin x - \frac{3}{x^2} \cos x, \tag{14.166}$$

$$y_1(x) = -\frac{\cos x}{x^2} - \frac{\sin x}{x}, \tag{14.167}$$

$$y_2(x) = -\left(\frac{3}{x^3} - \frac{1}{x}\right) \cos x - \frac{3}{x^2} \sin x. \tag{14.168}$$
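Because the sum in Eq. (14.162) is finite, it can be evaluated exactly (up to rounding) with complex arithmetic. The sketch below is illustrative only; the function name `spherical_h1` is a hypothetical choice, and the routine assumes real $x > 0$ and integer $n \ge 0$.

```python
import cmath
import math


def spherical_h1(n, x):
    """Spherical Hankel function h_n^(1)(x) from the finite sum, Eq. (14.162).

    Assumes real x > 0 and integer n >= 0. For real x the real part is
    j_n(x) and the imaginary part is y_n(x).
    """
    total = 0j
    for s in range(n + 1):
        ratio = math.factorial(n + s) / (
            math.factorial(s) * (2 * x) ** s * math.factorial(n - s)
        )
        total += (1j ** s) * ratio
    return (-1j) ** (n + 1) * cmath.exp(1j * x) / x * total
```

Taking the real and imaginary parts for $n = 1$ and $n = 2$ gives values that can be compared with Eqs. (14.165) to (14.168).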


Recurrence Relations 


The recurrence relations to which we now turn provide a convenient way of developing 
the higher-order spherical Bessel functions. These recurrence relations may be derived 
from the power-series expansions, but it is easier to substitute into the known recurrence 
relations, Eqs. (14.8) and (14.9). This gives 


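$$f_{n-1}(x) + f_{n+1}(x) = \frac{2n+1}{x}\, f_n(x), \tag{14.169}$$

$$n f_{n-1}(x) - (n+1) f_{n+1}(x) = (2n+1)\, f_n'(x). \tag{14.170}$$

Rearranging these relations, or substituting into Eqs. (14.10) and (14.11), we obtain

$$\frac{d}{dx}\left[x^{n+1} f_n(x)\right] = x^{n+1} f_{n-1}(x), \tag{14.171}$$

$$\frac{d}{dx}\left[x^{-n} f_n(x)\right] = -x^{-n} f_{n+1}(x). \tag{14.172}$$

In these equations $f_n$ may represent $j_n$, $y_n$, $h_n^{(1)}$, or $h_n^{(2)}$.

By mathematical induction (Section 1.4) we may establish the Rayleigh formulas:

$$j_n(x) = (-1)^n x^n \left(\frac{1}{x}\frac{d}{dx}\right)^n \left(\frac{\sin x}{x}\right), \tag{14.173}$$

$$y_n(x) = -(-1)^n x^n \left(\frac{1}{x}\frac{d}{dx}\right)^n \left(\frac{\cos x}{x}\right), \tag{14.174}$$

$$h_n^{(1)}(x) = -i(-1)^n x^n \left(\frac{1}{x}\frac{d}{dx}\right)^n \left(\frac{e^{ix}}{x}\right), \tag{14.175}$$

$$h_n^{(2)}(x) = i(-1)^n x^n \left(\frac{1}{x}\frac{d}{dx}\right)^n \left(\frac{e^{-ix}}{x}\right). \tag{14.176}$$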
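The three-term recurrence, Eq. (14.169), gives a direct way to step upward in $n$ from the closed forms for $j_0$ and $j_1$. The following sketch shows the idea; the name `spherical_jn_upward` is hypothetical. As a general numerical caveat (not a statement from the text), upward recurrence for $j_n$ loses accuracy once $n$ noticeably exceeds $x$, so this is best regarded as suitable for small $n$ only.

```python
import math


def spherical_jn_upward(n_max, x):
    """Return [j_0(x), ..., j_{n_max}(x)] by upward recurrence, Eq. (14.169).

    Starts from j_0 = sin(x)/x and j_1 = sin(x)/x^2 - cos(x)/x and applies
    j_{n+1} = (2n+1)/x * j_n - j_{n-1}. Assumes x > 0.
    """
    values = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    for n in range(1, n_max):
        values.append((2 * n + 1) / x * values[n] - values[n - 1])
    return values[: n_max + 1]
```

A call such as `spherical_jn_upward(2, 5.0)[2]` can be checked against the closed form for $j_2$ in Eq. (14.166).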
Limiting Values 


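For $x \ll 1$,⁹ Eqs. (14.155) and (14.156) yield

$$j_n(x) \approx \frac{x^n}{(2n+1)!!}, \tag{14.177}$$

$$y_n(x) \approx -\frac{(2n-1)!!}{x^{n+1}}. \tag{14.178}$$

The limiting values of the spherical Hankel functions for small $x$ go as $\pm i y_n(x)$.

The asymptotic values of $j_n$, $y_n$, $h_n^{(1)}$, and $h_n^{(2)}$ may be obtained from the asymptotic
forms of the corresponding Bessel functions, as given in Section 14.6. We find

$$j_n(x) \sim \frac{1}{x} \sin\left(x - \frac{n\pi}{2}\right), \tag{14.179}$$

$$y_n(x) \sim -\frac{1}{x} \cos\left(x - \frac{n\pi}{2}\right), \tag{14.180}$$

$$h_n^{(1)}(x) \sim (-i)^{n+1}\, \frac{e^{ix}}{x} = -i\, \frac{e^{i(x - n\pi/2)}}{x}, \tag{14.181}$$

$$h_n^{(2)}(x) \sim i^{n+1}\, \frac{e^{-ix}}{x} = i\, \frac{e^{-i(x - n\pi/2)}}{x}. \tag{14.182}$$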

The condition for these spherical Bessel forms is that $x \gg n(n + 1)/2$. From these asymptotic
values we see that $j_n(x)$ and $y_n(x)$ are appropriate for a description of standing
spherical waves; $h_n^{(1)}(x)$ and $h_n^{(2)}(x)$ correspond to traveling spherical waves. If the time
dependence for the traveling waves is taken to be $e^{-i\omega t}$, then $h_n^{(1)}(x)$ yields an outgoing
traveling spherical wave, and $h_n^{(2)}(x)$ an incoming wave. Radiation theory in electromagnetism
and scattering theory in quantum mechanics provide many applications.
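The quality of the asymptotic form, Eq. (14.179), can be examined numerically by comparing it with an exact closed form. The sketch below does this for $n = 2$ using Eq. (14.166); the function names are illustrative assumptions, and the sample point $x = 200$ is an arbitrary choice satisfying $x \gg n(n+1)/2 = 3$.

```python
import math


def j2_exact(x):
    """Closed form for j_2(x), Eq. (14.166)."""
    return (3 / x**3 - 1 / x) * math.sin(x) - 3 / x**2 * math.cos(x)


def jn_asymptotic(n, x):
    """Leading asymptotic form of j_n(x) for large x, Eq. (14.179)."""
    return math.sin(x - n * math.pi / 2) / x


def asymptotic_error_j2(x):
    """Absolute difference between j_2(x) and its asymptotic form."""
    return abs(j2_exact(x) - jn_asymptotic(2, x))
```

For $n = 2$ the difference consists of the terms of order $1/x^2$ and $1/x^3$ dropped from Eq. (14.166), so it is bounded by $3/x^2 + 3/x^3$.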


⁹The condition that the second term in the series be negligible compared to the first is actually $x \ll 2[(2n + 2)(2n + 3)/(n + 1)]^{1/2}$ for $j_n(x)$.

Orthogonality and Zeros


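We may take the orthogonality integral for the ordinary Bessel functions, Eqs. (11.49) and
(11.50),

$$\int_0^a J_\nu\left(\alpha_{\nu p} \frac{\rho}{a}\right) J_\nu\left(\alpha_{\nu q} \frac{\rho}{a}\right) \rho\, d\rho = \frac{a^2}{2} \left[J_{\nu+1}(\alpha_{\nu p})\right]^2 \delta_{pq},$$

and rewrite it in terms of $j_n$, to obtain

$$\int_0^a j_n\left(\alpha_{np} \frac{r}{a}\right) j_n\left(\alpha_{nq} \frac{r}{a}\right) r^2\, dr = \frac{a^3}{2} \left[j_{n+1}(\alpha_{np})\right]^2 \delta_{pq}. \tag{14.183}$$

Here $\alpha_{np}$ is the $p$-th positive zero of $j_n$.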

Note that in contrast to the formula for the orthogonality of the $J_\nu$, Eq. (14.183) has the
weight factor $r^2$, not $r$. This of course comes from the factors $x^{-1/2}$ in the definition of
$j_n(x)$, but also has the effect that if the integration is construed as being over a spherical
volume rather than a linear interval, it is the factor corresponding to uniform weight of
all volume elements. (Remember that the weight $\rho$ for the $J_\nu$ integral produces uniform
weight if we construe the integration in that case as over the area within a circle.)

As for the ordinary Bessel functions, the functions that are orthogonal on $(0, a)$ all satisfy
a Dirichlet boundary condition, with zeros at $r = a$. We therefore find it useful to know
the values of the zeros of the $j_n$. The first few zeros for small $n$, and also the locations of
the zeros of $j_n'$, are listed in Table 14.2.
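Apart from the zeros of $j_0$, the entries of Table 14.2 must be found numerically. A minimal sketch using bisection on the closed form of $j_1$, Eq. (14.165), is shown here; the helper names and the bracketing intervals are illustrative assumptions chosen by inspecting sign changes, not values taken from the text.

```python
import math


def j1(x):
    """Closed form for j_1(x), Eq. (14.165)."""
    return math.sin(x) / x**2 - math.cos(x) / x


def bisect_zero(f, lo, hi, tol=1e-12):
    """Locate a zero of f in [lo, hi] by bisection.

    Requires f(lo) and f(hi) to have opposite signs.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Brackets chosen so that j_1 changes sign exactly once inside each.
first_zero_j1 = bisect_zero(j1, 4.0, 5.0)
second_zero_j1 = bisect_zero(j1, 7.0, 8.0)
```

The two results can be compared with the $j_1$ column of Table 14.2 (4.4934 and 7.7253).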

The following example illustrates a problem in which the zeros of the j, play an essential 
role. 


Table 14.2 Zeros of the Spherical Bessel Functions and Their First Derivatives

Number
of zero    j_0(x)     j_1(x)     j_2(x)     j_3(x)     j_4(x)     j_5(x)
   1       3.1416     4.4934     5.7635     6.9879     8.1826     9.3558
   2       6.2832     7.7253     9.0950    10.4171    11.7049    12.9665
   3       9.4248    10.9041    12.3229    13.6980    15.0397    16.3547
   4      12.5664    14.0662    15.5146    16.9236    18.3013    19.6532
   5      15.7080    17.2208    18.6890    20.1218    21.5254    22.9046

Number
of zero    j_0'(x)    j_1'(x)    j_2'(x)    j_3'(x)    j_4'(x)    j_5'(x)
   1       4.4934     2.0816     3.3421     4.5141     5.6467     6.7565
   2       7.7253     5.9404     7.2899     8.5838     9.8404    11.0702
   3      10.9041     9.2058    10.6139    11.9727    13.2956    14.5906
   4      14.0662    12.4044    13.8461    15.2445    16.6093    17.9472
   5      17.2208    15.5792    17.0429    18.4681    19.8624    21.2311


Example 14.7.1 PARTICLE IN A SPHERE


An illustration of the use of the spherical Bessel functions is provided by the problem of a
quantum mechanical particle of mass $m$ in a sphere of radius $a$. Quantum theory requires
that the wave function $\psi$, describing our particle, satisfy the Schrödinger equation

$$-\frac{\hbar^2}{2m} \nabla^2 \psi = E\psi, \tag{14.184}$$

subject to the conditions that (1) $\psi(r)$ is finite for all $0 \le r \le a$, and (2) $\psi(a) = 0$. This
corresponds to a square-well potential $V = 0$ for $r \le a$, $V = \infty$ for $r > a$. Here $\hbar$ is
Planck's constant divided by $2\pi$. Equation (14.184) with its boundary conditions is an
eigenvalue equation; its eigenvalues $E$ are the possible values of the particle's energy.

Let us determine the minimum value of the energy for which our wave equation has an
acceptable solution. Equation (14.184) is Helmholtz's equation, which after separation of
variables leads to the radial equation previously presented as Eq. (14.148):

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[k^2 - \frac{l(l+1)}{r^2}\right] R = 0, \tag{14.185}$$

with

$$k^2 = 2mE/\hbar^2 \tag{14.186}$$

and $l$ (determined from the angular equation) a nonnegative integer. Comparing with
Eq. (14.150) and the definitions of the spherical Bessel functions, Eq. (14.151), we see
that the general solution to Eq. (14.185) is

$$R = A j_l(kr) + B y_l(kr). \tag{14.187}$$

To satisfy the boundary conditions of the present problem, we must reject the solution $y_l$
because it is singular at $r = 0$, and we must choose $k$ such that $j_l(ka) = 0$. This boundary
condition at $r = a$ can be satisfied if

$$k \equiv k_{li} = \frac{\alpha_{li}}{a}, \tag{14.188}$$

where $\alpha_{li}$ is the $i$th positive zero of $j_l$. From Eq. (14.186) we see that the smallest $E$
will correspond to the smallest acceptable $k$, which in turn corresponds to the smallest $\alpha_{li}$.
Thus, scanning Table 14.2, we identify the smallest $\alpha_{li}$ as the first zero of $j_0$, a result which
we would expect after we have learned that the value $l = 0$ is associated with an angular
function with no kinetic energy.

We conclude this example by solving Eq. (14.186) for $E$ with $k$ assigned the value
$\alpha_{01}/a = \pi/a$¹⁰:

$$E_{\min} = \frac{\pi^2 \hbar^2}{2ma^2} = \frac{h^2}{8ma^2}. \tag{14.189}$$


¹⁰Most of the entries in Table 14.2 are only accessible numerically, but the zeros of $j_0$ are readily identified due to their simple
form, $\alpha_{0n} = n\pi$.

This example illustrates several features common to bound-state problems in quantum
mechanics. First, we see that for any finite sphere the particle will have a positive minimum
or zero-point energy. Second, we note that the particle cannot have a continuous range of
energy values; the energy is restricted to discrete values corresponding to the eigenvalues
of the Schrödinger equation. Third, the possible energies in this spherically symmetric
problem depend on $l$; as is evident from the table of zeros of $j_l$, the minimum energy for a
given $l$ increases with $l$. Finally, note that the orthogonality of the $j_l$ under the conditions
of this problem shows us that the eigenfunctions corresponding to the same $l$ but different
$i$ are orthogonal (with the weight factor corresponding to spherical polar coordinates).
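To get a feel for the size of the zero-point energy in Eq. (14.189), the formula can be evaluated for a concrete case. The sketch below is illustrative: the choice of an electron in a sphere of radius 1 nm is an assumption made here, not an example from the text, and the physical constants are the CODATA 2018 values in SI units.

```python
import math

# Physical constants (CODATA 2018, SI units); supplied here as assumptions.
HBAR = 1.054571817e-34          # reduced Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31  # kg
ELECTRON_VOLT = 1.602176634e-19   # J


def ground_state_energy(mass, radius):
    """E_min = pi^2 hbar^2 / (2 m a^2), Eq. (14.189), in joules."""
    return math.pi**2 * HBAR**2 / (2 * mass * radius**2)


def level_energy(alpha, mass, radius):
    """Energy for a given zero alpha_{li} of j_l, from Eqs. (14.186), (14.188)."""
    k = alpha / radius
    return HBAR**2 * k**2 / (2 * mass)


# Hypothetical example: an electron confined to a sphere of radius 1 nm.
e_min_joule = ground_state_energy(ELECTRON_MASS, 1.0e-9)
e_min_ev = e_min_joule / ELECTRON_VOLT
```

Passing other entries of Table 14.2 to `level_energy` (for example 4.4934 for the lowest $l = 1$ state) gives the excited levels; since $E \propto \alpha_{li}^2 / a^2$, doubling the radius reduces every level by a factor of four.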


We close this subsection with the observation that, in addition to orthogonality with
respect to the scaling (to bring zeros to a specified $r$ value), the spherical Bessel functions
also possess orthogonality with respect to the indices:

$$\int_{-\infty}^{\infty} j_m(x)\, j_n(x)\, dx = 0, \qquad m \ne n, \quad m, n \ge 0. \tag{14.190}$$

The proof is left as Exercise 14.7.12. If $m = n$ (compare Exercise 14.7.13), we have

$$\int_{-\infty}^{\infty} \left[j_n(x)\right]^2 dx = \frac{\pi}{2n+1}. \tag{14.191}$$

The spherical Bessel functions will enter again in connection with spherical waves, but
further consideration is postponed until the corresponding angular functions, the Legendre
functions, have been more thoroughly discussed.


Modified Spherical Bessel Functions

Problems involving the radial equation

$$r^2 \frac{d^2R}{dr^2} + 2r \frac{dR}{dr} - \left[k^2 r^2 + l(l+1)\right] R = 0, \tag{14.192}$$

which differs from Eq. (14.148) only in the sign of $k^2$, also arise frequently in physics. The
solutions to this equation are spherical Bessel functions with imaginary arguments, leading
us to define modified spherical Bessel functions (Fig. 14.18) as follows:

$$i_n(x) = \sqrt{\frac{\pi}{2x}}\, I_{n+1/2}(x), \tag{14.193}$$

$$k_n(x) = \sqrt{\frac{2}{\pi x}}\, K_{n+1/2}(x). \tag{14.194}$$

Note that the scale factor in the definition of $k_n$ differs from that of the other spherical
Bessel functions.

FIGURE 14.18 Modified spherical Bessel functions.

With the above definitions, these functions have the following recurrence relations:

$$i_{n-1}(x) - i_{n+1}(x) = \frac{2n+1}{x}\, i_n(x),$$

$$n\, i_{n-1}(x) + (n+1)\, i_{n+1}(x) = (2n+1)\, i_n'(x),$$

$$k_{n-1}(x) - k_{n+1}(x) = -\frac{2n+1}{x}\, k_n(x), \tag{14.195}$$

$$n\, k_{n-1}(x) + (n+1)\, k_{n+1}(x) = -(2n+1)\, k_n'(x).$$

The first few of these functions are

$$i_0(x) = \frac{\sinh x}{x}, \qquad k_0(x) = \frac{e^{-x}}{x},$$

$$i_1(x) = \frac{\cosh x}{x} - \frac{\sinh x}{x^2}, \qquad k_1(x) = e^{-x} \left(\frac{1}{x} + \frac{1}{x^2}\right), \tag{14.196}$$

$$i_2(x) = \sinh x \left(\frac{1}{x} + \frac{3}{x^3}\right) - \frac{3\cosh x}{x^2}, \qquad k_2(x) = e^{-x} \left(\frac{1}{x} + \frac{3}{x^2} + \frac{3}{x^3}\right).$$

Limiting values of the modified spherical Bessel functions are, for small $x$,

$$i_n(x) \approx \frac{x^n}{(2n+1)!!}, \qquad k_n(x) \approx \frac{(2n-1)!!}{x^{n+1}}. \tag{14.197}$$

For large $x$, the asymptotic behavior of these functions is

$$i_n(x) \sim \frac{e^x}{2x}, \qquad k_n(x) \sim \frac{e^{-x}}{x}. \tag{14.198}$$
2x 


Example 14.7.2 PARTICLE IN A FINITE SPHERICAL WELL


As a final example, we return to the problem of a particle trapped in a spherical potential
well of radius $a$ (Example 14.7.1), but instead of confining the particle by a wall at potential
$V = \infty$ (equivalent to requiring that its wave function $\psi$ vanish at $r = a$), we now consider
a well of finite depth, corresponding to

$$V(r) = \begin{cases} V_0 < 0, & 0 \le r \le a, \\ 0, & r > a. \end{cases}$$

If the particle can have an energy E < 0, it will be localized in and near the potential well, 
with a wave function that decays to zero as r increases to values greater than a. A simple 
case of this problem was one of our examples of an eigenvalue problem (Example 8.3.3), 
but in that case we did not proceed with enough generality to identify its solutions as Bessel 
functions. 

This problem is governed by the Schrödinger equation, which now has the form

$$-\frac{\hbar^2}{2m} \nabla^2 \psi + V(r)\psi = E\psi.$$

This is an eigenvalue equation, to be solved for $\psi$ and $E$ over the full three-dimensional
space, subject to the condition that $\psi$ be continuous and differentiable for all $r$, and that it
be normalizable (thus approaching zero asymptotically at large $r$). Here $m$ is the mass of
the particle and $\hbar$ is Planck's constant divided by $2\pi$.

While this problem is more difficult than that of Example 14.7.1, it becomes manageable
if we realize that it is equivalent to two separate problems for the respective regions
$0 \le r \le a$ and $r > a$, within each of which the potential has a constant value, but constrained to
(1) have the same eigenvalue $E$, and (2) connect smoothly (so the $r$ derivative will exist)
at $r = a$.

When our Schrödinger equation is processed by the method of separation of variables,
we obtain as its radial component

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left(\frac{2m}{\hbar^2}\left[E - V(r)\right] - \frac{l(l+1)}{r^2}\right) R = 0,$$

which is either the spherical Bessel equation, Eq. (14.150), or the modified spherical Bessel
equation, Eq. (14.192), depending on the sign of $E - V(r)$. We see that if $V_0 < E < 0$, then
for $r < a$ we will have $E - V(r) > 0$, yielding a Bessel ODE with an acceptable solution
involving $j_l$, while for $r > a$ we have $E - V(r) < 0$, leading to a modified Bessel ODE
for which we can choose the $k_l$ solution to obtain the necessary asymptotic behavior.
Summarizing the above, we have, for the two regions:

$$R_{\rm in}(r) = A j_l(kr), \qquad k^2 = \frac{2m}{\hbar^2}(E - V_0), \qquad r \le a,$$

$$R_{\rm out}(r) = B k_l(k'r), \qquad k'^2 = -\frac{2m}{\hbar^2} E, \qquad r > a.$$

Smooth connection at $r = a$ then corresponds to the equations

$$R_{\rm in}(a) = R_{\rm out}(a) \quad \longrightarrow \quad A j_l(ka) = B k_l(k'a), \tag{14.199}$$

$$\left.\frac{dR_{\rm in}}{dr}\right|_{r=a} = \left.\frac{dR_{\rm out}}{dr}\right|_{r=a} \quad \longrightarrow \quad kA\, j_l'(ka) = k'B\, k_l'(k'a). \tag{14.200}$$

For $l = 0$ this problem reduces to that considered in Example 8.3.3, where we indicate
a numerical procedure of solving it, but we are now in a position to obtain solutions for
all $l$.
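Equations (14.199) and (14.200) have a nontrivial solution for $A$ and $B$ only if their determinant vanishes, which gives a single equation for $E$. The following sketch solves that condition for $l = 0$ by bisection, using the closed forms of $j_0$ and $k_0$. It rests on several assumptions made here for illustration: units with $\hbar^2/2m = 1$ and $a = 1$, a well depth $V_0 = -20$, and an energy bracket chosen by inspecting the sign of the determinant. The function names are hypothetical.

```python
import math


def matching_determinant(energy, v0=-20.0, a=1.0):
    """Determinant of Eqs. (14.199)-(14.200) for l = 0.

    Units are chosen so that hbar^2 / (2m) = 1 (an assumption of this
    sketch), giving k^2 = E - V0 and k'^2 = -E for V0 < E < 0.
    """
    k = math.sqrt(energy - v0)
    kp = math.sqrt(-energy)
    x_in, x_out = k * a, kp * a
    j0 = math.sin(x_in) / x_in
    dj0 = math.cos(x_in) / x_in - math.sin(x_in) / x_in**2
    k0 = math.exp(-x_out) / x_out
    dk0 = -math.exp(-x_out) * (1 / x_out + 1 / x_out**2)
    # k j0'(ka) k0(k'a) - k' k0'(k'a) j0(ka) = 0 at a bound-state energy
    return k * dj0 * k0 - kp * dk0 * j0


def bound_state_energy(v0=-20.0, a=1.0, tol=1e-12):
    """Bisection for the l = 0 bound-state energy between V0 and 0."""
    lo, hi = v0 + 0.1, -0.1  # bracket assumed to contain one sign change
    f_lo = matching_determinant(lo, v0, a)
    if f_lo * matching_determinant(hi, v0, a) > 0:
        raise ValueError("no sign change: the bracket holds no single root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = matching_determinant(mid, v0, a)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

For $l = 0$ the determinant condition reduces to $k \cot(ka) = -k'$, which provides an independent check on the root. For a shallower well the bracket may contain no sign change, corresponding to the absence of a bound state, and the sketch then raises an error.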


Exercises 


14.7.1 Show how one can obtain Eq. (14.162) starting from Eq. (14.161).

14.7.2 Show that if

$$y_n(x) = \sqrt{\frac{\pi}{2x}}\, Y_{n+1/2}(x),$$

it automatically equals

$$(-1)^{n+1} \sqrt{\frac{\pi}{2x}}\, J_{-n-1/2}(x).$$

14.7.3 Derive the trigonometric-polynomial forms of $j_n(z)$ and $y_n(z)$¹¹:

$$j_n(z) = \frac{1}{z} \sin\left(z - \frac{n\pi}{2}\right) \sum_{s=0}^{[n/2]} \frac{(-1)^s (n+2s)!}{(2s)!\, (2z)^{2s}\, (n-2s)!} + \frac{1}{z} \cos\left(z - \frac{n\pi}{2}\right) \sum_{s=0}^{[(n-1)/2]} \frac{(-1)^s (n+2s+1)!}{(2s+1)!\, (2z)^{2s+1}\, (n-2s-1)!},$$

$$y_n(z) = \frac{(-1)^{n+1}}{z} \cos\left(z + \frac{n\pi}{2}\right) \sum_{s=0}^{[n/2]} \frac{(-1)^s (n+2s)!}{(2s)!\, (2z)^{2s}\, (n-2s)!} + \frac{(-1)^{n}}{z} \sin\left(z + \frac{n\pi}{2}\right) \sum_{s=0}^{[(n-1)/2]} \frac{(-1)^s (n+2s+1)!}{(2s+1)!\, (2z)^{2s+1}\, (n-2s-1)!}.$$

The upper summation limit $[n/2]$ means the largest integer that does not exceed $n/2$.
For $n = 1$ these reduce to Eqs. (14.165) and (14.167).

14.7.4 Use the integral representation of $J_\nu(x)$,

$$J_\nu(x) = \frac{1}{\pi^{1/2}\, \Gamma(\nu + \frac{1}{2})} \left(\frac{x}{2}\right)^\nu \int_{-1}^{1} e^{\pm ixp} (1 - p^2)^{\nu - 1/2}\, dp,$$

to show that the spherical Bessel functions $j_n(x)$ are expressible in terms of trigonometric
functions; that is, for example,

$$j_0(x) = \frac{\sin x}{x}, \qquad j_1(x) = \frac{\sin x}{x^2} - \frac{\cos x}{x}.$$

14.7.5 (a) Derive the recurrence relations

$$f_{n-1}(x) + f_{n+1}(x) = \frac{2n+1}{x}\, f_n(x),$$

$$n f_{n-1}(x) - (n+1) f_{n+1}(x) = (2n+1)\, f_n'(x)$$

satisfied by the spherical Bessel functions $j_n(x)$, $y_n(x)$, $h_n^{(1)}(x)$, and $h_n^{(2)}(x)$.

(b) Show, from these two recurrence relations, that the spherical Bessel function
$f_n(x)$ satisfies the differential equation

$$x^2 f_n''(x) + 2x f_n'(x) + \left[x^2 - n(n+1)\right] f_n(x) = 0.$$

14.7.6 Prove by mathematical induction (Section 1.4) that

$$j_n(x) = (-1)^n x^n \left(\frac{1}{x}\frac{d}{dx}\right)^n \left(\frac{\sin x}{x}\right)$$

for $n$, an arbitrary nonnegative integer.

14.7.7 From the discussion of orthogonality of the spherical Bessel functions, show that a
Wronskian relation for $j_n(x)$ and $y_n(x)$ is

$$j_n(x)\, y_n'(x) - j_n'(x)\, y_n(x) = \frac{1}{x^2}.$$

14.7.8 Verify

$$h_n^{(1)}(x)\, h_n^{(2)\prime}(x) - h_n^{(1)\prime}(x)\, h_n^{(2)}(x) = -\frac{2i}{x^2}.$$

14.7.9 Verify Poisson's integral representation of the spherical Bessel function,

$$j_n(z) = \frac{z^n}{2^{n+1}\, n!} \int_0^\pi \cos(z\cos\theta)\, \sin^{2n+1}\theta\, d\theta.$$

14.7.10 A well-known integral representation for $K_\nu(x)$ has the form

$$K_\nu(x) = \frac{2^\nu\, \Gamma(\nu + \frac{1}{2})}{\sqrt{\pi}\, x^\nu} \int_0^\infty \frac{\cos xt}{(t^2 + 1)^{\nu + 1/2}}\, dt.$$

Starting from this formula, show that

$$k_n(x) = \frac{2^{n+2} (n+1)!}{\pi x^{n+1}} \int_0^\infty \frac{k^2 j_0(kx)}{(k^2 + 1)^{n+2}}\, dk.$$

14.7.11 Show that

$$\int_0^\infty J_\mu(x) J_\nu(x)\, \frac{dx}{x} = \frac{2}{\pi}\, \frac{\sin[(\mu - \nu)\pi/2]}{\mu^2 - \nu^2}, \qquad \mu + \nu > 0.$$

14.7.12 Derive Eq. (14.190):

$$\int_{-\infty}^{\infty} j_m(x)\, j_n(x)\, dx = 0, \qquad m \ne n, \quad m, n \ge 0.$$

14.7.13 Derive Eq. (14.191):

$$\int_{-\infty}^{\infty} \left[j_n(x)\right]^2 dx = \frac{\pi}{2n+1}.$$

14.7.14 The Fresnel integrals (Fig. 14.19 and Exercise 12.7.2) occurring in diffraction theory
are given by

$$x(t) = \sqrt{\frac{\pi}{2}}\; C\!\left(\sqrt{\frac{2}{\pi}}\, t\right) = \int_0^t \cos(v^2)\, dv,$$

$$y(t) = \sqrt{\frac{\pi}{2}}\; S\!\left(\sqrt{\frac{2}{\pi}}\, t\right) = \int_0^t \sin(v^2)\, dv.$$

Show that these integrals may be expanded in series of spherical Bessel functions as
follows:

$$x(s) = \frac{1}{2} \int_0^s j_{-1}(u)\, u^{1/2}\, du = s^{1/2} \sum_{n=0}^{\infty} j_{2n}(s),$$

$$y(s) = \frac{1}{2} \int_0^s j_0(u)\, u^{1/2}\, du = s^{1/2} \sum_{n=0}^{\infty} j_{2n+1}(s).$$

Hint. To establish the equality of the integral and the sum, you may wish to work
with their derivatives. The spherical Bessel analogs of Eqs. (14.8) and (14.12) may
be helpful.

FIGURE 14.19 Fresnel integrals.

14.7.15 A hollow sphere of radius $a$ (Helmholtz resonator) contains standing sound waves.
Find the minimum frequency of oscillation in terms of the radius $a$ and the velocity of
sound $v$. The sound waves satisfy the wave equation

$$\nabla^2 \psi = \frac{1}{v^2} \frac{\partial^2 \psi}{\partial t^2}$$

and the boundary condition

$$\frac{\partial \psi}{\partial r} = 0, \qquad r = a.$$

The spatial part of this PDE is the same as the PDE discussed in Example 14.7.1, but
here we have a Neumann boundary condition, in contrast to the Dirichlet boundary
condition of that example.

ANS. $\nu_{\min} = 0.3313\, v/a$, $\lambda_{\max} = 3.018\, a$.

14.7.16 (a) Show that the parity of $i_n(x)$ (the behavior under $x \to -x$) is $(-1)^n$.
(b) Show that $k_n(x)$ has no definite parity.

14.7.17 Show that the Wronskian of the spherical modified Bessel functions is given by

$$i_n(x)\, k_n'(x) - i_n'(x)\, k_n(x) = -\frac{1}{x^2}.$$


CHAPTER 15


LEGENDRE FUNCTIONS 


Legendre functions are important in physics because they arise when the Laplace or 
Helmholtz equations (or their generalizations) for central force problems are separated 
in spherical coordinates. They therefore appear in the descriptions of wave functions for 
atoms, in a variety of electrostatics problems, and in many other contexts. In addition, 
the Legendre polynomials provide a convenient set of functions that is orthogonal (with 
unit weight) on the interval $(-1, +1)$ that is the range of the sine and cosine functions. And
from a pedagogical viewpoint, they provide a set of functions that are easy to work with and 
form an excellent illustration of the general properties of orthogonal polynomials. Several 
of these properties were discussed in a general way in Chapter 12. We collect here those 
results, expanding them with additional material that is of great utility and importance. 

As indicated above, Legendre functions are encountered when an equation written in 
spherical polar coordinates $(r, \theta, \varphi)$, such as

$$-\nabla^2 \psi + V(r)\psi = \lambda\psi,$$

is solved by the method of separation of variables. Note that we are assuming that this
equation is to be solved for a spherically symmetric region and that $V(r)$ is a function
of the distance from the origin of the coordinate system (and therefore not a function
of the three-component position vector $\mathbf{r}$). As in Eqs. (9.77) and (9.78), we write
$\psi = R(r)\Theta(\theta)\Phi(\varphi)$ and decompose our original partial differential equation (PDE) into
the three one-dimensional ordinary differential equations (ODEs):

$$\frac{1}{\Phi} \frac{d^2\Phi}{d\varphi^2} = -m^2, \tag{15.1}$$

$$\frac{1}{\sin\theta} \frac{d}{d\theta} \left(\sin\theta \frac{d\Theta}{d\theta}\right) - \frac{m^2\, \Theta}{\sin^2\theta} + l(l+1)\Theta = 0, \tag{15.2}$$

$$\frac{1}{r^2} \frac{d}{dr} \left(r^2 \frac{dR}{dr}\right) + \left[\lambda - V(r)\right] R - \frac{l(l+1) R}{r^2} = 0. \tag{15.3}$$

The quantities $m^2$ and $l(l + 1)$ are constants that occur when the variables are separated;
the ODE in $\varphi$ is easy to solve and has natural boundary conditions (cf. Section 9.4), which
dictate that $m$ must be an integer and that the functions $\Phi$ can be written as $e^{\pm im\varphi}$ or as
$\sin(m\varphi)$, $\cos(m\varphi)$.

The $\Theta$ equation can now be transformed by the substitution $x = \cos\theta$, cf. Eq. (9.79),
reaching

$$(1 - x^2) P''(x) - 2x P'(x) - \frac{m^2}{1 - x^2} P(x) + l(l+1) P(x) = 0. \tag{15.4}$$

This is the associated Legendre equation; the special case with $m = 0$, which we will
treat first, is the Legendre ODE.


LEGENDRE POLYNOMIALS 
The Legendre equation,

$$(1 - x^2) P''(x) - 2x P'(x) + \lambda P(x) = 0, \tag{15.5}$$

has regular singular points at $x = \pm 1$ and $x = \infty$ (see Table 7.1), and therefore has a series
solution about $x = 0$ that has a unit radius of convergence, i.e., the series solution will
(for all values of the parameter $\lambda$) converge for $|x| < 1$. In Section 8.3 we found that for
most values of $\lambda$, the series solutions will diverge at $x = \pm 1$ (corresponding to $\theta = 0$ and
$\theta = \pi$), making the solutions inappropriate for use in central force problems. However, if
$\lambda$ has the value $l(l + 1)$, with $l$ an integer, the series become truncated after $x^l$, leaving a
polynomial of degree $l$.

Now that we have identified the desired solutions to the Legendre equations as polynomials
of successive degrees, called Legendre polynomials and designated $P_l$, let us use
the machinery of Chapter 12 to develop them from a generating-function approach. This
course of action will set a scale for the $P_l$ and provide a good starting point for deriving
recurrence relations and related formulas.

We found in Example 12.1.3 that the generating function for the polynomial solutions
of the Legendre ODE is given by Eq. (12.27):

$$g(x, t) = \frac{1}{\sqrt{1 - 2xt + t^2}} = \sum_{n=0}^{\infty} P_n(x)\, t^n. \tag{15.6}$$

To identify the scale that is given to $P_n$ by Eq. (15.6), we simply set $x = 1$ in that equation,
bringing its left-hand side to the form

$$g(1, t) = \frac{1}{\sqrt{1 - 2t + t^2}} = \frac{1}{1 - t} = \sum_{n=0}^{\infty} t^n, \tag{15.7}$$

where the last step in Eq. (15.7) was to expand $1/(1 - t)$ using the binomial theorem.
Comparing with Eq. (15.6), we see that the scaling it predicts is $P_n(1) = 1$.

Next, consider what happens if we replace $x$ by $-x$ and $t$ by $-t$. The value of $g(x, t)$ in
Eq. (15.6) is unaffected by this substitution, but the right-hand side takes a different form:

$$\sum_{n=0}^{\infty} P_n(x)\, t^n = g(x, t) = g(-x, -t) = \sum_{n=0}^{\infty} P_n(-x)\, (-t)^n, \tag{15.8}$$

showing that

$$P_n(-x) = (-1)^n P_n(x). \tag{15.9}$$

From this result it is obvious that $P_n(-1) = (-1)^n$, and that $P_n(x)$ will have the same
parity as $x^n$.

Another useful special value is $P_n(0)$. Writing $P_{2n}$ and $P_{2n+1}$ to distinguish even and
odd index values, we note first that because $P_{2n+1}$ is odd under parity, i.e., $x \to -x$, we
must have $P_{2n+1}(0) = 0$. To obtain $P_{2n}(0)$, we again resort to the binomial expansion:

$$g(0, t) = (1 + t^2)^{-1/2} = \sum_{n=0}^{\infty} \binom{-1/2}{n} t^{2n} = \sum_{n=0}^{\infty} P_{2n}(0)\, t^{2n}. \tag{15.10}$$

Then, using Eq. (1.74) to evaluate the binomial coefficient, we get

$$P_{2n}(0) = (-1)^n\, \frac{(2n-1)!!}{(2n)!!}. \tag{15.11}$$


It is also useful to characterize the leading terms of the Legendre polynomials. Applying
the binomial theorem to the generating function,

$$(1 - 2xt + t^2)^{-1/2} = \sum_{n=0}^{\infty} \binom{-1/2}{n} (-2xt + t^2)^n, \tag{15.12}$$

from which we see that the maximum power of $x$ that can multiply $t^n$ will be $x^n$, and is
obtained from the term $(-2xt)^n$ in the expansion of the final factor. Thus, the

$$\text{coefficient of } x^n \text{ in } P_n(x) \text{ is } \binom{-1/2}{n} (-2)^n = \frac{(2n-1)!!}{n!}. \tag{15.13}$$

These results are important, so we summarize:

$P_n(x)$ has sign and scaling such that $P_n(1) = 1$ and $P_n(-1) = (-1)^n$.
$P_{2n}(x)$ is an even function of $x$; $P_{2n+1}(x)$ is odd. $P_{2n+1}(0) = 0$, and $P_{2n}(0)$
is given by Eq. (15.11). $P_n(x)$ is a polynomial of degree $n$ in $x$, with the
coefficient of $x^n$ given by Eq. (15.13); $P_n(x)$ contains alternate powers of
$x$: $x^n, x^{n-2}, \ldots$, ($x^0$ or $x^1$).

From the fact that $P_n$ is of degree $n$ with alternate powers, it is clear that $P_0(x) =$
constant and that $P_1(x) =$ (constant) $x$. From the scaling requirements these must reduce to
$P_0(x) = 1$ and $P_1(x) = x$.

Returning to Eq. (15.12), we can get explicit closed expressions for the Legendre polynomials.
All we need to do is expand the quantity $(-2xt + t^2)^n$ and rearrange the summations
to identify the $x$ dependence associated with each power of $t$. The result, which is in
general less useful than the recurrence formulas to be developed in the next subsection, is

$$P_n(x) = \sum_{k=0}^{[n/2]} (-1)^k\, \frac{(2n - 2k)!}{2^n\, k!\, (n-k)!\, (n-2k)!}\, x^{n-2k}. \tag{15.14}$$

Here $[n/2]$ stands for the largest integer $\le n/2$. This formula is consistent with the
requirement that for $n$ even, $P_n(x)$ has only even powers of $x$ and even parity, while
for $n$ odd, it has only odd powers of $x$ and odd parity. Proof of Eq. (15.14) is the topic of
Exercise 15.1.2.
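Equation (15.14) translates directly into exact rational arithmetic, which makes it easy to confirm the special values summarized above. The sketch below uses Python's `fractions` module; the function names `legendre_coefficients` and `legendre_value` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import factorial


def legendre_coefficients(n):
    """Return {power: coefficient} for P_n(x) using Eq. (15.14)."""
    coeffs = {}
    for k in range(n // 2 + 1):
        numerator = (-1) ** k * factorial(2 * n - 2 * k)
        denominator = (
            2 ** n * factorial(k) * factorial(n - k) * factorial(n - 2 * k)
        )
        coeffs[n - 2 * k] = Fraction(numerator, denominator)
    return coeffs


def legendre_value(n, x):
    """Evaluate P_n(x) from the explicit coefficients.

    With x an int or Fraction the result is exact.
    """
    return sum(c * x ** p for p, c in legendre_coefficients(n).items())
```

With exact arithmetic, `legendre_value(n, 1)` returns 1 for every $n$, the leading coefficient equals $(2n-1)!!/n!$ as in Eq. (15.13), and `legendre_value(2 * n, 0)` reproduces Eq. (15.11).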


Recurrence Formulas 


From the generating function equation we can generate recurrence formulas by differentiating $g(x, t)$ with respect to $x$ or $t$. We start from

$$\frac{\partial g(x,t)}{\partial t} = \frac{x-t}{(1-2xt+t^2)^{3/2}} = \sum_{n=0}^{\infty} nP_n(x)\,t^{n-1}, \tag{15.15}$$

which we rearrange to

$$(1-2xt+t^2)\sum_{n=0}^{\infty} nP_n(x)\,t^{n-1} + (t-x)\sum_{n=0}^{\infty} P_n(x)\,t^n = 0, \tag{15.16}$$

and then expand, reaching

$$\sum_{n=0}^{\infty} nP_n(x)\,t^{n-1} - 2\sum_{n=0}^{\infty} nxP_n(x)\,t^{n} + \sum_{n=0}^{\infty} nP_n(x)\,t^{n+1} + \sum_{n=0}^{\infty} P_n(x)\,t^{n+1} - \sum_{n=0}^{\infty} xP_n(x)\,t^{n} = 0. \tag{15.17}$$

Collecting the coefficients of $t^n$ from the various terms and setting the result to zero, Eq. (15.17) is seen to be equivalent to

$$(2n+1)\,xP_n(x) = (n+1)P_{n+1}(x) + nP_{n-1}(x), \quad n = 1, 2, 3, \ldots. \tag{15.18}$$

Equation (15.18) permits us to generate successive $P_n$ from the starting values $P_0$ and $P_1$ that we have previously identified. For example,

$$2P_2(x) = 3xP_1(x) - P_0(x) \;\longrightarrow\; P_2(x) = \tfrac{1}{2}\left(3x^2 - 1\right). \tag{15.19}$$

Continuing this process, we can build the list of Legendre polynomials given in Table 15.1.
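The process just described is easy to mechanize. The sketch below is an illustration only (the name `legendre_coeffs` and the coefficient-list representation are assumptions of this example); it applies Eq. (15.18) repeatedly, starting from $P_0 = 1$ and $P_1 = x$, using exact rational arithmetic.

```python
from fractions import Fraction

def legendre_coeffs(n_max):
    """Return [P_0, ..., P_n_max]; each polynomial is a list of Fractions
    whose index is the power of x. Built from the recurrence, Eq. (15.18)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]   # P_0 = 1, P_1 = x
    for n in range(1, n_max):
        # (2n + 1) x P_n(x): multiplying by x shifts each coefficient up one power
        nxt = [Fraction(0)] + [(2 * n + 1) * c for c in polys[n]]
        # subtract n P_{n-1}(x)
        for k, c in enumerate(polys[n - 1]):
            nxt[k] -= n * c
        # divide by (n + 1) to isolate P_{n+1}(x)
        polys.append([c / (n + 1) for c in nxt])
    return polys[: n_max + 1]

for n, p in enumerate(legendre_coeffs(4)):
    print(n, p)
```

Because every polynomial satisfies $P_n(1) = 1$, the coefficients in each list should sum to unity, which gives a quick consistency check on the output.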


Table 15.1 Legendre Polynomials

$P_0(x) = 1$
$P_1(x) = x$
$P_2(x) = \frac{1}{2}(3x^2 - 1)$
$P_3(x) = \frac{1}{2}(5x^3 - 3x)$
$P_4(x) = \frac{1}{8}(35x^4 - 30x^2 + 3)$
$P_5(x) = \frac{1}{8}(63x^5 - 70x^3 + 15x)$
$P_6(x) = \frac{1}{16}(231x^6 - 315x^4 + 105x^2 - 5)$
$P_7(x) = \frac{1}{16}(429x^7 - 693x^5 + 315x^3 - 35x)$
$P_8(x) = \frac{1}{128}(6435x^8 - 12012x^6 + 6930x^4 - 1260x^2 + 35)$

We can also obtain a recurrence formula involving $P_n'$ by differentiating $g(x, t)$ with respect to $x$. This gives

$$\frac{\partial g(x,t)}{\partial x} = \frac{t}{(1-2xt+t^2)^{3/2}} = \sum_{n=0}^{\infty} P_n'(x)\,t^n,$$

or

$$(1-2xt+t^2)\sum_{n=0}^{\infty} P_n'(x)\,t^n - t\sum_{n=0}^{\infty} P_n(x)\,t^n = 0. \tag{15.20}$$

As before, the coefficient of each power of $t$ is set to zero and we obtain

$$P_{n+1}'(x) + P_{n-1}'(x) = 2xP_n'(x) + P_n(x). \tag{15.21}$$


A more useful relation may be found by differentiating Eq. (15.18) with respect to $x$ and multiplying by 2. To this we add $(2n + 1)$ times Eq. (15.21), canceling the $P_n'$ term. The result is

$$P_{n+1}'(x) - P_{n-1}'(x) = (2n+1)P_n(x). \tag{15.22}$$

Starting from Eqs. (15.21) and (15.22), numerous additional relations can be developed,$^1$ including

$$P_{n+1}'(x) = (n+1)P_n(x) + xP_n'(x), \tag{15.23}$$

$$P_{n-1}'(x) = -nP_n(x) + xP_n'(x), \tag{15.24}$$

$$(1-x^2)P_n'(x) = nP_{n-1}(x) - nxP_n(x), \tag{15.25}$$

$$(1-x^2)P_n'(x) = (n+1)xP_n(x) - (n+1)P_{n+1}(x). \tag{15.26}$$


$^1$Using the equation numbers in parentheses to indicate how they are to be combined, we may obtain some of these derivative formulas as follows:
$2\cdot\frac{d}{dx}(15.18) + (2n+1)\cdot(15.21) \Rightarrow (15.22)$, $\quad\frac{1}{2}\{(15.21) + (15.22)\} \Rightarrow (15.23)$,
$\frac{1}{2}\{(15.21) - (15.22)\} \Rightarrow (15.24)$, $\quad(15.23)_{n\to n-1} + x\,(15.24) \Rightarrow (15.25)$.

Because we derived the generating function $g(x, t)$ from the Legendre ODE and then obtained the recurrence formulas using $g(x, t)$, that ODE will automatically be consistent with these recurrence relations. It is nevertheless of interest to verify this consistency, because then we can conclude that any set of functions satisfying the recurrence formulas will be a set of solutions to the Legendre ODE, and that observation will be relevant to the Legendre functions of the second kind (solutions linearly independent of the polynomials $P_n$). A demonstration that functions satisfying the recurrence formulas also satisfy the Legendre ODE is the topic of Exercise 15.1.1.
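The derivative relations can be spot-checked exactly. The sketch below is illustrative and rests on assumptions of this example: polynomials are stored as coefficient lists, the helper names are hypothetical, and the test point $x_0 = 3/7$ is arbitrary. It builds the $P_n$ from Eq. (15.18), differentiates them term by term, and tests Eqs. (15.22), (15.25), and (15.26) in rational arithmetic.

```python
from fractions import Fraction

def legendre_coeffs(n_max):
    """Coefficient lists (index = power of x) of P_0..P_n_max from Eq. (15.18)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        nxt = [Fraction(0)] + [(2 * n + 1) * c for c in polys[n]]  # (2n+1) x P_n
        for k, c in enumerate(polys[n - 1]):
            nxt[k] -= n * c                                        # - n P_{n-1}
        polys.append([c / (n + 1) for c in nxt])
    return polys[: n_max + 1]

def evaluate(coeffs, x):
    """Horner evaluation of a coefficient list at x."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def relations_hold(n, x, polys):
    """True if Eqs. (15.22), (15.25), and (15.26) hold exactly at x for this n."""
    p = lambda m: evaluate(polys[m], x)
    dp = lambda m: evaluate(derivative(polys[m]), x)
    eq_15_22 = dp(n + 1) - dp(n - 1) == (2 * n + 1) * p(n)
    eq_15_25 = (1 - x * x) * dp(n) == n * p(n - 1) - n * x * p(n)
    eq_15_26 = (1 - x * x) * dp(n) == (n + 1) * x * p(n) - (n + 1) * p(n + 1)
    return eq_15_22 and eq_15_25 and eq_15_26

polys = legendre_coeffs(8)
x0 = Fraction(3, 7)   # arbitrary rational test point
print(all(relations_hold(n, x0, polys) for n in range(1, 8)))
```

A check at a single point is of course not a proof, but since both sides are polynomials, agreement at more points than the degree would establish the identity for that $n$.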


Upper and Lower Bounds for $P_n(\cos\theta)$

Our generating function can be used to set an upper limit on $|P_n(\cos\theta)|$. We have

$$(1 - 2t\cos\theta + t^2)^{-1/2} = (1 - te^{i\theta})^{-1/2}(1 - te^{-i\theta})^{-1/2}$$
$$= \left(1 + \tfrac{1}{2}te^{i\theta} + \tfrac{3}{8}t^2e^{2i\theta} + \cdots\right)\left(1 + \tfrac{1}{2}te^{-i\theta} + \tfrac{3}{8}t^2e^{-2i\theta} + \cdots\right). \tag{15.27}$$

We may make two immediate observations from Eq. (15.27). First, when any term within the first set of parentheses is multiplied by any term from the second set of parentheses, the power of $t$ in the product will be even if and only if $m$ in the net exponential $e^{im\theta}$ is even. Second, for every term of the form $t^ne^{im\theta}$, there will be another term of the form $t^ne^{-im\theta}$, and the two terms will occur with the same coefficient, which must be positive (since all the terms in both summations are individually positive). These two observations mean that:

(1) Taking the terms of the expansion two at a time, we can write the coefficient of $t^n$ as a linear combination of forms
$$\tfrac{1}{2}a_{nm}\left(e^{im\theta} + e^{-im\theta}\right) = a_{nm}\cos m\theta$$
with all the $a_{nm}$ positive, and
(2) The parity of $n$ and $m$ must be the same (either they are both even, or both odd).

This, in turn, means that

$$P_n(\cos\theta) = \sum_{m=0 \text{ or } 1}^{n} a_{nm}\cos m\theta. \tag{15.28}$$

This expression is clearly a maximum when $\theta = 0$, where we already know, from the Summary following Eq. (15.11), that $P_n(1) = 1$. Thus,

The Legendre polynomial $P_n(x)$ has a global maximum on the interval $(-1, +1)$ at $x = 1$, with value $P_n(1) = 1$, and if $n$ is even, also at $x = -1$. If $n$ is odd, $x = -1$ will be a global minimum on this interval with $P_n(-1) = -1$.

The maxima and minima of the Legendre polynomials can be seen from the graphs of $P_2$ through $P_5$, which are plotted in Fig. 15.1.
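The bound $|P_n(x)| \le 1$ on the interval can be illustrated numerically. This is a small sketch under stated assumptions: `legendre_value` and `max_abs_on_interval` are names introduced here, the polynomials are evaluated in floating point by the recurrence of Eq. (15.18), and the 2001-point grid is an arbitrary choice.

```python
def legendre_value(n, x):
    """P_n(x) evaluated with the three-term recurrence, Eq. (15.18)."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def max_abs_on_interval(n, samples=2001):
    """Largest |P_n(x)| found on a uniform grid covering [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(legendre_value(n, x)) for x in xs)

for n in range(1, 6):
    print(n, max_abs_on_interval(n), legendre_value(n, 1.0), legendre_value(n, -1.0))
```

The extreme values are attained at the endpoints, in line with the boxed statement above; interior extrema are smaller in magnitude.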


FIGURE 15.1 Legendre polynomials $P_2(x)$ through $P_5(x)$.

Rodrigues Formula

In Section 12.1 we showed that orthogonal polynomials could be described by Rodrigues formulas, and that the repeated differentiations occurring therein were good starting points for developing properties of these functions. Applying Eq. (12.9), we find that the Rodrigues formula for the Legendre polynomials must be proportional to

$$\left(\frac{d}{dx}\right)^n (1 - x^2)^n. \tag{15.29}$$

Equation (12.9) is not sufficient to set the scale of the orthogonal polynomials, and to bring Eq. (15.29) to the scaling already adopted via Eq. (15.6) we multiply Eq. (15.29) by $(-1)^n/2^n n!$, so

$$P_n(x) = \frac{1}{2^n n!}\left(\frac{d}{dx}\right)^n (x^2 - 1)^n. \tag{15.30}$$

To establish that Eq. (15.30) has a scaling in agreement with our earlier analyses, it suffices to check the coefficient of a single power of $x$; we choose $x^n$. From the Rodrigues formula, this power of $x$ can only arise from the term $x^{2n}$ in the expansion of $(x^2 - 1)^n$, and the

$$\text{coefficient of } x^n \text{ in } P_n(x) \text{ (Rodrigues) is } \frac{1}{2^n n!}\,\frac{(2n)!}{n!} = \frac{(2n-1)!!}{n!},$$

in agreement with Eq. (15.13). This confirms the scale of Eq. (15.30).
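Equation (15.30) can also be carried out mechanically. The following sketch is an illustration, not the book's procedure: it expands $(x^2 - 1)^n$ with the binomial theorem, differentiates the coefficient list $n$ times, and applies the factor $1/2^n n!$. The function name `legendre_rodrigues` is an assumption of this example.

```python
from fractions import Fraction
from math import comb, factorial

def legendre_rodrigues(n):
    """Coefficients (index = power of x) of P_n(x) from the Rodrigues formula."""
    # (x^2 - 1)^n = sum_j C(n, j) (-1)^(n - j) x^(2j)
    poly = [Fraction(0)] * (2 * n + 1)
    for j in range(n + 1):
        poly[2 * j] = Fraction(comb(n, j) * (-1) ** (n - j))
    # differentiate n times: d/dx of c x^k is k c x^(k - 1)
    for _ in range(n):
        poly = [k * c for k, c in enumerate(poly)][1:]
    scale = Fraction(1, 2 ** n * factorial(n))
    return [scale * c for c in poly]

print(legendre_rodrigues(3))   # coefficients of (5 x^3 - 3 x) / 2
```

The last entry of each list is the coefficient of $x^n$, which can be compared with $(2n)!/(2^n\,n!\,n!)$ as in the scaling check above.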
Exercises

15.1.1 Derive the Legendre ODE by manipulation of the Legendre polynomial recurrence relations. Suggested starting point: Eqs. (15.24) and (15.25).

15.1.2 Derive the following closed formula for the Legendre polynomials $P_n(x)$:

$$P_n(x) = \sum_{k=0}^{[n/2]} (-1)^k \frac{(2n-2k)!}{2^n\, k!\,(n-k)!\,(n-2k)!}\, x^{n-2k},$$

where $[n/2]$ stands for the integer part of $n/2$.
Hint. Further expand Eq. (15.12) and rearrange the resulting double sum.

15.1.3 By differentiation and direct substitution of the series form given in Exercise 15.1.2, show that $P_n(x)$ satisfies the Legendre ODE. Note that there is no restriction on $x$. We may have any $x$, $-\infty < x < \infty$, and indeed any $z$ in the entire finite complex plane.

15.1.4 The shifted Legendre polynomials, designated by the symbol $P_n^*(x)$ (where the asterisk does not mean complex conjugate) are orthogonal with unit weight on $[0, 1]$, with normalization integral $\langle P_n^*|P_n^*\rangle = 1/(2n + 1)$. The $P_n^*$ through $n = 6$ are shown in Table 15.2.

(a) Find the recurrence relation satisfied by the $P_n^*$.
(b) Show that all the coefficients of the $P_n^*$ are integers.

Hint. Look at the closed formula in Exercise 15.1.2.

Table 15.2 Shifted Legendre Polynomials

$P_0^*(x) = 1$
$P_1^*(x) = 2x - 1$
$P_2^*(x) = 6x^2 - 6x + 1$
$P_3^*(x) = 20x^3 - 30x^2 + 12x - 1$
$P_4^*(x) = 70x^4 - 140x^3 + 90x^2 - 20x + 1$
$P_5^*(x) = 252x^5 - 630x^4 + 560x^3 - 210x^2 + 30x - 1$
$P_6^*(x) = 924x^6 - 2772x^5 + 3150x^4 - 1680x^3 + 420x^2 - 42x + 1$

15.1.5 Given the series

$$\alpha_0 + \alpha_2\cos^2\theta + \alpha_4\cos^4\theta + \alpha_6\cos^6\theta = a_0P_0 + a_2P_2 + a_4P_4 + a_6P_6,$$

where the arguments of the $P_n$ are $\cos\theta$, express the coefficients $\alpha_i$ as a column vector $\boldsymbol{\alpha}$ and the coefficients $a_i$ as a column vector $\mathbf{a}$ and determine the matrices $\mathsf{A}$ and $\mathsf{B}$ such that

$$\mathsf{A}\boldsymbol{\alpha} = \mathbf{a} \quad\text{and}\quad \mathsf{B}\mathbf{a} = \boldsymbol{\alpha}.$$

Check your computation by showing that $\mathsf{A}\mathsf{B} = \mathsf{1}$ (unit matrix). Repeat for the odd case

$$\alpha_1\cos\theta + \alpha_3\cos^3\theta + \alpha_5\cos^5\theta + \alpha_7\cos^7\theta = a_1P_1 + a_3P_3 + a_5P_5 + a_7P_7.$$

Note. $P_n(\cos\theta)$ and $\cos^n\theta$ are tabulated in terms of each other in AMS-55 (see Additional Readings for the complete reference).

15.1.6 By differentiating the generating function $g(x, t)$ with respect to $t$, multiplying by $2t$, and then adding $g(x, t)$, show that

$$\frac{1 - t^2}{(1 - 2tx + t^2)^{3/2}} = \sum_{n=0}^{\infty} (2n+1)P_n(x)\,t^n.$$

This result is useful in calculating the charge induced on a grounded metal sphere by a nearby point charge.

15.1.7 (a) Derive Eq. (15.26),
$$(1 - x^2)P_n'(x) = (n+1)xP_n(x) - (n+1)P_{n+1}(x).$$
(b) Write out the relation of Eq. (15.26) to preceding equations in symbolic form analogous to the symbolic forms for Eqs. (15.22) to (15.25).

15.1.8 Prove that
$$P_n'(1) = \frac{d}{dx}P_n(x)\Big|_{x=1} = \frac{1}{2}n(n+1).$$

15.1.9 Show that $P_n(\cos\theta) = (-1)^nP_n(-\cos\theta)$ by use of the recurrence relation relating $P_n$, $P_{n+1}$, and $P_{n-1}$ and your knowledge of $P_0$ and $P_1$.

15.1.10 From Eq. (15.27) write out the coefficient of $t^2$ in terms of $\cos n\theta$, $n \le 2$. This coefficient is $P_2(\cos\theta)$.

15.1.11 Derive the recurrence relation
$$(1 - x^2)P_n'(x) = nP_{n-1}(x) - nxP_n(x)$$
from the Legendre polynomial generating function.

15.1.12 Evaluate $\displaystyle\int_0^1 P_n(x)\,dx$.

ANS. $n = 2s$: 1 for $s = 0$, 0 for $s > 0$;
$n = 2s + 1$: $P_{2s}(0)/(2s + 2) = (-1)^s(2s - 1)!!/(2s + 2)!!$.

Hint. Use a recurrence relation to replace $P_n(x)$ by derivatives and then integrate by inspection. Alternatively, you can integrate the generating function.

15.1.13 Show that each term in the summation

$$\sum_{r=[n/2]+1}^{n} \left(\frac{d}{dx}\right)^n \frac{(-1)^r\, n!}{r!\,(n-r)!}\, x^{2n-2r}$$

vanishes ($r$ and $n$ integral). Here $[n/2]$ is the largest integer $\le n/2$.
15.1.14 Show that $\displaystyle\int_{-1}^{1} x^mP_n(x)\,dx = 0$ when $m < n$.
Hint. Use a Rodrigues formula or expand $x^m$ in Legendre polynomials.

15.1.15 Show that

$$\int_{-1}^{1} x^nP_n(x)\,dx = \frac{2\,n!}{(2n+1)!!}.$$

Note. You are expected to use the Rodrigues formula and integrate by parts, but also see if you can get the result from Eq. (15.14) by inspection.

15.1.16 Show that

$$\int_{-1}^{1} x^{2r}P_{2n}(x)\,dx = \frac{2^{2n+1}(2r)!\,(r+n)!}{(2r+2n+1)!\,(r-n)!}, \quad r \ge n.$$

15.1.17 As a generalization of Exercises 15.1.15 and 15.1.16, show that the Legendre expansions of $x^s$ are

(a) $\displaystyle x^{2r} = \sum_{n=0}^{r} \frac{2^{2n}(4n+1)(2r)!\,(r+n)!}{(2r+2n+1)!\,(r-n)!}\,P_{2n}(x), \quad s = 2r,$

(b) $\displaystyle x^{2r+1} = \sum_{n=0}^{r} \frac{2^{2n+1}(4n+3)(2r+1)!\,(r+n+1)!}{(2r+2n+3)!\,(r-n)!}\,P_{2n+1}(x), \quad s = 2r+1.$

15.1.18 In numerical work (e.g., the Gauss-Legendre quadrature), it is useful to establish that $P_n(x)$ has $n$ real zeros in the interior of $[-1, 1]$. Show that this is so.

Hint. Rolle's theorem shows that the first derivative of $(x^2 - 1)^{2n}$ has one zero in the interior of $[-1, 1]$. Extend this argument to the second, third, and ultimately the $n$th derivative.


15.2 ORTHOGONALITY 


Because the Legendre ODE is self-adjoint and the coefficient of $P''(x)$, namely $(1 - x^2)$, vanishes at $x = \pm 1$, its solutions of different $n$ will automatically be orthogonal with unit weight on the interval $(-1, 1)$,

$$\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0, \quad (n \ne m). \tag{15.31}$$

Because the $P_n$ are real, no complex conjugation needs to be indicated in the orthogonality integral. Since $P_n$ is often used with argument $\cos\theta$, we note that Eq. (15.31) is equivalent to

$$\int_0^{\pi} P_n(\cos\theta)P_m(\cos\theta)\sin\theta\,d\theta = 0, \quad (n \ne m). \tag{15.32}$$


The definition of the $P_n$ does not guarantee that they are normalized, and in fact they are not. One way to establish the normalization starts by squaring the generating-function formula, yielding initially

$$(1 - 2xt + t^2)^{-1} = \left[\sum_{n=0}^{\infty} P_n(x)\,t^n\right]^2. \tag{15.33}$$

Integrating from $x = -1$ to $x = 1$ and dropping the cross terms because they vanish due to orthogonality, Eq. (15.31), we have

$$\int_{-1}^{1} \frac{dx}{1 - 2tx + t^2} = \sum_{n=0}^{\infty} t^{2n}\int_{-1}^{1} \left[P_n(x)\right]^2 dx. \tag{15.34}$$

Making now the substitution $y = 1 - 2tx + t^2$, with $dy = -2t\,dx$, we obtain

$$\int_{-1}^{1} \frac{dx}{1 - 2tx + t^2} = \frac{1}{2t}\int_{(1-t)^2}^{(1+t)^2} \frac{dy}{y} = \frac{1}{t}\ln\left(\frac{1+t}{1-t}\right). \tag{15.35}$$

Expanding this result in a power series (Exercise 1.6.1),

$$\frac{1}{t}\ln\left(\frac{1+t}{1-t}\right) = 2\sum_{n=0}^{\infty} \frac{t^{2n}}{2n+1}, \tag{15.36}$$

and equating the coefficients of powers of $t$ in Eqs. (15.34) and (15.36), we must have

$$\int_{-1}^{1} \left[P_n(x)\right]^2 dx = \frac{2}{2n+1}. \tag{15.37}$$

Combining Eqs. (15.31) and (15.37), we have the orthonormality condition

$$\int_{-1}^{1} P_n(x)P_m(x)\,dx = \frac{2\,\delta_{nm}}{2n+1}. \tag{15.38}$$

This result can also be obtained using the Rodrigues formulas for $P_n$ and $P_m$. See Exercise 15.2.1.
Legendre Series

The orthogonality of the Legendre polynomials makes it natural to use them as a basis for expansions. Given a function $f(x)$ defined on the range $(-1, 1)$, the coefficients in the expansion

$$f(x) = \sum_{n=0}^{\infty} a_nP_n(x) \tag{15.39}$$

are given by the formula

$$a_n = \frac{2n+1}{2}\int_{-1}^{1} f(x)P_n(x)\,dx. \tag{15.40}$$

The orthogonality property guarantees that this expansion is unique. Since we can (but perhaps will not wish to) convert our expansion into a power series by inserting the expansion of Eq. (15.14) and collecting the coefficients of each power of $x$, we can also obtain a power series, which we thereby know must be unique.
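For a polynomial $f(x)$ both the orthonormality condition, Eq. (15.38), and the coefficient formula, Eq. (15.40), can be evaluated exactly, because $\int_{-1}^{1} x^k\,dx$ equals $2/(k+1)$ for even $k$ and zero for odd $k$. The sketch below illustrates this under assumptions of the example: polynomials are coefficient lists (index = power of $x$), and the helper names are hypothetical.

```python
from fractions import Fraction

def legendre_coeffs(n_max):
    """Coefficient lists of P_0..P_n_max, built from the recurrence, Eq. (15.18)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        nxt = [Fraction(0)] + [(2 * n + 1) * c for c in polys[n]]
        for k, c in enumerate(polys[n - 1]):
            nxt[k] -= n * c
        polys.append([c / (n + 1) for c in nxt])
    return polys[: n_max + 1]

def integrate_product(p, q):
    """Exact integral over [-1, 1] of the product of two coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:          # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

def legendre_series(f_coeffs):
    """Coefficients a_n of Eq. (15.39) for a polynomial f, using Eq. (15.40)."""
    degree = len(f_coeffs) - 1
    polys = legendre_coeffs(degree)
    return [Fraction(2 * n + 1, 2) * integrate_product(f_coeffs, polys[n])
            for n in range(degree + 1)]

# x^4 = (1/5) P_0 + (4/7) P_2 + (8/35) P_4
print(legendre_series([0, 0, 0, 0, 1]))
```

For a function that is not a polynomial the integral in Eq. (15.40) would have to be done analytically or by numerical quadrature; the exact approach here is specific to polynomial input.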

An important application of Legendre series is to solutions of the Laplace equation. We saw in Section 9.4 that when the Laplace equation is separated in spherical polar coordinates, its general solution (for spherical symmetry) takes the form

$$\psi(r, \theta, \varphi) = \sum_{l,m} \left(A_{lm}r^l + B_{lm}r^{-l-1}\right)P_l^m(\cos\theta)\left(A_{lm}'\sin m\varphi + B_{lm}'\cos m\varphi\right), \tag{15.41}$$

with $l$ required to be an integer to avoid a solution that diverges in the polar directions. Here we consider solutions with no azimuthal dependence (i.e., with $m = 0$), so Eq. (15.41) reduces to

$$\psi(r, \theta) = \sum_{l=0}^{\infty} \left(a_lr^l + b_lr^{-l-1}\right)P_l(\cos\theta). \tag{15.42}$$

Often our problem is further restricted to a region either within or external to a boundary sphere, and if the problem is such that $\psi$ must remain finite, the solution will have one of the two following forms:

$$\psi(r, \theta) = \sum_{l=0}^{\infty} a_lr^lP_l(\cos\theta) \quad (r \le r_0), \tag{15.43}$$

$$\psi(r, \theta) = \sum_{l=0}^{\infty} a_lr^{-l-1}P_l(\cos\theta) \quad (r \ge r_0). \tag{15.44}$$

Note that this simplification is not always appropriate; see Example 15.2.2. Sometimes the coefficients $(a_l)$ are determined from the boundary conditions of a problem rather than from the expansion of a known function. See the examples to follow.
Example 15.2.1 EARTH'S GRAVITATIONAL FIELD

An example of a Legendre series is provided by the description of the Earth's gravitational potential $U$ at points exterior to the Earth's surface. Because gravitation is an inverse-square force, its potential in mass-free regions satisfies the Laplace equation, and therefore (if we neglect azimuthal effects, i.e., those dependent on longitude) it has the form given in Eq. (15.44).

To specialize to the current example, we define $R$ to be the Earth's radius at the equator, and take as the expansion variable the dimensionless quantity $R/r$. In terms of the total mass of the Earth $M$ and the gravitational constant $G$, we have

$$R = 6378.1 \pm 0.1\ \text{km},$$

$$\frac{GM}{R} = 62.494 \pm 0.001\ \text{km}^2/\text{s}^2,$$

and we write

$$U(r, \theta) = \frac{GM}{R}\left[\frac{R}{r} - \sum_{l=2}^{\infty} a_l\left(\frac{R}{r}\right)^{l+1}P_l(\cos\theta)\right]. \tag{15.45}$$

The leading term of this expansion describes the result that would be obtained if the Earth were spherically symmetric; the higher terms describe distortions. The $P_1$ term is absent because the origin from which $r$ is measured is the Earth's center of mass.

Artificial satellite motions have shown that

$$a_2 = (1{,}082{,}635 \pm 11) \times 10^{-9},$$

$$a_3 = (-2{,}531 \pm 7) \times 10^{-9},$$

$$a_4 = (-1{,}600 \pm 12) \times 10^{-9}.$$

This is the famous pear-shaped deformation of the Earth. Other coefficients have been computed through $a_{20}$.

More recent satellite data permit a determination of the longitudinal dependence of the Earth's gravitational field. Such dependence may be described by a Laplace series (see Section 15.5). ■


Example 15.2.2 SPHERE IN A UNIFORM FIELD

Another illustration of the use of a Legendre series is provided by the problem of a neutral conducting sphere (radius $r_0$) placed in a (previously) uniform electric field of magnitude $E_0$ (Fig. 15.2). The problem is to find the new, perturbed electrostatic potential $\psi$ that satisfies Laplace's equation,

$$\nabla^2\psi = 0.$$

We select spherical polar coordinates with origin at the center of the conducting sphere and the polar ($z$) axis oriented parallel to the original uniform field, a choice that will simplify the application of the boundary condition at the surface of the conductor. Separating variables, we note that because we require a solution to Laplace's equation, the potential for $r \ge r_0$ will be of the form of Eq. (15.42). Our solution will be independent of $\varphi$ because of the axial symmetry of the problem.

FIGURE 15.2 Conducting sphere in a uniform field.

Because the insertion of the conducting sphere will have an effect that is local, the asymptotic behavior of $\psi$ must be of the form

$$\psi(r \to \infty) = -E_0z = -E_0r\cos\theta = -E_0rP_1(\cos\theta), \tag{15.46}$$

equivalent to

$$a_n = 0, \quad n > 1, \qquad a_1 = -E_0. \tag{15.47}$$

Note that if $a_n \ne 0$ for any $n > 1$, that term would dominate at large $r$ and the boundary condition, Eq. (15.46), could not be satisfied. In addition, the neutrality of the conducting sphere requires that $\psi$ not contain a contribution proportional to $1/r$, so we also must have $b_0 = 0$.

As a second boundary condition, the conducting sphere must be an equipotential, and without loss of generality we can set its potential to zero. Then, on the sphere $r = r_0$ we have

$$\psi(r_0, \theta) = a_0 + \left(\frac{b_1}{r_0^2} - E_0r_0\right)P_1(\cos\theta) + \sum_{n=2}^{\infty} b_n\frac{P_n(\cos\theta)}{r_0^{n+1}} = 0. \tag{15.48}$$

In order that Eq. (15.48) may hold for all values of $\theta$, we set

$$a_0 = 0, \qquad b_1 = E_0r_0^3, \qquad b_n = 0, \quad n \ge 2. \tag{15.49}$$

The electrostatic potential (outside the sphere) is then completely determined:

$$\psi(r, \theta) = -E_0rP_1(\cos\theta) + \frac{E_0r_0^3}{r^2}P_1(\cos\theta)$$
$$= -E_0rP_1(\cos\theta)\left(1 - \frac{r_0^3}{r^3}\right) = -E_0z\left(1 - \frac{r_0^3}{r^3}\right). \tag{15.50}$$

In Section 9.5 we showed that Laplace's equation with Dirichlet boundary conditions on a closed boundary (parts of which may be at infinity) had a unique solution. Since we have now found a solution to our current problem, it must (apart from an additive constant) be the only solution.

It may further be shown that there is an induced surface charge density

$$\sigma = -\varepsilon_0\frac{\partial\psi}{\partial r}\bigg|_{r=r_0} = 3\varepsilon_0E_0\cos\theta \tag{15.51}$$

on the surface of the sphere and an induced electric dipole moment of magnitude

$$P = 4\pi r_0^3\varepsilon_0E_0. \tag{15.52}$$

See Exercise 15.2.11. ■
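The statements in Eqs. (15.50) to (15.52) can be checked numerically. The sketch below is an illustration only: the function names, the central-difference step, and the midpoint integration over $\theta$ are choices made for this example, and the field strength and radius used in the usage line are arbitrary values.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def potential(r, theta, e0, r0):
    """Eq. (15.50): potential outside a conducting sphere held at zero potential."""
    return -e0 * r * math.cos(theta) * (1.0 - (r0 / r) ** 3)

def surface_charge(theta, e0, r0):
    """sigma = -eps0 * d(psi)/dr at r = r0, estimated by a central difference."""
    h = 1e-6 * r0
    dpsi_dr = (potential(r0 + h, theta, e0, r0)
               - potential(r0 - h, theta, e0, r0)) / (2.0 * h)
    return -EPS0 * dpsi_dr

def dipole_moment(e0, r0, steps=2000):
    """Integrate sigma * z over the sphere with the midpoint rule in theta."""
    dtheta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        ring_area = 2.0 * math.pi * r0 ** 2 * math.sin(theta) * dtheta
        total += surface_charge(theta, e0, r0) * r0 * math.cos(theta) * ring_area
    return total

e0, r0 = 1000.0, 0.1   # illustrative values: 1 kV/m field, 10 cm sphere
print(surface_charge(0.3, e0, r0), 3 * EPS0 * e0 * math.cos(0.3))
print(dipole_moment(e0, r0), 4 * math.pi * r0 ** 3 * EPS0 * e0)
```

The two numbers on each printed line should agree closely, the first pair reflecting Eq. (15.51) and the second Eq. (15.52).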


Example 15.2.3 ELECTROSTATIC POTENTIAL FOR A RING OF CHARGE

As a further example, consider the electrostatic potential produced by a thin conducting ring of radius $a$ placed symmetrically in the equatorial plane of a spherical polar coordinate system and carrying a total electric charge $q$ (Fig. 15.3). Again we rely on the fact that the potential $\psi$ satisfies Laplace's equation. Separating the variables and recognizing that a solution for the region $r > a$ must go to zero as $r \to \infty$, we use the form given by Eq. (15.44), obtaining

$$\psi(r, \theta) = \sum_{n=0}^{\infty} c_n\frac{a^n}{r^{n+1}}P_n(\cos\theta), \quad r > a. \tag{15.53}$$

FIGURE 15.3 Charged, conducting ring.

There is no $\varphi$ (azimuthal) dependence because of the cylindrical symmetry of the system. Note also that by including an explicit factor $a^n$ we cause all the coefficients $c_n$ to have the same dimensionality; this choice simply modifies the definition of $c_n$ and was, of course, not required.

Our problem is to determine the coefficients $c_n$ in Eq. (15.53). This may be done by evaluating $\psi(r, \theta)$ at $\theta = 0$, $r = z$, and comparing with an independent calculation of the potential from Coulomb's law. In effect, we are using a boundary condition along the $z$-axis. From Coulomb's law (using the fact that all the charge is equidistant from any point on the $z$ axis),

$$\psi(z, 0) = \frac{1}{4\pi\varepsilon_0}\frac{q}{(z^2 + a^2)^{1/2}} = \frac{q}{4\pi\varepsilon_0z}\sum_{s=0}^{\infty}\binom{-1/2}{s}\left(\frac{a}{z}\right)^{2s}$$
$$= \frac{q}{4\pi\varepsilon_0z}\sum_{s=0}^{\infty}(-1)^s\frac{(2s-1)!!}{(2s)!!}\left(\frac{a}{z}\right)^{2s}, \tag{15.54}$$

where we have evaluated the binomial coefficient using Eq. (1.74).

Now, evaluating $\psi(z, 0)$ from Eq. (15.53), remembering that $P_n(1) = 1$ for all $n$, we have

$$\psi(z, 0) = \sum_{n=0}^{\infty} c_n\frac{a^n}{z^{n+1}}. \tag{15.55}$$

Since the power series expansion in $z$ is unique, we may equate the coefficients of corresponding powers of $z$ from Eqs. (15.54) and (15.55), reaching the conclusion that $c_n = 0$ for $n$ odd, while for $n$ even and equal to $2s$,

$$c_{2s} = \frac{q}{4\pi\varepsilon_0}(-1)^s\frac{(2s-1)!!}{(2s)!!}, \tag{15.56}$$

and our electrostatic potential $\psi(r, \theta)$ is given by

$$\psi(r, \theta) = \frac{q}{4\pi\varepsilon_0r}\sum_{s=0}^{\infty}(-1)^s\frac{(2s-1)!!}{(2s)!!}\left(\frac{a}{r}\right)^{2s}P_{2s}(\cos\theta), \quad r > a. \tag{15.57}$$

The magnetic analog of this problem appears in Example 15.4.2. ■
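The series of Eq. (15.57) is straightforward to sum numerically. The sketch below is illustrative and makes several assumptions: the potential is returned in units of $q/4\pi\varepsilon_0$, the Legendre polynomials are evaluated by the recurrence of Eq. (15.18), the names `legendre_value` and `ring_potential` are introduced here, and the truncation at 40 terms is an arbitrary choice that is adequate only when $a/r$ is not close to unity.

```python
import math

def legendre_value(n, x):
    """P_n(x) evaluated with the three-term recurrence, Eq. (15.18)."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def ring_potential(r, theta, a=1.0, terms=40):
    """Eq. (15.57) in units of q / (4 pi eps0); intended for r > a."""
    x = math.cos(theta)
    total = 0.0
    coeff = 1.0                      # (-1)^s (2s-1)!!/(2s)!!, equal to 1 at s = 0
    for s in range(terms):
        if s > 0:
            coeff *= -(2 * s - 1) / (2 * s)
        total += coeff * (a / r) ** (2 * s) * legendre_value(2 * s, x)
    return total / r

# On the axis the series must reproduce Coulomb's law, Eq. (15.54).
print(ring_potential(2.5, 0.0), 1.0 / math.sqrt(2.5 ** 2 + 1.0))
print(ring_potential(2.5, math.pi / 3))
```

With $a = 1$, the value at $r/a = 2.5$, $\theta = 60^\circ$ comes out close to 0.40272 in units of $q/4\pi\varepsilon_0a$, the number quoted as a check value in Exercise 15.2.19.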
Exercises

15.2.1 Using a Rodrigues formula, show that the $P_n(x)$ are orthogonal and that

$$\int_{-1}^{1} \left[P_n(x)\right]^2 dx = \frac{2}{2n+1}.$$

Hint. Integrate by parts.

15.2.2 You have constructed a set of orthogonal functions by the Gram-Schmidt process (Section 5.2), taking $u_n(x) = x^n$, $n = 0, 1, 2, \ldots$, in increasing order with $w(x) = 1$ and an interval $-1 \le x \le 1$. Prove that the $n$th such function constructed in this way is proportional to $P_n(x)$.

Hint. Use mathematical induction (Section 1.4).

15.2.3 Expand the Dirac delta function $\delta(x)$ in a series of Legendre polynomials using the interval $-1 \le x \le 1$.

15.2.4 Verify the Dirac delta function expansions

$$\delta(1 - x) = \sum_{n=0}^{\infty} \frac{2n+1}{2}P_n(x),$$

$$\delta(1 + x) = \sum_{n=0}^{\infty} (-1)^n\frac{2n+1}{2}P_n(x).$$

These expressions appear in a resolution of the Rayleigh plane wave expansion (Exercise 15.2.24) into incoming and outgoing spherical waves.

Note. Assume that the entire Dirac delta function is covered when integrating over $[-1, 1]$.

15.2.5 Neutrons (mass 1) are being scattered by a nucleus of mass $A$ ($A > 1$). In the center-of-mass system the scattering is isotropic. Then, in the laboratory system the average of the cosine of the angle of deflection of the neutron is

$$\langle\cos\psi\rangle = \frac{1}{2}\int_0^{\pi} \frac{A\cos\theta + 1}{(A^2 + 2A\cos\theta + 1)^{1/2}}\sin\theta\,d\theta.$$

Show, by expansion of the denominator, that $\langle\cos\psi\rangle = 2/(3A)$.

15.2.6 A particular function $f(x)$ defined over the interval $[-1, 1]$ is expanded in a Legendre series over this same interval. Show that the expansion is unique.

15.2.7 A function $f(x)$ is expanded in a Legendre series $f(x) = \sum_{n=0}^{\infty} a_nP_n(x)$. Show that

$$\int_{-1}^{1} \left[f(x)\right]^2 dx = \sum_{n=0}^{\infty} \frac{2a_n^2}{2n+1}.$$

This is a statement that the Legendre polynomials form a complete set.

15.2.8 (a) For

$$f(x) = \begin{cases} +1, & 0 < x < 1, \\ -1, & -1 < x < 0, \end{cases}$$

show that

$$\int_{-1}^{1} \left[f(x)\right]^2 dx = 2\sum_{n=0}^{\infty}(4n+3)\left[\frac{(2n-1)!!}{(2n+2)!!}\right]^2.$$

(b) By testing the series, prove that it is convergent.
(c) The value of the integral in part (a) is 2. Check the rate at which the series converges by summing its first 10 terms.

15.2.9 Prove that

$$\int_{-1}^{1} x(1 - x^2)P_n'P_m'\,dx = \frac{2n(n^2 - 1)}{4n^2 - 1}\delta_{m,n-1} + \frac{2n(n+2)(n+1)}{(2n+1)(2n+3)}\delta_{m,n+1}.$$

15.2.10 The coincidence counting rate, $W(\theta)$, in a gamma-gamma angular correlation experiment has the form

$$W(\theta) = \sum_{n=0}^{\infty} a_{2n}P_{2n}(\cos\theta).$$

Show that data in the range $\pi/2 \le \theta \le \pi$ can, in principle, define the function $W(\theta)$ (and permit a determination of the coefficients $a_{2n}$). This means that although data in the range $0 \le \theta < \pi/2$ may be useful as a check, they are not essential.

15.2.11 A conducting sphere of radius $r_0$ is placed in an initially uniform electric field, $E_0$. Show the following:

(a) The induced surface charge density is $\sigma = 3\varepsilon_0E_0\cos\theta$,
(b) The induced electric dipole moment is $P = 4\pi r_0^3\varepsilon_0E_0$.

Note. The induced electric dipole moment can be calculated either from the surface charge [part (a)] or by noting that the final electric field $E$ is the result of superimposing a dipole field on the original uniform field.

15.2.12 Obtain as a Legendre expansion the electrostatic potential of the circular ring of Example 15.2.3, for points $(r, \theta)$ with $r < a$.

15.2.13 Calculate the electric field produced by the charged conducting ring of Example 15.2.3 for

(a) $r > a$, (b) $r < a$.

15.2.14 As an extension of Example 15.2.3, find the potential $\psi(r, \theta)$ produced by a charged conducting disk, Fig. 15.4, for $r > a$, where $a$ is the radius of the disk. The charge density $\sigma$ (on each side of the disk) is

$$\sigma(\rho) = \frac{q}{4\pi a(a^2 - \rho^2)^{1/2}}, \quad \rho^2 = x^2 + y^2.$$

FIGURE 15.4 Charged conducting disk.

Hint. The definite integral you get can be evaluated as a beta function, Section 13.3. For more details see section 5.03 of Smythe in Additional Readings.

ANS. $\displaystyle\psi(r, \theta) = \frac{q}{4\pi\varepsilon_0r}\sum_{l=0}^{\infty}(-1)^l\frac{1}{2l+1}\left(\frac{a}{r}\right)^{2l}P_{2l}(\cos\theta)$.

15.2.15 The hemisphere defined by $r = a$, $0 \le \theta < \pi/2$, has an electrostatic potential $+V_0$. The hemisphere $r = a$, $\pi/2 < \theta \le \pi$ has an electrostatic potential $-V_0$. Show that the potential at interior points is

$$V = V_0\sum_{n=0}^{\infty}\frac{4n+3}{2n+2}\left(\frac{r}{a}\right)^{2n+1}P_{2n}(0)P_{2n+1}(\cos\theta)$$
$$= V_0\sum_{n=0}^{\infty}(-1)^n\frac{(4n+3)(2n-1)!!}{(2n+2)!!}\left(\frac{r}{a}\right)^{2n+1}P_{2n+1}(\cos\theta).$$

Hint. You need Exercise 15.1.12.

15.2.16 A conducting sphere of radius $a$ is divided into two electrically separate hemispheres by a thin insulating barrier at its equator. The top hemisphere is maintained at a potential $V_0$, and the bottom hemisphere at $-V_0$.

(a) Show that the electrostatic potential exterior to the two hemispheres is

$$V(r, \theta) = V_0\sum_{s=0}^{\infty}(-1)^s(4s+3)\frac{(2s-1)!!}{(2s+2)!!}\left(\frac{a}{r}\right)^{2s+2}P_{2s+1}(\cos\theta).$$

(b) Calculate the electric charge density $\sigma$ on the outside surface. Note that your series diverges at $\cos\theta = \pm 1$, as you expect from the infinite capacitance of this system (zero thickness for the insulating barrier).

ANS. (b) $\displaystyle\sigma = \varepsilon_0E_n = -\varepsilon_0\frac{\partial V}{\partial r}\bigg|_{r=a} = \frac{\varepsilon_0V_0}{a}\sum_{s=0}^{\infty}(-1)^s(4s+3)\frac{(2s-1)!!}{(2s)!!}P_{2s+1}(\cos\theta)$.
15.2.17 By writing $\varphi_s(x) = \sqrt{(2s+1)/2}\,P_s(x)$, a Legendre polynomial is renormalized to unity. Explain how $|\varphi_s\rangle\langle\varphi_s|$ acts as a projection operator. In particular, show that if $|f\rangle = \sum_n a_n'|\varphi_n\rangle$, then

$$|\varphi_s\rangle\langle\varphi_s|f\rangle = a_s'|\varphi_s\rangle.$$

15.2.18 Expand $x^8$ as a Legendre series. Determine the Legendre coefficients from Eq. (15.40),

$$a_n = \frac{2n+1}{2}\int_{-1}^{1} x^8P_n(x)\,dx.$$

Check your values against AMS-55, Table 22.9. (For the complete reference, see Additional Readings.) This illustrates the expansion of a simple function $f(x)$.

Hint. Gaussian quadrature can be used to evaluate the integral.

15.2.19 Calculate and tabulate the electrostatic potential created by a ring of charge, Example 15.2.3, for $r/a = 1.5(0.5)5.0$ and $\theta = 0^\circ(15^\circ)90^\circ$. Carry terms through $P_{22}(\cos\theta)$.

Note. The convergence of your series will be slow for $r/a = 1.5$. Truncating the series at $P_{22}$ limits you to about four-significant-figure accuracy.

Check value. For $r/a = 2.5$ and $\theta = 60^\circ$, $\psi = 0.40272\,(q/4\pi\varepsilon_0r)$.

15.2.20 Calculate and tabulate the electrostatic potential created by a charged disk (Exercise 15.2.14), for $r/a = 1.5(0.5)5.0$ and $\theta = 0^\circ(15^\circ)90^\circ$. Carry terms through $P_{22}(\cos\theta)$.

Check value. For $r/a = 2.0$ and $\theta = 15^\circ$, $\psi = 0.46638\,(q/4\pi\varepsilon_0r)$.

15.2.21 Calculate the first five (nonvanishing) coefficients in the Legendre series expansion of $f(x) = 1 - |x|$, evaluating the coefficients in the series by numerical integration. Actually these coefficients can be obtained in closed form. Compare your coefficients with those listed in Exercise 18.4.26.

ANS. $a_0 = 0.5000$, $a_2 = -0.6250$, $a_4 = 0.1875$, $a_6 = -0.1016$, $a_8 = 0.0664$.

15.2.22 Calculate and tabulate the exterior electrostatic potential created by the two charged hemispheres of Exercise 15.2.16, for $r/a = 1.5(0.5)5.0$ and $\theta = 0^\circ(15^\circ)90^\circ$. Carry terms through $P_{23}(\cos\theta)$.

Check value. For $r/a = 2.0$ and $\theta = 45^\circ$, $V = 0.27066\,V_0$.

15.2.23 (a) Given $f(x) = 2.0$, $|x| < 0.5$ and $f(x) = 0$, $0.5 < |x| < 1.0$, expand $f(x)$ in a Legendre series and calculate the coefficients $a_n$ through $a_{80}$ (analytically).
(b) Evaluate $\sum_{n=0}^{80} a_nP_n(x)$ for $x = 0.400(0.005)0.600$. Plot your results.

Note. This illustrates the Gibbs phenomenon of Section 19.3 and the danger of trying to calculate with a series expansion in the vicinity of a discontinuity.





15.2.24 A plane wave may be expanded in a series of spherical waves by the Rayleigh equation,

$$e^{ikr\cos\gamma} = \sum_{n=0}^{\infty} a_nj_n(kr)P_n(\cos\gamma).$$

Show that $a_n = i^n(2n + 1)$.
Hint.

1. Use the orthogonality of the $P_n$ to solve for $a_nj_n(kr)$.
2. Differentiate $n$ times with respect to $(kr)$ and set $r = 0$ to eliminate the $r$-dependence.
3. Evaluate the remaining integral by Exercise 15.1.15.

Note. This problem may also be treated by noting that both sides of the equation satisfy the Helmholtz equation. The equality can be established by showing that the solutions have the same behavior at the origin and also behave alike at large distances.

15.2.25 Verify the Rayleigh equation of Exercise 15.2.24 by starting with the following steps:

(a) Differentiate with respect to $(kr)$ to establish

$$\sum_n a_nj_n'(kr)P_n(\cos\gamma) = i\sum_n a_nj_n(kr)\cos\gamma\,P_n(\cos\gamma).$$

(b) Use a recurrence relation to replace $\cos\gamma\,P_n(\cos\gamma)$ by a linear combination of $P_{n-1}$ and $P_{n+1}$.
(c) Use a recurrence relation to replace $j_n'$ by a linear combination of $j_{n-1}$ and $j_{n+1}$.

15.2.26 From Exercise 15.2.24 show that

$$j_n(kr) = \frac{1}{2i^n}\int_{-1}^{1} e^{ikr\mu}P_n(\mu)\,d\mu.$$

This means that (apart from a constant factor) the spherical Bessel function $j_n(kr)$ is an integral transform of the Legendre polynomial $P_n(\mu)$.

15.2.27 Rewriting the formula of Exercise 15.2.26 as

$$j_n(z) = \frac{1}{2}(-i)^n\int_0^{\pi} e^{iz\cos\theta}P_n(\cos\theta)\sin\theta\,d\theta, \quad n = 0, 1, 2, \ldots,$$

verify it by transforming the right-hand side into

$$\frac{z^n}{2^{n+1}n!}\int_0^{\pi} \cos(z\cos\theta)\sin^{2n+1}\theta\,d\theta$$

and using Exercise 14.7.9.
15.3 PHYSICAL INTERPRETATION OF GENERATING FUNCTION

The generating function for the Legendre polynomials has an interesting and important interpretation. If we introduce spherical polar coordinates $(r, \theta, \varphi)$ and place a charge $q$ at the point $a$ on the positive $z$ axis (see Fig. 15.5), the potential at a point $(r, \theta)$ (it is independent of $\varphi$) can be calculated, using the law of cosines, as

$$\psi(r, \theta) = \frac{q}{4\pi\varepsilon_0}\frac{1}{r_1} = \frac{q}{4\pi\varepsilon_0}\left(r^2 + a^2 - 2ar\cos\theta\right)^{-1/2}. \tag{15.58}$$

FIGURE 15.5 Electrostatic potential, charge $q$ displaced from origin.

The expression in Eq. (15.58) is essentially that appearing in the generating function; to identify the correspondence we rewrite that equation as

$$\psi(r, \theta) = \frac{q}{4\pi\varepsilon_0r}\left(1 - 2\frac{a}{r}\cos\theta + \frac{a^2}{r^2}\right)^{-1/2} = \frac{q}{4\pi\varepsilon_0r}\,g\!\left(\cos\theta, \frac{a}{r}\right) \tag{15.59}$$

$$= \frac{q}{4\pi\varepsilon_0r}\sum_{n=0}^{\infty} P_n(\cos\theta)\left(\frac{a}{r}\right)^n, \tag{15.60}$$

where we reached Eq. (15.60) by inserting the generating-function expansion.

The series in Eq. (15.60) only converges for $r > a$, with a rate of convergence that improves as $r/a$ increases. If, on the other hand, we desire an expression for $\psi(r, \theta)$ when $r < a$, we can perform a different rearrangement of Eq. (15.58), to

$$\psi(r, \theta) = \frac{q}{4\pi\varepsilon_0a}\left(1 - 2\frac{r}{a}\cos\theta + \frac{r^2}{a^2}\right)^{-1/2}, \tag{15.61}$$

which we again recognize as the generating-function expansion, but this time with the result

$$\psi(r, \theta) = \frac{q}{4\pi\varepsilon_0a}\sum_{n=0}^{\infty} P_n(\cos\theta)\left(\frac{r}{a}\right)^n, \tag{15.62}$$

valid when $r < a$.
Expansion of $1/|\mathbf{r}_1 - \mathbf{r}_2|$

Equations (15.60) and (15.62) describe the interaction of a charge $q$ at position $\mathbf{a} = a\hat{\mathbf{e}}_z$ with a unit charge at position $\mathbf{r}$. Dropping the factors needed for an electrostatics calculation, these equations yield formulas for $1/|\mathbf{r} - \mathbf{a}|$. The fact that $\mathbf{a}$ is aligned with the $z$-axis is actually of no importance for the computation of $1/|\mathbf{r} - \mathbf{a}|$; the relevant quantities are $r$, $a$, and the angle $\theta$ between $\mathbf{r}$ and $\mathbf{a}$. Thus, we may rewrite either Eq. (15.60) or (15.62) in a more neutral notation, to give the value of $1/|\mathbf{r}_1 - \mathbf{r}_2|$ in terms of the magnitudes $r_1$, $r_2$ and the angle between $\mathbf{r}_1$ and $\mathbf{r}_2$, which we now call $\chi$. If we define $r_>$ and $r_<$ to be respectively the larger and the smaller of $r_1$ and $r_2$, Eqs. (15.60) and (15.62) can be combined into the single equation

$$\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} = \frac{1}{r_>}\sum_{n=0}^{\infty}\left(\frac{r_<}{r_>}\right)^nP_n(\cos\chi), \tag{15.63}$$

which will converge everywhere except when $r_1 = r_2$.
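Equation (15.63) can be compared directly with the law-of-cosines distance. The following sketch is only an illustration: the names `legendre_value` and `inverse_distance_series`, the 60-term truncation, and the sample radii and angle are assumptions of this example.

```python
import math

def legendre_value(n, x):
    """P_n(x) evaluated with the three-term recurrence, Eq. (15.18)."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def inverse_distance_series(r1, r2, cos_chi, terms=60):
    """Truncated form of Eq. (15.63) for 1/|r1 - r2|, assuming r1 != r2."""
    r_small, r_large = min(r1, r2), max(r1, r2)
    ratio = r_small / r_large
    partial = sum(ratio ** n * legendre_value(n, cos_chi) for n in range(terms))
    return partial / r_large

def inverse_distance_exact(r1, r2, cos_chi):
    """1/|r1 - r2| from the law of cosines."""
    return 1.0 / math.sqrt(r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * cos_chi)

print(inverse_distance_series(1.0, 3.0, 0.3), inverse_distance_exact(1.0, 3.0, 0.3))
```

The number of terms needed grows rapidly as $r_</r_>$ approaches unity, consistent with the remark that the expansion fails to converge when $r_1 = r_2$.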


Electric Multipoles 


Returning to Eq. (15.60) and restricting consideration to $r > a$, we may note that its initial
term (with $n = 0$) gives the potential we would get if the charge $q$ were at the origin, and
that further terms must describe corrections arising from the actual position of the charge.
One way to obtain further understanding of the second and later terms in the expansion is
to consider what would happen if we added a second charge, $-q$, at $z = -a$, as shown in
Fig. 15.6. The potential due to the second charge will be given by an expression similar to
that in Eq. (15.58), except that the signs of $q$ and $\cos\theta$ must be reversed (the angle opposite
$r_2$ in the figure is $\pi - \theta$). We now have


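$$\begin{aligned}
\psi &= \frac{q}{4\pi\varepsilon_0}\left(\frac{1}{r_1} - \frac{1}{r_2}\right) \\
&= \frac{q}{4\pi\varepsilon_0 r}\left[\left(1 - 2\frac{a}{r}\cos\theta + \frac{a^2}{r^2}\right)^{-1/2} - \left(1 + 2\frac{a}{r}\cos\theta + \frac{a^2}{r^2}\right)^{-1/2}\right] \\
&= \frac{q}{4\pi\varepsilon_0 r}\left[\sum_{n=0}^{\infty} P_n(\cos\theta)\left(\frac{a}{r}\right)^n - \sum_{n=0}^{\infty} P_n(\cos\theta)\left(-\frac{a}{r}\right)^n\right].
\end{aligned} \tag{15.64}$$

FIGURE 15.6 Electric dipole.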


If we combine the two summations in Eq. (15.64), alternate terms cancel, and we get 








$$\psi = \frac{2q}{4\pi\varepsilon_0 r}\left[\frac{a}{r}P_1(\cos\theta) + \frac{a^3}{r^3}P_3(\cos\theta) + \cdots\right]. \tag{15.65}$$

This configuration of charges is called an electric dipole, and we note that its leading
dependence on $r$ goes as $r^{-2}$. The strength of the dipole (called the dipole moment) can
be identified as $2qa$, equal to the magnitude of each charge multiplied by their separation
($2a$). If we let $a \to 0$ while keeping the product $2qa$ constant at a value $\mu$, all but the first
term become negligible, and we have


$$\psi = \frac{\mu}{4\pi\varepsilon_0}\,\frac{P_1(\cos\theta)}{r^2}, \tag{15.66}$$


the potential of a point dipole of dipole moment $\mu$, located at the origin of the coordinate
system (at r = 0). Note that because we have limited the discussion to situations of cylin- 
drical symmetry, our dipole is oriented in the polar direction; more general orientations can 
be considered after we have developed formulas for solutions of the associated Legendre 
equation (cases where the parameter m in Eq. (15.4) is nonzero). 

We can extend the above analysis by combining a pair of dipoles of opposite orienta- 
tion, for example, in the configuration shown in Fig. 15.7, thereby causing cancellation of 
their leading terms, leaving a potential whose leading contribution will be proportional to 
$r^{-3}P_2(\cos\theta)$. A charge configuration of this sort is called an electric quadrupole, and
the $P_2$ term of the generating function expansion can be identified as the contribution of
a point quadrupole, also located at $r = 0$. Further extensions, to $2^n$-poles, with contributions
proportional to $P_n(\cos\theta)/r^{n+1}$, permit us to identify each term of the generating
expansion with the potential of a point multipole. We thus have a multipole expansion. 
Again we observe that because we have limited discussion to situations with cylindrical 
symmetry our multipoles are presently required to be linear; that restriction will be elimi- 
nated when this topic is revisited in Chapter 16. 

We look next at more general charge distributions, for simplicity limiting consideration
to charges $q_i$ placed at respective positions $a_i$ on the polar axis of our coordinate system.











FIGURE 15.7 Linear electric quadrupole.

Adding together the generating-function expansions of the individual charges, our combined
expansion takes the form

$$\begin{aligned}
\psi &= \frac{1}{4\pi\varepsilon_0 r}\left[\sum_i q_i + \sum_i \frac{q_i a_i}{r}P_1(\cos\theta) + \sum_i \frac{q_i a_i^2}{r^2}P_2(\cos\theta) + \cdots\right] \\
&= \frac{1}{4\pi\varepsilon_0 r}\left[\mu_0 + \frac{\mu_1}{r}P_1(\cos\theta) + \frac{\mu_2}{r^2}P_2(\cos\theta) + \cdots\right],
\end{aligned} \tag{15.67}$$














where the $\mu_i$ are called the multipole moments of the charge distribution; $\mu_0$ is the
$2^0$-pole, or monopole moment, with a value equal to the total net charge of the distribution;
$\mu_1$ is the $2^1$-pole, or dipole moment, equal to $\sum_i q_i a_i$; $\mu_2$ is the $2^2$-pole, or quadrupole
moment, given as $\sum_i q_i a_i^2$, etc. Our general (linear) multipole expansion will converge for
values of $r$ that are larger than all the $a_i$ values of the individual charges. Put another way,
the expansion will converge at points further from the coordinate origin than all parts of
the charge distribution.
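The linear multipole moments $\mu_n = \sum_i q_i a_i^n$ and the truncated expansion of Eq. (15.67) are easy to check numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the returned quantity is $4\pi\varepsilon_0\psi$ (so no physical constants are needed), and positions $a_i$ are signed coordinates on the polar axis.

```python
import math


def legendre_p(n, x):
    """Return P_n(x) using (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr


def linear_multipole_moments(charges, positions, n_max):
    """mu_n = sum_i q_i a_i**n for n = 0..n_max (charges on the polar axis)."""
    return [sum(q * a**n for q, a in zip(charges, positions)) for n in range(n_max + 1)]


def scaled_potential_series(charges, positions, r, cos_theta, n_max=40):
    """4*pi*eps0*psi from Eq. (15.67), truncated after mu_{n_max}; needs r > max|a_i|."""
    moments = linear_multipole_moments(charges, positions, n_max)
    return sum(mu * legendre_p(n, cos_theta) / r ** (n + 1) for n, mu in enumerate(moments))


def scaled_potential_exact(charges, positions, r, cos_theta):
    """4*pi*eps0*psi summed directly over the point charges."""
    return sum(
        q / math.sqrt(r * r + a * a - 2.0 * a * r * cos_theta)
        for q, a in zip(charges, positions)
    )


if __name__ == "__main__":
    # Finite dipole of Fig. 15.6: +q at z = a and -q at z = -a, with q = 1, a = 0.5.
    dipole_q, dipole_a = [1.0, -1.0], [0.5, -0.5]
    print(linear_multipole_moments(dipole_q, dipole_a, 3))
    print(scaled_potential_series(dipole_q, dipole_a, 3.0, 0.4))
    print(scaled_potential_exact(dipole_q, dipole_a, 3.0, 0.4))
```

For the finite dipole the even moments vanish and the odd ones equal $2qa^n$, which reproduces the alternate-term cancellation seen in Eq. (15.65).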

We next ask: What happens if we move the origin of our coordinate system? Or, equivalently,
consider replacing $\mathbf{r}$ by $|\mathbf{r} - \mathbf{r}_p|$. For $r > r_p$, the binomial expansion of $1/|\mathbf{r} - \mathbf{r}_p|^n$
will have the generic form

$$\frac{1}{|\mathbf{r} - \mathbf{r}_p|^n} = \frac{1}{r^n} + \frac{C}{r^{n+1}} + \cdots,$$


with the result that only the leading nonzero term of Eq. (15.67) will be unaffected by 
the change of expansion center. Translated into everyday language, this means that the 
lowest nonzero moment of the expansion will be independent of the choice of origin, but 
all higher moments will change when the expansion center is moved. Specifically, the total 
net charge (monopole moment) will always be independent of the choice of expansion 
center. The dipole moment will be independent of the expansion point only when the net 
charge is zero; the quadrupole moment will have such independence only if both the net 
charge and dipole moments vanish, etc. 
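The origin dependence of the moments can be seen in a few lines of Python. This is a minimal sketch for a linear array only; the function name `moments_about` and the sample charge arrays are assumptions chosen for illustration.

```python
def moments_about(charges, positions, origin, n_max=2):
    """Linear multipole moments mu_n = sum_i q_i (a_i - origin)**n, n = 0..n_max."""
    return [
        sum(q * (a - origin) ** n for q, a in zip(charges, positions))
        for n in range(n_max + 1)
    ]


if __name__ == "__main__":
    # Net charge nonzero: the dipole moment changes when the origin moves.
    print(moments_about([2.0, -1.0], [1.0, -1.0], 0.0))
    print(moments_about([2.0, -1.0], [1.0, -1.0], 0.5))
    # Net charge zero: the dipole moment is unchanged, the quadrupole moment is not.
    print(moments_about([1.0, -1.0], [1.0, -1.0], 0.0))
    print(moments_about([1.0, -1.0], [1.0, -1.0], 0.5))
```

In the first pair the monopole moment stays fixed while the dipole moment shifts; in the second pair the vanishing net charge leaves the dipole moment fixed and only the quadrupole moment changes.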
We close this section with three observations. 


• First, while we have illustrated our discussion with discrete arrays of point charges, we
could have reached the same conclusions using continuous charge distributions, with
the result that the summations over charges would become integrals over the charge
density.

• Second, if we remove our restriction to linear arrays, our expansion would involve
components of the multipole moments in different directions. In three-dimensional
space, the dipole moment would have three components: $a$ generalizes to $(a_x, a_y, a_z)$,
while the higher-order multipoles will have larger numbers of components
($a^2 \to a_x a_x, a_x a_y, \ldots$). The details of that analysis will be taken up when the necessary
background is in place.

• Third, the multipole expansion is not restricted to electrical phenomena, but applies
anywhere we have an inverse-square force. For example, planetary configurations are
described in terms of mass multipoles. And gravitational radiation depends on the time
behavior of mass quadrupoles.
Exercises

15.3.1 Develop the electrostatic potential for the array of charges shown in Fig. 15.7. This is a
linear electric quadrupole.

15.3.2 Calculate the electrostatic potential of the array of charges shown in Fig. 15.8. Here is
an example of two equal but oppositely directed quadrupoles. The quadrupole contributions
cancel. The octupole terms do not cancel.

15.3.3 Show that the electrostatic potential produced by a charge $q$ at $z = a$ for $r < a$ is

$$\varphi(\mathbf{r}) = \frac{q}{4\pi\varepsilon_0 a}\sum_{n=0}^{\infty}\left(\frac{r}{a}\right)^n P_n(\cos\theta).$$

15.3.4 Using $\mathbf{E} = -\nabla\varphi$, determine the components of the electric field corresponding to the
(pure) electric dipole potential,

$$\varphi(\mathbf{r}) = \frac{2aq\,P_1(\cos\theta)}{4\pi\varepsilon_0 r^2}.$$

Here it is assumed that $r \gg a$.

ANS. $E_r = +\dfrac{4aq\cos\theta}{4\pi\varepsilon_0 r^3}$, $\quad E_\theta = +\dfrac{2aq\sin\theta}{4\pi\varepsilon_0 r^3}$, $\quad E_\varphi = 0$.

15.3.5 Operating in spherical polar coordinates, show that

$$\frac{\partial}{\partial z}\left[\frac{P_l(\cos\theta)}{r^{l+1}}\right] = -(l+1)\frac{P_{l+1}(\cos\theta)}{r^{l+2}}.$$

This is the key step in the mathematical argument that the derivative of one multipole
leads to the next higher multipole.

Hint. Compare with Exercise 3.10.28.

15.3.6 A point electric dipole of strength $p^{(1)}$ is placed at $z = a$; a second point electric dipole
of equal but opposite strength is at the origin. Keeping the product $p^{(1)}a$ constant, let
$a \to 0$. Show that this results in a point electric quadrupole.

Hint. Exercise 15.3.5 (when proved) will be helpful.

15.3.7 A point electric octupole may be constructed by placing a point electric quadrupole
(pole strength $p^{(2)}$ in the $z$-direction) at $z = a$ and an equal but opposite point electric
quadrupole at $z = 0$ and then letting $a \to 0$, subject to $p^{(2)}a = \text{constant}$. Find the
electrostatic potential corresponding to a point electric octupole. Show from the
construction of the point electric octupole that the corresponding potential may be obtained
by differentiating the point quadrupole potential.

FIGURE 15.8 Linear electric octupole.

15.3.8 A point charge $q$ is in the interior of a hollow conducting sphere of radius $r_0$. The
charge $q$ is displaced a distance $a$ from the center of the sphere. If the conducting
sphere is grounded, show that the potential in the interior produced by $q$ and the
distributed induced charge is the same as that produced by $q$ and its image charge $q'$. The
image charge is at a distance $a' = r_0^2/a$ from the center, collinear with $q$ and the origin
(Fig. 15.9).

Hint. Calculate the electrostatic potential for $a < r_0 < a'$. Show that the potential
vanishes for $r = r_0$ if we take $q' = -q r_0/a$.

FIGURE 15.9 Image charges for Exercise 15.3.8.


15.4 ASSOCIATED LEGENDRE EQUATION


We need to extend our analysis to the associated Legendre equation because it is important 
to be able to remove the restriction to azimuthal symmetry that pervaded the discussion 
of the previous sections of this chapter. We therefore return to Eq. (15.4), which, before 
determining what its eigenvalue should be, assumed the form 


$$(1 - x^2)P''(x) - 2xP'(x) + \left[\lambda - \frac{m^2}{1 - x^2}\right]P(x) = 0. \tag{15.68}$$


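Trial and error (or great insight) suggests that the troublesome factor $1 - x^2$ in the
denominator of this equation can be eliminated by making a substitution of the form
$P = (1 - x^2)^p\,\mathcal{P}$, and further experimentation shows that a suitable choice for the exponent
$p$ is $m/2$. By straightforward differentiation, we find

$$P = (1 - x^2)^{m/2}\,\mathcal{P}, \tag{15.69}$$

$$P' = (1 - x^2)^{m/2}\,\mathcal{P}' - mx(1 - x^2)^{m/2-1}\,\mathcal{P}, \tag{15.70}$$

$$P'' = (1 - x^2)^{m/2}\,\mathcal{P}'' - 2mx(1 - x^2)^{m/2-1}\,\mathcal{P}' + \left[-m(1 - x^2)^{m/2-1} + (m^2 - 2m)x^2(1 - x^2)^{m/2-2}\right]\mathcal{P}. \tag{15.71}$$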


Substituting Eqs. (15.69)-(15.71) into Eq. (15.68), we obtain an equation that is potentially
easier to solve, namely,

$$(1 - x^2)\mathcal{P}'' - 2x(m + 1)\mathcal{P}' + \left[\lambda - m(m + 1)\right]\mathcal{P} = 0. \tag{15.72}$$


We continue by seeking to solve Eq. (15.72) by the method of Frobenius, assuming a
solution in the series form $\sum_j a_j x^{k+j}$. The indicial equation for this ODE has solutions
$k = 0$ and $k = 1$. For $k = 0$, substitution into the series solution leads to the recurrence
formula

$$a_{j+2} = a_j\,\frac{j^2 + (2m + 1)j - \lambda + m(m + 1)}{(j + 1)(j + 2)}. \tag{15.73}$$

Just as for the original Legendre equation, we need solutions $\mathcal{P}(\cos\theta)$ that are nonsingular
for the range $-1 \le \cos\theta \le +1$, but the recurrence formula leads to a power series that in
general is divergent at $\pm 1$.²

To avoid the divergence, we must cause the numerator of the fraction in Eq. (15.73) to
become zero for some nonnegative even integer $j$, thereby causing $\mathcal{P}$ to be a polynomial.
By direct substitution into Eq. (15.73), we can verify that a zero numerator is obtained for
$j = l - m$ when $\lambda$ is assigned the value $l(l + 1)$, a condition that can only be met if $l$ is an
integer at least as large as $m$ and of the same parity. Further analysis for the other indicial
equation solution, $k = 1$, extends our present result to values of $l$ that are larger than $m$ and
of opposite parity.
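The termination condition can be checked with exact rational arithmetic. The sketch below iterates the $k = 0$ recurrence of Eq. (15.73) starting from $a_0 = 1$; the function names and the normalization $a_0 = 1$ are assumptions made for this illustration.

```python
from fractions import Fraction


def recurrence_numerator(j, m, lam):
    """Numerator of Eq. (15.73): j**2 + (2m+1) j - lambda + m (m+1)."""
    return j * j + (2 * m + 1) * j - lam + m * (m + 1)


def even_series_coefficients(m, lam, n_coeffs=6):
    """Return [a_0, a_2, a_4, ...] from Eq. (15.73) with a_0 = 1 (k = 0 branch)."""
    coeffs = [Fraction(1)]
    for step in range(n_coeffs - 1):
        j = 2 * step
        factor = Fraction(recurrence_numerator(j, m, lam), (j + 1) * (j + 2))
        coeffs.append(coeffs[-1] * factor)
    return coeffs


if __name__ == "__main__":
    # l = 4, m = 2, lambda = l (l + 1) = 20: the series stops after the x**2 term.
    print(even_series_coefficients(2, 20))
    # lambda = 19 is not of the form l (l + 1): no coefficient vanishes.
    print(even_series_coefficients(2, 19))
```

For $l = 4$, $m = 2$ the surviving polynomial is proportional to $1 - 7x^2$, which is the polynomial factor that appears in $P_4^2$.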

Summarizing our results to this point, we have found that the regular solutions to the
associated Legendre equation depend on integer indices $l$ and $m$. Letting $P_l^m$, called an
associated Legendre function, denote such a solution (note that the superscript $m$ is not
an exponent), we define

$$P_l^m(x) = (1 - x^2)^{m/2}\,\mathcal{P}_l^m(x), \tag{15.74}$$

where $\mathcal{P}_l^m$ is a polynomial of degree $l - m$ (consistent with our earlier observation that
$l$ must be at least as large as $m$), and with an explicit form and scale that we will now
address.

A convenient explicit formula for $\mathcal{P}_l^m$ can be obtained by repeated differentiation of the
regular Legendre equation. Admittedly, this strategy would have been difficult to devise
without prior knowledge of the solution, but there are certain advantages to using the
experience of those who have gone before. So, without apology, we apply Leibniz's formula
for the $m$th derivative of a product (proved in Exercise 1.4.2),

$$\frac{d^m}{dx^m}\left[A(x)B(x)\right] = \sum_{s=0}^{m}\binom{m}{s}\frac{d^{m-s}A(x)}{dx^{m-s}}\,\frac{d^sB(x)}{dx^s}, \tag{15.75}$$

to the Legendre equation,

$$(1 - x^2)P_l'' - 2xP_l' + l(l + 1)P_l = 0,$$

reaching

$$(1 - x^2)u'' - 2x(m + 1)u' + \left[l(l + 1) - m(m + 1)\right]u = 0, \tag{15.76}$$

where

$$u \equiv \frac{d^m}{dx^m}P_l(x). \tag{15.77}$$

² The solution to the associated Legendre equation is $(1 - x^2)^{m/2}\,\mathcal{P}(x)$, suggesting the possibility that the $(1 - x^2)^{m/2}$ factor
might compensate the divergence in $\mathcal{P}(x)$, yielding a convergent limit. It can be shown that such a compensation does not occur.
Comparing Eq. (15.76) with Eq. (15.72), we see that when $\lambda = l(l + 1)$ they are identical,
meaning that the polynomial solutions $\mathcal{P}$ of Eq. (15.72) for given $l$ can be identified with
the corresponding $u$. Specifically,

$$\mathcal{P}_l^m = (-1)^m\frac{d^m}{dx^m}P_l(x), \tag{15.78}$$

where the factor $(-1)^m$ is inserted to maintain agreement with AMS-55 (see Additional
Readings), which has become the most widely accepted notational standard.³

We can now write a complete, explicit form for the associated Legendre functions:

$$P_l^m(x) = (-1)^m(1 - x^2)^{m/2}\frac{d^m}{dx^m}P_l(x). \tag{15.79}$$

Since the $P_l^m$ with $m = 0$ are just the original Legendre functions, it is customary to omit
the upper index when it is zero, so, for example, $P_l^0 = P_l$.
Note that the condition on $l$ and $m$ can be stated in two ways:

(1) For each $m$, there are an infinite number of acceptable solutions to the associated
Legendre ODE with $l$ values ranging from $m$ to infinity, or

(2) For each $l$, there are acceptable solutions with $m$ values ranging from $m = 0$ to $m = l$.
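Equation (15.79) can be turned directly into a small calculation: build the coefficients of $P_l$ exactly, differentiate $m$ times, and attach the factor $(-1)^m(1 - x^2)^{m/2}$. The following Python sketch does this for $0 \le m \le l$; the function names are hypothetical, and the coefficient-list representation (lowest power first) is a choice made for this illustration.

```python
import math
from fractions import Fraction


def legendre_coefficients(l):
    """Coefficients [c_0, c_1, ...] of P_l(x), lowest power first, as exact fractions."""
    p_prev, p_curr = [Fraction(1)], [Fraction(0), Fraction(1)]
    if l == 0:
        return p_prev
    for k in range(1, l):
        shifted = [Fraction(0)] + p_curr                      # x * P_k
        padded = p_prev + [Fraction(0)] * (len(shifted) - len(p_prev))
        p_next = [((2 * k + 1) * s - k * p) / (k + 1) for s, p in zip(shifted, padded)]
        p_prev, p_curr = p_curr, p_next
    return p_curr


def differentiate(coeffs):
    """Derivative of a polynomial given by its coefficient list."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def associated_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l and |x| <= 1 from Eq. (15.79), including (-1)**m."""
    coeffs = legendre_coefficients(l)
    for _ in range(m):
        coeffs = differentiate(coeffs)
    poly_value = sum(float(c) * x**n for n, c in enumerate(coeffs))
    return (-1) ** m * (1.0 - x * x) ** (m / 2) * poly_value


if __name__ == "__main__":
    print(associated_legendre(3, 1, 0.5))   # compare with -(3/2)(5x^2 - 1)(1 - x^2)^(1/2)
    print(associated_legendre(4, 2, 0.5))   # compare with (15/2)(7x^2 - 1)(1 - x^2)
```

The phase factor $(-1)^m$ follows the convention adopted in the text; libraries that omit it will differ in sign for odd $m$.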


Because $m$ enters the associated Legendre equation only as $m^2$, we have up to this point
tacitly considered only values $m \ge 0$. However, if we insert the Rodrigues formula for $P_l$
into Eq. (15.79), we get the formula

$$P_l^m(x) = \frac{(-1)^m}{2^l\,l!}(1 - x^2)^{m/2}\frac{d^{l+m}}{dx^{l+m}}(x^2 - 1)^l, \tag{15.80}$$

which gives results for $-m$ that do not appear similar to those for $+m$. However, it can be
shown that if we apply Eq. (15.80) for $m$ values between zero and $-l$, we get

$$P_l^{-m}(x) = (-1)^m\frac{(l - m)!}{(l + m)!}P_l^m(x). \tag{15.81}$$

Equation (15.81) shows that $P_l^m$ and $P_l^{-m}$ are proportional; its proof is the topic of
Exercise 15.4.3. The main reason for discussing both is that recurrence formulas we will
develop for $P_l^m$ with contiguous values of $m$ will give results for $m < 0$ that can best be
understood if we remember the relative scaling of $P_l^m$ and $P_l^{-m}$.
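The proportionality in Eq. (15.81) can be spot-checked by evaluating the Rodrigues form, Eq. (15.80), for both signs of $m$. The sketch below is a numerical check at a single interior point, not a proof; the function names are hypothetical and $|x| < 1$ is assumed so that negative powers of $1 - x^2$ are finite.

```python
import math


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_diff(coeffs):
    """Derivative of a polynomial given by its coefficient list."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def rodrigues_plm(l, m, x):
    """P_l^m(x) from Eq. (15.80), for -l <= m <= l and |x| < 1."""
    poly = [1]
    for _ in range(l):
        poly = poly_mul(poly, [-1, 0, 1])          # builds (x**2 - 1)**l
    for _ in range(l + m):
        poly = poly_diff(poly)                      # (l + m)-fold derivative
    value = sum(c * x**n for n, c in enumerate(poly))
    prefactor = (-1) ** m / (2**l * math.factorial(l))
    return prefactor * (1.0 - x * x) ** (m / 2) * value


def negative_m_from_positive(l, m, x):
    """Right-hand side of Eq. (15.81) for 0 <= m <= l."""
    ratio = math.factorial(l - m) / math.factorial(l + m)
    return (-1) ** m * ratio * rodrigues_plm(l, m, x)


if __name__ == "__main__":
    print(rodrigues_plm(2, -1, 0.3), negative_m_from_positive(2, 1, 0.3))
```

For example, with $l = 2$ and $m = 1$ both sides reduce to $\tfrac{1}{2}x(1 - x^2)^{1/2}$.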


Associated Legendre Polynomials 


For further development of properties of the $P_l^m$, it is useful to develop a generating
function for the polynomials $\mathcal{P}_l^m(x)$, which we can do by differentiating the Legendre
generating function with respect to $x$. The result is

$$g_m(x, t) \equiv \frac{(-1)^m(2m - 1)!!}{(1 - 2xt + t^2)^{m+1/2}} = \sum_{s=0}^{\infty}\mathcal{P}_{s+m}^m(x)\,t^s. \tag{15.82}$$

³ However, we note that the popular text, Jackson's Electrodynamics (see Additional Readings), does not include this phase
factor. The factor is introduced to cause the definition of spherical harmonics (Section 15.5) to have the usual phase convention.

The factors $t$ that result from differentiating the generating function have been used to
change the powers of $t$ that multiply the $\mathcal{P}$ on the right-hand side.
If we now differentiate Eq. (15.82) with respect to $t$, we obtain initially

$$(1 - 2tx + t^2)\frac{\partial g_m}{\partial t} = (2m + 1)(x - t)\,g_m(x, t),$$

which we can use together with Eq. (15.82) in a now-familiar way to obtain the recurrence
formula,

$$(s + 1)\mathcal{P}_{s+m+1}^m - (2m + 1 + 2s)x\,\mathcal{P}_{s+m}^m + (s + 2m)\mathcal{P}_{s+m-1}^m = 0. \tag{15.83}$$

Making the substitution $l = s + m$, we bring Eq. (15.83) to the more useful form,

$$(l - m + 1)\mathcal{P}_{l+1}^m - (2l + 1)x\,\mathcal{P}_l^m + (l + m)\mathcal{P}_{l-1}^m = 0. \tag{15.84}$$

For $m = 0$ this relation agrees with Eq. (15.18).
From the form of $g_m(x, t)$, it is also clear that

$$(1 - 2xt + t^2)\,g_{m+1}(x, t) = -(2m + 1)\,g_m(x, t). \tag{15.85}$$

From Eqs. (15.85) and (15.82) we may extract the recursion formula

$$\mathcal{P}_{s+m+1}^{m+1}(x) - 2x\,\mathcal{P}_{s+m}^{m+1}(x) + \mathcal{P}_{s+m-1}^{m+1}(x) = -(2m + 1)\mathcal{P}_{s+m}^m(x),$$

which relates the associated Legendre polynomials with upper index $m + 1$ to those with
upper index $m$. Again we may simplify by making the substitution $l = s + m$:

$$\mathcal{P}_{l+1}^{m+1}(x) - 2x\,\mathcal{P}_l^{m+1}(x) + \mathcal{P}_{l-1}^{m+1}(x) = -(2m + 1)\mathcal{P}_l^m(x). \tag{15.86}$$


Associated Legendre Functions 


The recurrence relations for the associated Legendre polynomials or alternatively, differ- 
entiation of formulas for the original Legendre polynomials, enable the construction of 
recurrence formulas for the associated Legendre functions. The number of such formulas 
is extensive because these functions have two indices, and there exists a wide variety of 
formulas with different index combinations. Results of importance include the following: 


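$$P_l^{m+1}(x) + \frac{2mx}{(1 - x^2)^{1/2}}P_l^m(x) + (l + m)(l - m + 1)P_l^{m-1}(x) = 0, \tag{15.87}$$

$$(2l + 1)x\,P_l^m(x) = (l + m)P_{l-1}^m(x) + (l - m + 1)P_{l+1}^m(x), \tag{15.88}$$

$$(2l + 1)(1 - x^2)^{1/2}P_l^m(x) = P_{l-1}^{m+1}(x) - P_{l+1}^{m+1}(x) \tag{15.89}$$

$$\qquad = (l - m + 1)(l - m + 2)P_{l+1}^{m-1}(x) - (l + m)(l + m - 1)P_{l-1}^{m-1}(x), \tag{15.90}$$

$$(1 - x^2)^{1/2}\left(P_l^m(x)\right)' = \tfrac{1}{2}(l + m)(l - m + 1)P_l^{m-1}(x) - \tfrac{1}{2}P_l^{m+1}(x) \tag{15.91}$$

$$\qquad = (l + m)(l - m + 1)P_l^{m-1}(x) + \frac{mx}{(1 - x^2)^{1/2}}P_l^m(x). \tag{15.92}$$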
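Table 15.3 Associated Legendre Functions

$$\begin{aligned}
P_1^1(x) &= -(1 - x^2)^{1/2} = -\sin\theta \\
P_2^1(x) &= -3x(1 - x^2)^{1/2} = -3\cos\theta\sin\theta \\
P_2^2(x) &= 3(1 - x^2) = 3\sin^2\theta \\
P_3^1(x) &= -\tfrac{3}{2}(5x^2 - 1)(1 - x^2)^{1/2} = -\tfrac{3}{2}(5\cos^2\theta - 1)\sin\theta \\
P_3^2(x) &= 15x(1 - x^2) = 15\cos\theta\sin^2\theta \\
P_3^3(x) &= -15(1 - x^2)^{3/2} = -15\sin^3\theta \\
P_4^1(x) &= -\tfrac{5}{2}(7x^3 - 3x)(1 - x^2)^{1/2} = -\tfrac{5}{2}(7\cos^3\theta - 3\cos\theta)\sin\theta \\
P_4^2(x) &= \tfrac{15}{2}(7x^2 - 1)(1 - x^2) = \tfrac{15}{2}(7\cos^2\theta - 1)\sin^2\theta \\
P_4^3(x) &= -105x(1 - x^2)^{3/2} = -105\cos\theta\sin^3\theta \\
P_4^4(x) &= 105(1 - x^2)^2 = 105\sin^4\theta
\end{aligned}$$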





It is obvious that, using Eq. (15.90), all the $P_l^m$ with $m > 0$ can be generated from those
with $m = 0$ (the Legendre polynomials), and that these, in turn, can be built recursively
from $P_0(x) = 1$ and $P_1(x) = x$. In this fashion (or in other ways as suggested below), we
can build a table of associated Legendre functions, the first members of which are listed in
Table 15.3. The table shows the $P_l^m(x)$ both as functions of $x$ and as functions of $\theta$, where
$x = \cos\theta$.

It is often easier to use recurrence formulas other than Eq. (15.90) to obtain the $P_l^m$,
keeping in mind that when a formula contains $P_{m-1}^m$ for $m > 0$, that quantity can be set to
zero. It is also easy to obtain explicit formulas for certain values of $l$ and $m$ which can then
be alternate starting points for recursion. See the example that follows.


Example 15.4.1 RECURRENCE STARTING FROM $P_m^m$

The associated Legendre function $P_m^m(x)$ is easily evaluated:

$$\begin{aligned}
P_m^m(x) &= \frac{(-1)^m}{2^m m!}(1 - x^2)^{m/2}\frac{d^{2m}}{dx^{2m}}(x^2 - 1)^m = \frac{(-1)^m}{2^m m!}(2m)!\,(1 - x^2)^{m/2} \\
&= (-1)^m(2m - 1)!!\,(1 - x^2)^{m/2}.
\end{aligned} \tag{15.93}$$

We can now use Eq. (15.88) with $l = m$ to obtain $P_{m+1}^m$, dropping the term containing
$P_{m-1}^m$ because it is zero. We get

$$P_{m+1}^m(x) = (2m + 1)x\,P_m^m(x) = (-1)^m(2m + 1)!!\,x(1 - x^2)^{m/2}. \tag{15.94}$$

Further increases in $l$ can now be obtained by straightforward application of Eq. (15.88).
Illustrating for a series of $P_l^m$ with $m = 2$: $P_2^2(x) = (-1)^2(3!!)(1 - x^2) = 3(1 - x^2)$,
in agreement with the table value. $P_3^2$ can be computed from Eq. (15.94) as
$P_3^2(x) = (-1)^2(5!!)\,x(1 - x^2)$, which simplifies to the tabulated result. Finally, $P_4^2$ is
obtained from the following case of Eq. (15.88):

$$7x\,P_3^2(x) = 5P_2^2(x) + 2P_4^2(x),$$

the solution of which for $P_4^2(x)$ is again in agreement with the tabulated value. ■
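The procedure of this example translates into a short upward recursion in $l$ at fixed $m$. The Python sketch below starts from Eqs. (15.93) and (15.94) and then applies Eq. (15.88) solved for $P_{k+1}^m$; the function names are assumptions made for this illustration, and only $0 \le m \le l$ with $|x| \le 1$ is handled.

```python
import math


def double_factorial(n):
    """n!! with the convention n!! = 1 for n <= 0."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def plm_upward(l, m, x):
    """P_l^m(x) for 0 <= m <= l, |x| <= 1, via Eqs. (15.93), (15.94), and (15.88)."""
    # Eq. (15.93): closed form for P_m^m.
    p_mm = (-1) ** m * double_factorial(2 * m - 1) * (1.0 - x * x) ** (m / 2)
    if l == m:
        return p_mm
    # Eq. (15.94): P_{m+1}^m = (2m + 1) x P_m^m.
    p_prev, p_curr = p_mm, (2 * m + 1) * x * p_mm
    for k in range(m + 1, l):
        # Eq. (15.88): (k - m + 1) P_{k+1}^m = (2k + 1) x P_k^m - (k + m) P_{k-1}^m.
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - (k + m) * p_prev) / (k - m + 1)
    return p_curr


if __name__ == "__main__":
    x = 0.5
    print(plm_upward(2, 2, x), 3 * (1 - x * x))
    print(plm_upward(3, 2, x), 15 * x * (1 - x * x))
    print(plm_upward(4, 2, x), 7.5 * (7 * x * x - 1) * (1 - x * x))
```

Each printed pair compares the recursion with the corresponding closed form from Table 15.3.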


Parity and Special Values 


We have already established that $P_l$ has even parity if $l$ is even and odd parity if $l$ is odd.
Since we can form $P_l^m$ by differentiating $P_l$ $m$ times, with each differentiation changing
the parity, and thereafter multiplying by $(1 - x^2)^{m/2}$, which has even parity, $P_l^m$ must have
a parity that depends on $l + m$, namely,

$$P_l^m(-x) = (-1)^{l+m}P_l^m(x). \tag{15.95}$$

We occasionally encounter a need for the value of $P_l^m(x)$ at $x = \pm 1$ or at $x = 0$. At
$x = \pm 1$ the result is simple: The factor $(1 - x^2)^{m/2}$ causes $P_l^m(\pm 1)$ to vanish unless
$m = 0$, in which case we recover the values $P_l(1) = 1$, $P_l(-1) = (-1)^l$. At $x = 0$, the value
of $P_l^m$ depends on whether $l + m$ is even or odd. The result, proof of which is left to
Exercises 15.4.4 and 15.4.5, is

$$P_l^m(0) = \begin{cases} (-1)^{(l+m)/2}\,\dfrac{(l + m - 1)!!}{(l - m)!!}, & l + m \text{ even}, \\[1ex] 0, & l + m \text{ odd}. \end{cases} \tag{15.96}$$
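Equation (15.96) can be compared with the entries of Table 15.3 evaluated at $x = 0$. The sketch below uses exact fractions; the function names and the dictionary of table values (obtained by setting $x = 0$ in the tabulated expressions) are assumptions made for this illustration.

```python
from fractions import Fraction


def double_factorial(n):
    """n!! with the convention n!! = 1 for n <= 0."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def plm_at_zero(l, m):
    """P_l^m(0) from Eq. (15.96) for 0 <= m <= l."""
    if (l + m) % 2 == 1:
        return Fraction(0)
    sign = (-1) ** ((l + m) // 2)
    return sign * Fraction(double_factorial(l + m - 1), double_factorial(l - m))


# Values of the Table 15.3 expressions at x = 0, keyed by (l, m).
TABLE_AT_ZERO = {
    (1, 1): Fraction(-1), (2, 1): Fraction(0), (2, 2): Fraction(3),
    (3, 1): Fraction(3, 2), (3, 2): Fraction(0), (3, 3): Fraction(-15),
    (4, 1): Fraction(0), (4, 2): Fraction(-15, 2), (4, 3): Fraction(0),
    (4, 4): Fraction(105),
}

if __name__ == "__main__":
    for (l, m), value in sorted(TABLE_AT_ZERO.items()):
        print(l, m, plm_at_zero(l, m), value)
```

The zeros for odd $l + m$ are also what the parity relation, Eq. (15.95), requires at $x = 0$.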


Orthogonality 


For each $m$, the $P_l^m$ of different $l$ can be proved orthogonal by identifying them as
eigenfunctions of a Sturm-Liouville system. However, it is instructive to demonstrate the
orthogonality explicitly, and to do so by a method that also yields their normalization.
We start by writing the orthogonality integral, with the $P_l^m$ given by the Rodrigues
formula in Eq. (15.80). For compactness and clarity, we introduce the abbreviated notation
$R = x^2 - 1$, thereby getting

$$\int_{-1}^{1}P_p^m(x)P_q^m(x)\,dx = \frac{(-1)^m}{2^{p+q}\,p!\,q!}\int_{-1}^{1}R^m\left(\frac{d^{p+m}R^p}{dx^{p+m}}\right)\frac{d^{q+m}R^q}{dx^{q+m}}\,dx. \tag{15.97}$$

We consider first the case $p < q$, for which we plan to prove the integral in Eq. (15.97)
vanishes. We proceed by carrying out repeated integrations by parts, in which we differentiate

$$u = R^m\frac{d^{p+m}R^p}{dx^{p+m}} \tag{15.98}$$

$p + m + 1$ times while integrating a like number of times the remainder of the integrand,

$$dv = \frac{d^{q+m}R^q}{dx^{q+m}}\,dx. \tag{15.99}$$
For each of these $p + m + 1 \le q + m$ partial integrations the integrated ($uv$) terms will
vanish because there will be at least one factor $R$ that is not differentiated and will therefore
vanish at $x = \pm 1$. After the repeated differentiation, we will have

$$\frac{d^{p+m+1}u}{dx^{p+m+1}} = \frac{d^{p+m+1}}{dx^{p+m+1}}\left[R^m\frac{d^{p+m}R^p}{dx^{p+m}}\right], \tag{15.100}$$

in which a quantity whose largest power of $x$ is $x^{2p+2m}$ contains also a $(2p + 2m + 1)$-fold
differentiation. There is no way these components can yield a nonzero result. Since both
the integrated terms and the transformed integral vanish, we get an overall vanishing result,
confirming the orthogonality. Note that the orthogonality is with unit weight, independent
of the value of $m$.

We now examine Eq. (15.97) for $p = q$, repeating the process we just carried out, but
this time performing $p + m$ partial integrations. Again all the integrated terms vanish,
but now there is a nonvanishing contribution from the repeated differentiation of $u$, see
Eq. (15.98). Since the overall power of $x$ is still $x^{2p+2m}$ and the total number of
differentiations is also $2p + 2m$, the only contributing terms are those in which the factor $R^m$
is differentiated $2m$ times and the factor $R^p$ is differentiated $2p$ times. Thus, applying
Leibniz's formula, Eq. (15.75), to the $(p + m)$-fold differentiation of $u$, but keeping only the
contributing term, we have

$$\begin{aligned}
\frac{d^{p+m}}{dx^{p+m}}\left(R^m\frac{d^{p+m}R^p}{dx^{p+m}}\right) &= \binom{p+m}{2m}\left(\frac{d^{2m}R^m}{dx^{2m}}\right)\left(\frac{d^{2p}R^p}{dx^{2p}}\right) \\
&= \frac{(p + m)!}{(2m)!\,(p - m)!}(2m)!\,(2p)! = \frac{(p + m)!}{(p - m)!}(2p)!.
\end{aligned} \tag{15.101}$$

Inserting this result into the integration by parts, remembering that the transformed
integration is accompanied by the sign factor $(-1)^{p+m}$, and recognizing that the repeated
integration of $dv$, Eq. (15.99) with $q = p$, just yields $R^p$, we have, returning to Eq. (15.97),

$$\int_{-1}^{1}\left[P_p^m(x)\right]^2dx = \frac{(-1)^{2m+p}}{2^{2p}\,p!\,p!}\,\frac{(2p)!\,(p + m)!}{(p - m)!}\int_{-1}^{1}R^p\,dx. \tag{15.102}$$


To complete the evaluation, we identify the integral of $R^p$ as a beta function, with an
evaluation given in Exercise 13.3.3 as

$$\int_{-1}^{1}R^p\,dx = (-1)^p\frac{2\,(2p)!!}{(2p + 1)!!} = (-1)^p\frac{2^{2p+1}\,p!\,p!}{(2p + 1)!}. \tag{15.103}$$

Inserting this result, and combining with the previously established orthogonality relation,
we have

$$\int_{-1}^{1}P_p^m(x)P_q^m(x)\,dx = \frac{2}{2p + 1}\,\frac{(p + m)!}{(p - m)!}\,\delta_{pq}. \tag{15.104}$$
Making the substitution $x = \cos\theta$, we obtain this formula in spherical polar coordinates:

$$\int_0^{\pi}P_p^m(\cos\theta)P_q^m(\cos\theta)\sin\theta\,d\theta = \frac{2}{2p + 1}\,\frac{(p + m)!}{(p - m)!}\,\delta_{pq}. \tag{15.105}$$

Another way to look at the orthogonality of the associated Legendre functions is to
rewrite Eq. (15.104) in terms of the associated Legendre polynomials $\mathcal{P}_l^m$. Invoking
Eq. (15.74), Eq. (15.104) becomes

$$\int_{-1}^{1}\mathcal{P}_p^m(x)\mathcal{P}_q^m(x)(1 - x^2)^m\,dx = \frac{2}{2p + 1}\,\frac{(p + m)!}{(p - m)!}\,\delta_{pq}, \tag{15.106}$$

showing that these polynomials are, for each $m$, orthogonal with the weight factor
$(1 - x^2)^m$. From that viewpoint, we can observe that each value of $m$ corresponds to a set of
polynomials that are orthogonal with a different weight. However, since our main interest
is in the functions that are in general not polynomials but are solutions of the associated
Legendre equation, it is usually more relevant to us to note that these functions, which
include the factor $(1 - x^2)^{m/2}$, are orthogonal with unit weight.
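Because the integrand in Eq. (15.106) is a polynomial, the orthogonality and normalization can be verified exactly with rational arithmetic. The sketch below builds $\mathcal{P}_l^m$ from Eq. (15.78) and integrates term by term; the function names are hypothetical and the check covers only the index values passed in.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def legendre_coefficients(l):
    """Coefficients of P_l(x), lowest power first, from the three-term recurrence."""
    p_prev, p_curr = [Fraction(1)], [Fraction(0), Fraction(1)]
    if l == 0:
        return p_prev
    for k in range(1, l):
        shifted = [Fraction(0)] + p_curr
        padded = p_prev + [Fraction(0)] * (len(shifted) - len(p_prev))
        p_next = [((2 * k + 1) * s - k * p) / (k + 1) for s, p in zip(shifted, padded)]
        p_prev, p_curr = p_curr, p_next
    return p_curr


def script_p(l, m):
    """Coefficients of the associated Legendre polynomial of Eq. (15.78), 0 <= m <= l."""
    coeffs = legendre_coefficients(l)
    for _ in range(m):
        coeffs = [n * c for n, c in enumerate(coeffs)][1:]
    return [(-1) ** m * c for c in coeffs]


def overlap(p, q, m):
    """Left-hand side of Eq. (15.106), evaluated exactly."""
    integrand = poly_mul(script_p(p, m), script_p(q, m))
    for _ in range(m):
        integrand = poly_mul(integrand, [Fraction(1), Fraction(0), Fraction(-1)])
    # Integral of x**n over [-1, 1] is 2/(n + 1) for even n and 0 for odd n.
    return sum(Fraction(2, n + 1) * c for n, c in enumerate(integrand) if n % 2 == 0)


def normalization(p, m):
    """Right-hand side of Eq. (15.106) for p = q."""
    return Fraction(2 * factorial(p + m), (2 * p + 1) * factorial(p - m))


if __name__ == "__main__":
    print(overlap(3, 3, 2), normalization(3, 2))
    print(overlap(2, 4, 2))
```

The same numbers follow from Eq. (15.104), since the weight $(1 - x^2)^m$ is just the product of the two factors $(1 - x^2)^{m/2}$.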

It is possible, but not particularly useful, to note that we can also have orthogonality of
the $P_l^m$ with respect to the upper index when the lower index is held constant:

$$\int_{-1}^{1}P_l^m(x)P_l^n(x)(1 - x^2)^{-1}\,dx = \frac{(l + m)!}{m\,(l - m)!}\,\delta_{mn}. \tag{15.107}$$

This equation is not very useful because in spherical polar coordinates the boundary
condition on the azimuthal coordinate $\varphi$ causes there already to be orthogonality with respect
to $m$, and we are not usually concerned with orthogonality of the $P_l^m$ with respect to $m$.


Example 15.4.2 CURRENT LOOP: MAGNETIC DIPOLE

An important problem in which we encounter associated Legendre functions is in the
magnetic field of a circular current loop, a situation that may at first seem surprising since this
problem has azimuthal symmetry.

Our starting point is the formula relating a current element $I\,d\mathbf{s}$ to the vector potential
$\mathbf{A}$ that it produces (this is discussed in the chapter on Green's functions, and also in texts
such as Jackson's Classical Electrodynamics; see Additional Readings). This formula is

$$d\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\,\frac{I\,d\mathbf{s}}{|\mathbf{r} - \mathbf{r}_s|}, \tag{15.108}$$

where $\mathbf{r}$ is the point at which $\mathbf{A}$ is to be evaluated and $\mathbf{r}_s$ is the position of element $d\mathbf{s}$ of the
current loop. We place our current loop, of radius $a$, in the equatorial plane of a spherical
polar coordinate system, as shown in Fig. 15.10. Our task is to determine $\mathbf{A}$ as a function
of position, and therefrom to obtain the components of the magnetic induction field $\mathbf{B}$.

FIGURE 15.10 Circular current loop.


It is in principle possible to figure out the geometry and integrate Eq. (15.108) for the 
present problem, but a more practical approach will be to determine from general consid- 
erations the functional form of an expansion describing the solution, and then to determine 
the coefficients in the expansion by requiring correct results for points of high symmetry, 
where the calculation is not too difficult. This is an approach similar to that employed in 
Example 15.2.3, where we first identified the functional form of an expansion giving the 
potential generated by a circular ring of charge, after which we found the coefficients in 
the expansion from the easily computed potential on the axis of the ring. 

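From the form of Eq. (15.108) and the symmetry of the problem, we see immediately
that for all $\mathbf{r}$, $\mathbf{A}$ must lie in a plane of constant $z$, and in fact it must be in the $\hat{\mathbf{e}}_\varphi$ direction,
with $A_\varphi$ independent of $\varphi$, i.e.,

$$\mathbf{A} = A_\varphi(r, \theta)\,\hat{\mathbf{e}}_\varphi. \tag{15.109}$$

If $\mathbf{A}$ had a component other than $A_\varphi$, it would have a nonzero divergence, as then $\mathbf{A}$ would
have a nonzero inward or outward flux, resulting in a singularity on the axis of the loop.
Since everywhere except on the current loop itself there is no current, Maxwell's equation
for the curl of $\mathbf{B}$ reduces to

$$\nabla\times\mathbf{B} = \nabla\times(\nabla\times\mathbf{A}) = 0,$$

and, since $\mathbf{A}$ has only a $\varphi$ component, it further reduces to

$$\nabla\times\left[\nabla\times A_\varphi(r, \theta)\,\hat{\mathbf{e}}_\varphi\right] = 0. \tag{15.110}$$

The left-hand side of Eq. (15.110) was the subject of Example 3.10.4, and its evaluation
was presented as Eq. (3.165). Setting that result to zero gives the equation that must be
satisfied by $A_\varphi(r, \theta)$:

$$\frac{\partial^2A_\varphi}{\partial r^2} + \frac{2}{r}\frac{\partial A_\varphi}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial A_\varphi}{\partial\theta}\right) - \frac{1}{r^2\sin^2\theta}A_\varphi = 0. \tag{15.111}$$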
Equation (15.111) may now be solved by the method of separation of variables; setting
$A_\varphi(r, \theta) = R(r)\Theta(\theta)$, we have

$$r^2\frac{d^2R}{dr^2} + 2r\frac{dR}{dr} - l(l + 1)R = 0, \tag{15.112}$$

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + l(l + 1)\Theta - \frac{\Theta}{\sin^2\theta} = 0. \tag{15.113}$$

Because the second of these equations can be recognized as the associated Legendre
equation, in the form given as Eq. (15.2), we have set the separation constant to the value it must
have, namely $l(l + 1)$, with $l$ integral. The first equation is also familiar, with solutions for
a given $l$ being $r^l$ and $r^{-l-1}$. The second equation has solutions $P_l^1(\cos\theta)$, i.e., its specific
form dictates that the associated Legendre functions which solve it must have upper index
$m = 1$. Since our main interest is in the pattern of $\mathbf{B}$ at $r$ values larger than $a$, the radius of
the current loop, we retain only the radial solution $r^{-l-1}$, and write

$$A_\varphi(r, \theta) = \sum_{l=1}^{\infty}c_l\left(\frac{a}{r}\right)^{l+1}P_l^1(\cos\theta). \tag{15.114}$$

When we obtain a more detailed solution, we will find that it converges only for $r > a$,
so Eq. (15.114) and the value of $\mathbf{B}$ derived therefrom will only be valid outside a sphere
containing the current loop. If we were also interested in solving this problem for $r < a$,
we would need to construct a series solution using only the powers $r^l$.

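From Eq. (15.114) we can compute the components of $\mathbf{B}$. Clearly, $B_\varphi = 0$. And, using
Eq. (3.159), we have

$$B_r(r, \theta) = \left[\nabla\times A_\varphi\hat{\mathbf{e}}_\varphi\right]_r = \frac{\cot\theta}{r}A_\varphi + \frac{1}{r}\frac{\partial A_\varphi}{\partial\theta}, \tag{15.115}$$

$$B_\theta(r, \theta) = \left[\nabla\times A_\varphi\hat{\mathbf{e}}_\varphi\right]_\theta = -\frac{1}{r}\frac{\partial(rA_\varphi)}{\partial r}. \tag{15.116}$$

To evaluate the $\theta$ derivative in Eq. (15.115), we need

$$\frac{dP_l^1(\cos\theta)}{d\theta} = -\sin\theta\,\frac{dP_l^1(\cos\theta)}{d\cos\theta} = -l(l + 1)P_l(\cos\theta) - \cot\theta\,P_l^1(\cos\theta), \tag{15.117}$$

a special case of Eq. (15.92) with $m = 1$ and $x = \cos\theta$. It is now straightforward to insert
the expansion for $A_\varphi$ into Eqs. (15.115) and (15.116); because of Eq. (15.117) the $\cot\theta$
term of Eq. (15.115) cancels, and we reach

$$B_r(r, \theta) = -\frac{1}{r}\sum_{l=1}^{\infty}l(l + 1)\,c_l\left(\frac{a}{r}\right)^{l+1}P_l(\cos\theta), \tag{15.118}$$

$$B_\theta(r, \theta) = \frac{1}{r}\sum_{l=1}^{\infty}l\,c_l\left(\frac{a}{r}\right)^{l+1}P_l^1(\cos\theta). \tag{15.119}$$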


To complete our analysis, we must determine the values of the $c_l$, which we do by using
the Biot-Savart law to calculate $B_z$ at points along the polar axis, where $B_r$ is synonymous
with $B_z$. Since $\theta = 0$ on the positive polar axis and $P_l(\cos\theta) = 1$, Eq. (15.118) reduces to

$$B_r(z, 0) = -\frac{1}{z}\sum_{l=1}^{\infty}l(l + 1)\,c_l\left(\frac{a}{z}\right)^{l+1} = -\frac{a^2}{z^3}\sum_{s=0}^{\infty}(s + 1)(s + 2)\,c_{s+1}\left(\frac{a}{z}\right)^s. \tag{15.120}$$

The symmetry of the problem permits one more simplification; the value of $B_z$ must be the
same at $-z$ as at $z$, from which we conclude that the coefficients $c_2, c_4, \ldots$ must all vanish,
and we can rewrite Eq. (15.120) as

$$B_r(z, 0) = -\frac{a^2}{z^3}\sum_{s=0}^{\infty}2(s + 1)(2s + 1)\,c_{2s+1}\left(\frac{a}{z}\right)^{2s}. \tag{15.121}$$


The Biot-Savart law (in SI units) gives the contribution from the current element $I\,d\mathbf{s}$ to
$\mathbf{B}$ at a point whose displacement from the current element is $\mathbf{r}_s$ as

$$d\mathbf{B} = \frac{\mu_0}{4\pi}\,I\,\frac{d\mathbf{s}\times\hat{\mathbf{r}}_s}{r_s^2}. \tag{15.122}$$

We now compute $\mathbf{B}$ by integration of $d\mathbf{s}$ around the current loop. The geometry is shown
in Fig. 15.11. Note that $dB_z$, which will be the same for all current elements $I\,ds$, has the
value

$$dB_z = \frac{\mu_0 I}{4\pi r_s^2}\sin\chi\,ds,$$

where $\chi$ is the labeled angle in Fig. 15.11 and $r_s$ has the value indicated in the figure. The
integration over $s$ simply yields a factor $2\pi a$, and we see that $\sin\chi = a/(a^2 + z^2)^{1/2}$, so

$$\begin{aligned}
B_z &= \frac{\mu_0 I a^2}{2}(a^2 + z^2)^{-3/2} = \frac{\mu_0 I a^2}{2z^3}\left(1 + \frac{a^2}{z^2}\right)^{-3/2} \\
&= \frac{\mu_0 I a^2}{2z^3}\sum_{s=0}^{\infty}(-1)^s\frac{(2s + 1)!!}{(2s)!!}\left(\frac{a}{z}\right)^{2s}.
\end{aligned} \tag{15.123}$$

FIGURE 15.11 Biot-Savart law applied to a circular loop.




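The binomial expansion in the second line of Eq. (15.123) is convergent for $z > a$.
We are now ready to reconcile Eqs. (15.121) and (15.123), finding that

$$-2(s + 1)(2s + 1)\,c_{2s+1} = \frac{\mu_0 I}{2}(-1)^s\frac{(2s + 1)!!}{(2s)!!},$$

which reduces to

$$c_{2s+1} = \frac{\mu_0 I}{2}(-1)^{s+1}\frac{(2s - 1)!!}{(2s + 2)!!}. \tag{15.124}$$

We write final formulas for $\mathbf{A}$ and $\mathbf{B}$ in a form that recognizes that $c_{2s} = 0$, applicable
for $r > a$:

$$A_\varphi(r, \theta) = \frac{a^2}{r^2}\sum_{s=0}^{\infty}c_{2s+1}\left(\frac{a}{r}\right)^{2s}P_{2s+1}^1(\cos\theta), \tag{15.125}$$

$$B_r(r, \theta) = -\frac{a^2}{r^3}\sum_{s=0}^{\infty}(2s + 1)(2s + 2)\,c_{2s+1}\left(\frac{a}{r}\right)^{2s}P_{2s+1}(\cos\theta), \tag{15.126}$$

$$B_\theta(r, \theta) = \frac{a^2}{r^3}\sum_{s=0}^{\infty}(2s + 1)\,c_{2s+1}\left(\frac{a}{r}\right)^{2s}P_{2s+1}^1(\cos\theta). \tag{15.127}$$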

These formulas can also be written in terms of complete elliptic integrals. See Smythe 
(Additional Readings) and Section 18.8 of this book. 
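As a consistency check on the coefficients, the series for $B_r$ can be evaluated on the positive polar axis (where $P_{2s+1}(1) = 1$) and compared with the closed form $\mu_0 I a^2/[2(a^2 + z^2)^{3/2}]$ from the Biot-Savart integration. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the parameter `mu0_current` stands for the product $\mu_0 I$ (set to 1 by default), and the coefficient formula used is $c_{2s+1} = \tfrac{1}{2}\mu_0 I(-1)^{s+1}(2s - 1)!!/(2s + 2)!!$ with $(-1)!! = 1$.

```python
def double_factorial(n):
    """n!! with the convention n!! = 1 for n <= 0."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def loop_coefficient(s, mu0_current=1.0):
    """c_{2s+1} for the current loop; mu0_current is the product mu_0 * I."""
    sign = (-1) ** (s + 1)
    return 0.5 * mu0_current * sign * double_factorial(2 * s - 1) / double_factorial(2 * s + 2)


def b_axis_series(z, a, mu0_current=1.0, n_terms=40):
    """B_r on the positive polar axis from the multipole series; requires z > a."""
    total = sum(
        (2 * s + 1) * (2 * s + 2) * loop_coefficient(s, mu0_current) * (a / z) ** (2 * s)
        for s in range(n_terms)
    )
    return -(a * a / z**3) * total


def b_axis_exact(z, a, mu0_current=1.0):
    """Closed-form on-axis field of a circular loop of radius a."""
    return 0.5 * mu0_current * a * a / (a * a + z * z) ** 1.5


if __name__ == "__main__":
    print(b_axis_series(3.0, 1.0), b_axis_exact(3.0, 1.0))
```

The first two coefficients come out as $c_1 = -\mu_0 I/4$ and $c_3 = \mu_0 I/16$, which reproduce the leading terms of the dipole-like field far from the loop.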

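A comparison of magnetic current loop and finite electric dipole fields may be of interest.
For the magnetic loop dipole, the preceding analysis gives

$$B_r(r,\theta) = \frac{\mu_0 I a^2}{2r^3}\left[P_1 - \frac{3}{2}\left(\frac{a}{r}\right)^2 P_3 + \cdots\right], \tag{15.128}$$

$$B_\theta(r,\theta) = -\frac{\mu_0 I a^2}{4r^3}\left[P^1_1 - \frac{3}{4}\left(\frac{a}{r}\right)^2 P^1_3 + \cdots\right]. \tag{15.129}$$

From the finite electric dipole potential, Eq. (15.65), one can find

$$E_r(r,\theta) = \frac{qa}{\pi\varepsilon_0 r^3}\left[P_1 + 2\left(\frac{a}{r}\right)^2 P_3 + \cdots\right], \tag{15.130}$$

$$E_\theta(r,\theta) = -\frac{qa}{2\pi\varepsilon_0 r^3}\left[P^1_1 + \left(\frac{a}{r}\right)^2 P^1_3 + \cdots\right]. \tag{15.131}$$

The leading terms of both fields agree, and this is the basis for identifying both as dipole
fields.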

As with electric multipoles, it is sometimes convenient to discuss point magnetic multipoles.
A point dipole can be formed by taking the limit $a \to 0$, $I \to \infty$, with $Ia^2$ held
constant. The magnetic moment $\mathbf{m}$ is taken to be $I\pi a^2\,\hat{\mathbf{n}}$, where $\hat{\mathbf{n}}$ is a unit vector perpendicular
to the plane of the current loop and in the sense given by the right-hand rule. ■








Exercises 

15.4.1 Apply the Frobenius method to Eq. (15.72) to obtain Eq. (15.73) and verify that the
numerator of that equation becomes zero if $\lambda = l(l+1)$ and $j = l - m$.

15.4.2 Starting from the entries for $P_2^2$ and $P_2^1$ in Table 15.3, apply a recurrence formula to
obtain $P_2^0$ (which is $P_2$), $P_2^{-1}$, and $P_2^{-2}$. Compare your results with the value of $P_2$ from
Table 15.1 and with values of $P_2^{-1}$ and $P_2^{-2}$ obtained by applying Eq. (15.81) to entries
from Table 15.3.
15.4.3 Prove that

$$P_l^{-m}(x) = (-1)^m\,\frac{(l-m)!}{(l+m)!}\,P_l^m(x),$$

where $P_l^m(x)$ is defined by

$$P_l^m(x) = \frac{(-1)^m}{2^l\, l!}\,(1-x^2)^{m/2}\,\frac{d^{\,l+m}}{dx^{\,l+m}}\,(x^2-1)^l.$$

Hint. One approach is to apply Leibniz's formula to $(x+1)^l (x-1)^l$.

15.4.4 Show that

$$P^1_{2l}(0) = 0, \qquad P^1_{2l+1}(0) = (-1)^{l+1}\,\frac{(2l+1)!!}{(2l)!!},$$

by each of these three methods:

(a) Use of recurrence relations,

(b) Expansion of the generating function,

(c) Rodrigues formula.

15.4.5 Evaluate $P_l^m(0)$ for $m > 0$.

$$\text{ANS.}\quad P_l^m(0) = \begin{cases} (-1)^{(l+m)/2}\,\dfrac{(l+m-1)!!}{(l-m)!!}, & l+m \text{ even},\\[2mm] 0, & l+m \text{ odd}. \end{cases}$$

15.4.6 Starting from the potential of a finite dipole, Eq. (15.65), verify the formulas for the
electric field components given as Eqs. (15.130) and (15.131).

15.4.7 Show that

$$P_l^l(\cos\theta) = (-1)^l\,(2l-1)!!\,\sin^l\theta, \qquad l = 0, 1, 2, \ldots.$$

15.4.8 Derive the associated Legendre recurrence relation,

$$P_l^{m+1}(x) + \frac{2mx}{(1-x^2)^{1/2}}\,P_l^m(x) + \left[l(l+1) - m(m-1)\right]P_l^{m-1}(x) = 0.$$


15.4.9 Develop a recurrence relation that will yield $P_l^1(x)$ as

$$P_l^1(x) = f_1(x,l)\,P_l(x) + f_2(x,l)\,P_{l-1}(x).$$

Follow either of the procedures (a) or (b):

(a) Derive a recurrence relation of the preceding form. Give $f_1(x,l)$ and $f_2(x,l)$
explicitly.

(b) Find the appropriate recurrence relation in print.

(1) Give the source.
(2) Verify the recurrence relation.

$$\text{ANS. (a)}\quad P_l^1(x) = \frac{lx}{(1-x^2)^{1/2}}\,P_l - \frac{l}{(1-x^2)^{1/2}}\,P_{l-1}.$$


15.4.10 Show that sin@ P, (cos @) = P} (cos @). 





d 
dcos@ 
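15.4.11 Show that

$$\text{(a)}\quad \int_0^\pi \left(\frac{dP_l^m}{d\theta}\,\frac{dP_{l'}^m}{d\theta} + \frac{m^2\,P_l^m P_{l'}^m}{\sin^2\theta}\right)\sin\theta\, d\theta = \frac{2l(l+1)}{2l+1}\,\frac{(l+m)!}{(l-m)!}\,\delta_{ll'},$$

$$\text{(b)}\quad \int_0^\pi \left(\frac{P_l^1}{\sin\theta}\,\frac{dP_{l'}^1}{d\theta} + \frac{P_{l'}^1}{\sin\theta}\,\frac{dP_l^1}{d\theta}\right)\sin\theta\, d\theta = 0.$$

These integrals occur in the theory of scattering of electromagnetic waves by spheres.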











15.4.12 As a repeat of Exercise 15.2.9, show, using associated Legendre functions, that

$$\int_{-1}^{1} x(1-x^2)\,P_n'(x)\,P_m'(x)\,dx = \frac{n+1}{2n+1}\,\frac{2}{2n-1}\,\frac{n!}{(n-2)!}\,\delta_{m,n-1} + \frac{n}{2n+1}\,\frac{2}{2n+3}\,\frac{(n+2)!}{n!}\,\delta_{m,n+1}.$$


15.4.13 Evaluate $\displaystyle\int_0^\pi \sin^2\theta\, P_n^1(\cos\theta)\, d\theta$.

15.4.14 The associated Legendre function $P_l^m(x)$ satisfies the self-adjoint ODE

$$(1-x^2)\,\frac{d^2 P_l^m(x)}{dx^2} - 2x\,\frac{dP_l^m(x)}{dx} + \left[l(l+1) - \frac{m^2}{1-x^2}\right]P_l^m(x) = 0.$$

From the differential equations for $P_l^m(x)$ and $P_l^k(x)$ show that for $k \neq m$,

$$\int_{-1}^{1} P_l^m(x)\,P_l^k(x)\,\frac{dx}{1-x^2} = 0.$$


15.4.15 Determine the vector potential and the magnetic induction field of a magnetic
quadrupole by differentiating the magnetic dipole potential.

$$\text{ANS.}\quad \mathbf{A}_{MQ} = -\frac{\mu_0}{2}\,(Ia^2)(dz)\,\frac{P_2^1(\cos\theta)}{r^3}\,\hat{\mathbf{e}}_\varphi + \text{higher-order terms},$$

$$\mathbf{B}_{MQ} = \mu_0\,(Ia^2)(dz)\left[\frac{3P_2(\cos\theta)}{r^4}\,\hat{\mathbf{e}}_r - \frac{P_2^1(\cos\theta)}{r^4}\,\hat{\mathbf{e}}_\theta\right] + \cdots.$$

This corresponds to placing a current loop of radius $a$ at $z \to dz$ and an oppositely
directed current loop at $z \to -dz$. The vector potential and magnetic induction field of
a point dipole are given by the leading terms in these expansions if we take the limit
$dz \to 0$, $a \to 0$, and $I \to \infty$ subject to $Ia^2\,dz = \text{constant}$.


15.4.16 A single circular wire loop of radius $a$ carries a constant current $I$.

(a) Find the magnetic induction $\mathbf{B}$ for $r < a$, $\theta = \pi/2$.

(b) Calculate the integral of the magnetic flux ($\mathbf{B}\cdot d\boldsymbol{\sigma}$) over the area of the current
loop, that is,

$$\int_0^a r\,dr \int_0^{2\pi} d\varphi\; B_z\!\left(r, \theta = \frac{\pi}{2}\right).$$

ANS. $\infty$.

The Earth is within such a ring current, in which $I$ approximates millions of amperes
arising from the drift of charged particles in the Van Allen belt.


15.4.17 The vector potential $\mathbf{A}$ of a magnetic dipole, dipole moment $\mathbf{m}$, is given by
$\mathbf{A}(\mathbf{r}) = (\mu_0/4\pi)(\mathbf{m}\times\mathbf{r}/r^3)$. Show by direct computation that the magnetic induction
$\mathbf{B} = \nabla\times\mathbf{A}$ is given by

$$\mathbf{B} = \frac{\mu_0}{4\pi}\,\frac{3\hat{\mathbf{r}}\,(\hat{\mathbf{r}}\cdot\mathbf{m}) - \mathbf{m}}{r^3}.$$

15.4.18 (a) Show that in the point dipole limit the magnetic induction field of the current loop
becomes

$$B_r(r,\theta) = \frac{\mu_0}{2\pi}\,\frac{m}{r^3}\,P_1(\cos\theta), \qquad B_\theta(r,\theta) = -\frac{\mu_0}{4\pi}\,\frac{m}{r^3}\,P_1^1(\cos\theta),$$

with $m = I\pi a^2$.

(b) Compare these results with the magnetic induction of the point magnetic dipole of
Exercise 15.4.17. Take $\mathbf{m} = \hat{\mathbf{z}}\,m$.


15.4.19 A uniformly charged spherical shell is rotating with constant angular velocity. 
(a) Calculate the magnetic induction B along the axis of rotation outside the sphere. 


(b) Using the vector potential series of Example 15.4.2, find A and then B for all 
points outside the sphere. 


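15.4.20 In the liquid-drop model of the nucleus, a spherical nucleus is subjected to small
deformations. Consider a sphere of radius $r_0$ that is deformed so that its new surface is
given by

$$r = r_0\left[1 + \alpha_2 P_2(\cos\theta)\right].$$

Find the area of the deformed sphere through terms of order $\alpha_2^2$.

Hint.

$$dA = \left[r^2 + \left(\frac{dr}{d\theta}\right)^2\right]^{1/2} r\sin\theta\, d\theta\, d\varphi.$$

ANS. $A = 4\pi r_0^2\left[1 + \tfrac{4}{5}\alpha_2^2 + O(\alpha_2^3)\right]$.

Note. The area element $dA$ follows from noting that the line element $ds$ for fixed $\varphi$ is
given by

$$ds = \left(r^2\,d\theta^2 + dr^2\right)^{1/2} = \left[r^2 + \left(\frac{dr}{d\theta}\right)^2\right]^{1/2} d\theta.$$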


15.5 SPHERICAL HARMONICS 


Our earlier discussion of separated-variable methods for solving the Laplace, Helmholtz,
or Schrödinger equations in spherical polar coordinates showed that the possible angular
solutions $\Theta(\theta)\Phi(\varphi)$ are always the same in spherically symmetric problems; in particular
we found that the solutions for $\Phi$ depended on the single integer index $m$, and can be
written in the form

$$\Phi_m(\varphi) = \frac{1}{\sqrt{2\pi}}\,e^{im\varphi}, \qquad m = \ldots, -2, -1, 0, 1, 2, \ldots, \tag{15.132}$$

or, equivalently,

$$\Phi_m(\varphi) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}, & m = 0,\\[2mm] \dfrac{1}{\sqrt{\pi}}\cos m\varphi, & m > 0,\\[2mm] \dfrac{1}{\sqrt{\pi}}\sin |m|\varphi, & m < 0. \end{cases} \tag{15.133}$$


The above equations contain the constant factors needed to make $\Phi_m$ normalized, and
those of different $m^2$ are automatically orthogonal because they are eigenfunctions of
a Sturm-Liouville problem. It is straightforward to verify that in either Eq. (15.132) or
Eq. (15.133) our choices of the functions for $+m$ and $-m$ make $\Phi_m$ and $\Phi_{-m}$ orthogonal.
Formally, our definitions are such that

$$\int_0^{2\pi} \left[\Phi_m(\varphi)\right]^* \Phi_{m'}(\varphi)\, d\varphi = \delta_{mm'}. \tag{15.134}$$
In Section 15.4 we found that the solutions $\Theta(\theta)$ could be identified as associated Legendre
functions that can be labeled by the two integer indices $l$ and $m$, with $-l \le m \le l$.
From the orthonormality integral for these functions, Eq. (15.105), we can define the
normalized solutions

$$\Theta_{lm}(\cos\theta) = \sqrt{\frac{2l+1}{2}\,\frac{(l-m)!}{(l+m)!}}\;P_l^m(\cos\theta), \tag{15.135}$$

satisfying the relation

$$\int_0^\pi \left[\Theta_{lm}(\cos\theta)\right]^* \Theta_{l'm}(\cos\theta)\,\sin\theta\, d\theta = \delta_{ll'}. \tag{15.136}$$

We have previously noted that an orthonormality condition of this type only applies if both
functions $\Theta$ have the same value of the index $m$. The complex conjugate is not really
necessary in Eq. (15.136) because the $\Theta$ are real, but we write it anyway to maintain consistent
notation. Note also that when the argument of $P_l^m$ is $x = \cos\theta$, then $(1-x^2)^{1/2} = \sin\theta$,
so the $P_l^m$ are polynomials of overall degree $l$ in $\cos\theta$ and $\sin\theta$.

The product $\Theta_{lm}\Phi_m$ is called a spherical harmonic, with that name usually implying
that $\Phi_m$ is taken with the definition as a complex exponential; see Eq. (15.132). Therefore
we define

$$Y_l^m(\theta,\varphi) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}}\;P_l^m(\cos\theta)\,e^{im\varphi}. \tag{15.137}$$

These functions, being normalized solutions of a Sturm-Liouville problem, are orthonormal
over the spherical surface, with

$$\int_0^{2\pi} d\varphi \int_0^\pi \sin\theta\, d\theta\, \left[Y_{l_1}^{m_1}(\theta,\varphi)\right]^* Y_{l_2}^{m_2}(\theta,\varphi) = \delta_{l_1 l_2}\,\delta_{m_1 m_2}. \tag{15.138}$$

The definition we introduced for the associated Legendre functions leads to specific signs
for the $Y_l^m$ that are sometimes identified as the Condon-Shortley phase, after the authors
of a classic text on atomic spectroscopy. This sign convention has been found to simplify
various calculations, particularly in the quantum theory of angular momentum. One of
the effects of this phase factor is to introduce an alternation of sign with $m$ among the
positive-$m$ spherical harmonics. The word "harmonic" enters the name of $Y_l^m$ because
solutions of Laplace's equation are sometimes called harmonic functions.

The squares of the real parts of the first few spherical harmonics are sketched in
Figure 15.12; their functional forms are given in Table 15.4.
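The definition in Eq. (15.137) and the orthonormality relation in Eq. (15.138) can be checked numerically. The sketch below is one possible implementation, not the book's: it builds $P_l^m$ by a standard upward recurrence in $l$ (with the Condon-Shortley phase), obtains negative $m$ from $Y_l^{-m} = (-1)^m [Y_l^m]^*$ (which follows from the relation in Exercise 15.4.3), and estimates the surface integral with a simple midpoint rule. All function names and grid sizes are illustrative assumptions.

```python
import cmath
import math

def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l, including the Condon-Shortley phase (-1)^m."""
    pmm = 1.0
    root = math.sqrt((1.0 - x) * (1.0 + x))
    for k in range(1, m + 1):
        pmm *= -(2*k - 1) * root           # builds (-1)^m (2m-1)!! (1-x^2)^(m/2)
    if l == m:
        return pmm
    prev, curr = pmm, x * (2*m + 1) * pmm   # P_m^m and P_{m+1}^m
    for ll in range(m + 2, l + 1):
        prev, curr = curr, (x * (2*ll - 1) * curr - (ll + m - 1) * prev) / (ll - m)
    return curr

def sph_harm(l, m, theta, phi):
    """Y_l^m(theta, phi) as defined in Eq. (15.137)."""
    if m < 0:
        return (-1) ** (-m) * sph_harm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2*l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)

def overlap(l1, m1, l2, m2, n_theta=400, n_phi=16):
    """Midpoint-rule estimate of the integral on the left of Eq. (15.138)."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0 + 0.0j
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = j * d_phi
            total += (sph_harm(l1, m1, theta, phi).conjugate()
                      * sph_harm(l2, m2, theta, phi) * math.sin(theta))
    return total * d_theta * d_phi

if __name__ == "__main__":
    print(overlap(2, 1, 2, 1))   # close to 1
    print(overlap(2, 1, 3, 1))   # close to 0
```

The uniform grid in $\varphi$ integrates the factors $e^{i(m_2-m_1)\varphi}$ exactly as long as $|m_2 - m_1|$ is smaller than the number of grid points, so the only quadrature error comes from the $\theta$ integral.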


Cartesian Representations 


For some purposes it is useful to express the spherical harmonics using Cartesian coordinates,
which can be done by writing $\exp(\pm i\varphi)$ as $\cos\varphi \pm i\sin\varphi$ and using the formulas
for $x$, $y$, $z$ in spherical polar coordinates (retaining, however, an overall dependence on $r$,
necessary because the angular quantities must be independent of scale). For example,

$$\cos\theta = \frac{z}{r}, \qquad \sin\theta\,\exp(\pm i\varphi) = \sin\theta\cos\varphi \pm i\sin\theta\sin\varphi = \frac{x}{r} \pm i\,\frac{y}{r}; \tag{15.139}$$

these quantities are all homogeneous (of degree zero) in the coordinates.

Continuing to higher values of $l$, we obtain fractions in which the numerators are homogeneous
products of $x$, $y$, $z$ of overall degree $l$, divided by a common factor $r^l$. Table 15.4
includes the Cartesian expression for each of its entries.
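The equivalence of the angular and Cartesian forms is easy to confirm numerically. The following sketch evaluates two entries of Table 15.4, $Y_2^1$ and $Y_3^3$, both ways at an arbitrary point; the function names and the sample point are illustrative choices only.

```python
import cmath
import math

def to_cartesian(r, theta, phi):
    """Spherical polar coordinates (r, theta, phi) to (x, y, z)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def y21_angular(theta, phi):
    return (-math.sqrt(5 / (24 * math.pi))
            * 3 * math.sin(theta) * math.cos(theta) * cmath.exp(1j * phi))

def y21_cartesian(x, y, z):
    r2 = x*x + y*y + z*z
    return -math.sqrt(5 / (24 * math.pi)) * 3 * z * (x + 1j*y) / r2

def y33_angular(theta, phi):
    return (-math.sqrt(7 / (2880 * math.pi))
            * 15 * math.sin(theta) ** 3 * cmath.exp(3j * phi))

def y33_cartesian(x, y, z):
    r3 = (x*x + y*y + z*z) ** 1.5
    numerator = x**3 - 3*x*y**2 + 1j * (3*x**2*y - y**3)   # equals (x + iy)^3
    return -math.sqrt(7 / (2880 * math.pi)) * 15 * numerator / r3

if __name__ == "__main__":
    theta, phi = 0.9, 1.7
    for r in (1.0, 2.5):   # the value must not depend on r
        x, y, z = to_cartesian(r, theta, phi)
        print(r, y21_cartesian(x, y, z), y21_angular(theta, phi))
```

Running the loop for two radii illustrates the remark that these expressions are homogeneous of degree zero in the coordinates.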


Overall Solutions 


As we have already seen in Section 9.4, the separation of a Laplace, Helmholtz, or even a
Schrödinger equation in spherical polar coordinates can be written in terms of equations of
the generic form

$$R'' + \frac{2}{r}\,R' + \left[f(r) - \frac{l(l+1)}{r^2}\right]R = 0, \tag{15.140}$$

$$\frac{1}{\sin\theta}\,\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial Y}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\,\frac{\partial^2 Y}{\partial\varphi^2} + l(l+1)\,Y = 0. \tag{15.141}$$

The function $f(r)$ in Eq. (15.140) is zero for the Laplace equation, $k^2$ for the Helmholtz
equation, and $E - V(r)$ ($V$ = potential energy, $E$ = total energy, an eigenvalue) for the
Schrödinger equation. We have combined the $\theta$ and $\varphi$ equations into Eq. (15.141) and
identified one of its solutions as $Y_l^m$. What is important to note right now is that the combined
angular equation (and its boundary conditions and therefore its solutions) will be
the same for all spherically symmetric problems, and that the angular solution affects the
radial equation only through the separation constant $l(l+1)$. Thus, the radial equation will
have solutions that depend on $l$ but are independent of the index $m$.

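In Section 9.4 we solved the radial equation for the Laplace and Helmholtz equations,
with the results given in Table 9.2. For the Laplace equation $\nabla^2\psi = 0$, the general solution
in spherical polar coordinates is a sum, with arbitrary coefficients, of the solutions for the
various possible values of $l$ and $m$:

$$\psi(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(a_{lm}\,r^l + b_{lm}\,r^{-l-1}\right)Y_l^m(\theta,\varphi); \tag{15.142}$$

FIGURE 15.12 Shapes of $[\mathrm{Re}\,Y_l^m(\theta,\varphi)]^2$ for $0 \le l \le 3$, $m = 0 \ldots l$.

Table 15.4 Spherical Harmonics (Condon-Shortley Phase)

$$\begin{aligned}
Y_0^0(\theta,\varphi) &= \frac{1}{\sqrt{4\pi}} \\
Y_1^1(\theta,\varphi) &= -\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{i\varphi} = -\sqrt{\frac{3}{8\pi}}\,(x+iy)/r \\
Y_1^0(\theta,\varphi) &= \sqrt{\frac{3}{4\pi}}\,\cos\theta = \sqrt{\frac{3}{4\pi}}\,z/r \\
Y_1^{-1}(\theta,\varphi) &= +\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{-i\varphi} = \sqrt{\frac{3}{8\pi}}\,(x-iy)/r \\
Y_2^2(\theta,\varphi) &= \sqrt{\frac{5}{96\pi}}\,3\sin^2\theta\,e^{2i\varphi} = 3\sqrt{\frac{5}{96\pi}}\,(x^2-y^2+2ixy)/r^2 \\
Y_2^1(\theta,\varphi) &= -\sqrt{\frac{5}{24\pi}}\,3\sin\theta\cos\theta\,e^{i\varphi} = -\sqrt{\frac{5}{24\pi}}\,3z(x+iy)/r^2 \\
Y_2^0(\theta,\varphi) &= \sqrt{\frac{5}{4\pi}}\left(\tfrac{3}{2}\cos^2\theta - \tfrac{1}{2}\right) = \sqrt{\frac{5}{4\pi}}\left(\tfrac{3}{2}z^2 - \tfrac{1}{2}r^2\right)/r^2 \\
Y_2^{-1}(\theta,\varphi) &= +\sqrt{\frac{5}{24\pi}}\,3\sin\theta\cos\theta\,e^{-i\varphi} = +\sqrt{\frac{5}{24\pi}}\,3z(x-iy)/r^2 \\
Y_2^{-2}(\theta,\varphi) &= \sqrt{\frac{5}{96\pi}}\,3\sin^2\theta\,e^{-2i\varphi} = 3\sqrt{\frac{5}{96\pi}}\,(x^2-y^2-2ixy)/r^2 \\
Y_3^3(\theta,\varphi) &= -\sqrt{\frac{7}{2880\pi}}\,15\sin^3\theta\,e^{3i\varphi} = -\sqrt{\frac{7}{2880\pi}}\,15\left[x^3-3xy^2+i(3x^2y-y^3)\right]/r^3 \\
Y_3^2(\theta,\varphi) &= \sqrt{\frac{7}{480\pi}}\,15\cos\theta\sin^2\theta\,e^{2i\varphi} = \sqrt{\frac{7}{480\pi}}\,15z(x^2-y^2+2ixy)/r^3 \\
Y_3^1(\theta,\varphi) &= -\sqrt{\frac{7}{48\pi}}\left(\tfrac{15}{2}\cos^2\theta - \tfrac{3}{2}\right)\sin\theta\,e^{i\varphi} = -\sqrt{\frac{7}{48\pi}}\left(\tfrac{15}{2}z^2 - \tfrac{3}{2}r^2\right)(x+iy)/r^3 \\
Y_3^0(\theta,\varphi) &= \sqrt{\frac{7}{4\pi}}\left(\tfrac{5}{2}\cos^3\theta - \tfrac{3}{2}\cos\theta\right) = \sqrt{\frac{7}{4\pi}}\,z\left(\tfrac{5}{2}z^2 - \tfrac{3}{2}r^2\right)/r^3 \\
Y_3^{-1}(\theta,\varphi) &= +\sqrt{\frac{7}{48\pi}}\left(\tfrac{15}{2}\cos^2\theta - \tfrac{3}{2}\right)\sin\theta\,e^{-i\varphi} = +\sqrt{\frac{7}{48\pi}}\left(\tfrac{15}{2}z^2 - \tfrac{3}{2}r^2\right)(x-iy)/r^3 \\
Y_3^{-2}(\theta,\varphi) &= \sqrt{\frac{7}{480\pi}}\,15\cos\theta\sin^2\theta\,e^{-2i\varphi} = \sqrt{\frac{7}{480\pi}}\,15z(x^2-y^2-2ixy)/r^3 \\
Y_3^{-3}(\theta,\varphi) &= +\sqrt{\frac{7}{2880\pi}}\,15\sin^3\theta\,e^{-3i\varphi} = +\sqrt{\frac{7}{2880\pi}}\,15\left[x^3-3xy^2-i(3x^2y-y^3)\right]/r^3
\end{aligned}$$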





for the Helmholtz equation $(\nabla^2 + k^2)\psi = 0$, the radial equation has the form given in
Eq. (14.148), so the general solution assumes the form

$$\psi(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(a_{lm}\,j_l(kr) + b_{lm}\,y_l(kr)\right)Y_l^m(\theta,\varphi). \tag{15.143}$$


Laplace Expansion 


Part of the importance of spherical harmonics lies in the completeness property, a consequence
of the Sturm-Liouville form of Laplace's equation. Here this property means that
any function $f(\theta,\varphi)$ (with sufficient continuity properties) evaluated over the surface of a
sphere can be expanded in a uniformly convergent double series of spherical harmonics.⁴
This expansion, known as a Laplace series, takes the form

$$f(\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} c_{lm}\,Y_l^m(\theta,\varphi), \tag{15.144}$$

with

$$c_{lm} = \left\langle Y_l^m \,\middle|\, f(\theta,\varphi)\right\rangle = \int_0^{2\pi} d\varphi \int_0^\pi \sin\theta\, d\theta\; Y_l^m(\theta,\varphi)^*\, f(\theta,\varphi). \tag{15.145}$$

⁴For a proof of this fundamental theorem, see E. W. Hobson (Additional Readings), chapter VII.


A frequent use of the Laplace expansion is in specializing the general solution of the 
Laplace equation to satisfy boundary conditions on a spherical surface. This situation is 
illustrated in the following example. 


Example 15.5.1 SPHERICAL HARMONIC EXPANSION

Consider the problem of determining the electrostatic potential within a charge-free spherical
region of radius $r_0$, with the potential on the spherical bounding surface specified as an
arbitrary function $V(r_0,\theta,\varphi)$ of the angular coordinates $\theta$ and $\varphi$. The potential $V(r,\theta,\varphi)$ is
the solution of the Laplace equation satisfying the boundary condition at $r = r_0$ and regular
for all $r \le r_0$. This means it must be of the form of Eq. (15.142), with the coefficients $b_{lm}$
set to zero to ensure a solution that is nonsingular at $r = 0$.

We proceed by obtaining the spherical harmonic expansion of $V(r_0,\theta,\varphi)$, namely
Eq. (15.144), with coefficients

$$c_{lm} = \left\langle Y_l^m(\theta,\varphi)\,\middle|\,V(r_0,\theta,\varphi)\right\rangle.$$

Then, comparing Eq. (15.142), evaluated for $r = r_0$,

$$V(r_0,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}\,r_0^l\,Y_l^m(\theta,\varphi),$$

with the expression from Eq. (15.144),

$$V(r_0,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} c_{lm}\,Y_l^m(\theta,\varphi),$$

we see that $a_{lm} = c_{lm}/r_0^l$, so

$$V(r,\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} c_{lm}\left(\frac{r}{r_0}\right)^l Y_l^m(\theta,\varphi).$$
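The procedure of Example 15.5.1 can be sketched numerically for the special case of a boundary potential that does not depend on $\varphi$, so that only the $m = 0$ terms survive. The code below is an illustration under that assumption: it computes $c_{l0}$ from Eq. (15.145) by a midpoint rule and then sums the interior series. The boundary function $\cos^2\theta$, the cutoff `l_max`, and all names are hypothetical choices made for this example.

```python
import math

def legendre(l, x):
    """P_l(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p_curr = 1.0, x
    if l == 0:
        return p_prev
    for k in range(1, l):
        p_prev, p_curr = p_curr, ((2*k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr

def y_l0(l, theta):
    """Y_l^0(theta, phi), which is real and independent of phi."""
    return math.sqrt((2*l + 1) / (4 * math.pi)) * legendre(l, math.cos(theta))

def expansion_coefficients(boundary, l_max, n_theta=2000):
    """c_{l0} of Eq. (15.145) for a phi-independent boundary function."""
    d_theta = math.pi / n_theta
    coeffs = []
    for l in range(l_max + 1):
        total = 0.0
        for i in range(n_theta):
            theta = (i + 0.5) * d_theta
            total += y_l0(l, theta) * boundary(theta) * math.sin(theta)
        coeffs.append(2 * math.pi * total * d_theta)   # phi integral gives 2*pi
    return coeffs

def interior_potential(coeffs, r, theta, r0=1.0):
    """V(r, theta) = sum_l c_{l0} (r/r0)^l Y_l^0(theta), for r <= r0."""
    return sum(c * (r / r0) ** l * y_l0(l, theta) for l, c in enumerate(coeffs))

if __name__ == "__main__":
    coeffs = expansion_coefficients(lambda t: math.cos(t) ** 2, l_max=4)
    print(interior_potential(coeffs, 0.5, 0.3))
```

For this boundary function the expansion terminates, since $\cos^2\theta = \tfrac{1}{3}P_0 + \tfrac{2}{3}P_2(\cos\theta)$, so the interior potential should be $\tfrac{1}{3} + \tfrac{2}{3}(r/r_0)^2 P_2(\cos\theta)$; this gives a simple way to check the numerical result.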
Example 15.5.2 LAPLACE SERIES: GRAVITY FIELDS

This example illustrates the notion that sometimes it is appropriate to replace the spherical
harmonics by their real counterparts (in terms of sine and cosine functions). The gravity
fields of the Earth, the Moon, and Mars have been described by a Laplace series of the
form

$$U(r,\theta,\varphi) = \frac{GM}{R}\left[\frac{R}{r} - \sum_{l=2}^{\infty}\sum_{m=0}^{l}\left(\frac{R}{r}\right)^{l+1}\left(C_{lm}\,Y^e_{ml}(\theta,\varphi) + S_{lm}\,Y^o_{ml}(\theta,\varphi)\right)\right]. \tag{15.146}$$

Here $M$ is the mass of the body, $R$ is its equatorial radius, and $G$ is the gravitational constant.
The real functions $Y^e_{ml}$ and $Y^o_{ml}$ are defined by Morse and Feshbach (see Additional
Readings) as the unnormalized forms

$$Y^e_{ml}(\theta,\varphi) = P_l^m(\cos\theta)\cos m\varphi, \qquad Y^o_{ml}(\theta,\varphi) = P_l^m(\cos\theta)\sin m\varphi.$$

Note that Morse and Feshbach place the $m$ index before $l$. The normalization integrals for
$Y^e$ and $Y^o$ are the topic of Exercise 15.5.6.

Satellite measurements have led to the numerical values for $C_{20}$, $C_{22}$, and $S_{22}$ shown in
Table 15.5.

Table 15.5 Gravity Field Coefficients, Eq. (15.146).

Coefficientᵃ   Earth            Moon                       Mars
C20            1.083 × 10⁻³     (0.200 ± 0.002) × 10⁻³     (1.96 ± 0.01) × 10⁻³
C22            0.16 × 10⁻⁵      (2.4 ± 0.5) × 10⁻⁵         (−5 ± 1) × 10⁻⁵
S22            −0.09 × 10⁻⁵     (0.5 ± 0.6) × 10⁻⁵         (±1) × 10⁻⁵

ᵃC20 represents an equatorial bulge, whereas C22 and S22 represent an azimuthal
dependence of the gravitational field.


Symmetry of Solutions 


The angular solutions of given $l$ but different $m$ are closely related in that they lead to the
same solution for the radial equation. Except when $l = 0$, the individual solutions $Y_l^m$ are
not spherically symmetric, and we must recognize that a spherically symmetric problem
can have solutions with less than the full spherical symmetry. A classical example of this
phenomenon is provided by the Earth-Sun system, which has a spherically symmetric
gravitational potential. However, the actual orbit of the Earth is planar. This apparent dilemma
is resolved by noting that a solution exists for any orientation of the Earth's orbital plane;
that actually occurring was determined by "initial conditions."

Returning now to the Laplace equation, we see that a radial solution for given $l$, i.e., $r^l$
or $r^{-l-1}$, is associated with $2l+1$ different angular solutions $Y_l^m$ ($-l \le m \le l$), no one
of which (for $l \neq 0$) has spherical symmetry. The most general solution for this $l$ must
be a linear combination of these $2l+1$ mutually orthogonal functions. Put another way,
the solution space of the angular solution of the Laplace equation for given $l$ is a Hilbert
space containing the $2l+1$ members $Y_l^{-l}(\theta,\varphi), \ldots, Y_l^{l}(\theta,\varphi)$. Now, if we write the Laplace
equation in a coordinate system $(\theta',\varphi')$ oriented differently than the original coordinates,
we must still have the same angular solution set, meaning that $Y_l^m(\theta',\varphi')$ must be a linear
combination of the original $Y_l^m$. Thus, we may write

$$Y_l^m(\theta',\varphi') = \sum_{m'=-l}^{l} D^l_{m'm}\,Y_l^{m'}(\theta,\varphi), \tag{15.147}$$

where the coefficients $D$ depend on the coordinate rotation involved. Note that a coordinate
rotation cannot change the $r$ dependence of our solution to the Laplace equation, so
Eq. (15.147) does not need to include a sum over all values of $l$. As a specific example, we
see (Fig. 15.12) that for $l = 1$ we have three solutions that appear similar, but with different
orientations. Alternatively, from Table 15.4 we see that the angular solutions $Y_1^m$ have
forms proportional to $z/r$, $(x+iy)/r$, and $(x-iy)/r$, meaning that they can be combined
to form arbitrary combinations of $x/r$, $y/r$, and $z/r$. Since a rotation of the coordinate
axes converts $x$, $y$, and $z$ into linear combinations of each other, we can understand why
the set of three functions $Y_1^m$ ($m = 0, 1, -1$) is closed under coordinate rotations.

For $l = 2$, there are five possible $m$ values, so the angular functions of this $l$ value form
a closed space containing five independent members. A fuller discussion of these spaces
spanned by angular functions is part of what will be considered in Chapter 16.

Applying the preceding analysis to solutions of the Schrödinger equation, the eigenvalues
of which are determined by solving its radial ODE for various values of the separation
constant $l(l+1)$, we see that all solutions for the same $l$ but different $m$ will have the same
eigenvalues $E$ and radial functions, but will differ in the orientation of their angular parts.
States of the same energy are called degenerate, and the independence of $E$ with respect
to $m$ will cause a $(2l+1)$-fold degeneracy of the eigenstates of given $l$.


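Example 15.5.3 SOLUTIONS FOR $l = 1$ AT ARBITRARY ORIENTATION

Let's do this problem in Cartesian coordinates. The angular solution $Y_1^0$ to Laplace's equation
is shown in Table 15.4 to be proportional to $z/r$, which for our present purposes we
write $(\mathbf{r}\cdot\hat{\mathbf{e}}_z)/r$, where $\hat{\mathbf{e}}_z$ is a unit vector in the $z$ direction. We seek a similar solution,
with $\hat{\mathbf{e}}_z$ replaced by an arbitrary unit vector $\hat{\mathbf{e}}_u = \cos\alpha\,\hat{\mathbf{e}}_x + \cos\beta\,\hat{\mathbf{e}}_y + \cos\gamma\,\hat{\mathbf{e}}_z$, where
$\cos\alpha$, $\cos\beta$, and $\cos\gamma$ are the direction cosines of $\hat{\mathbf{e}}_u$. We get immediately

$$\frac{\mathbf{r}\cdot\hat{\mathbf{e}}_u}{r} = \frac{x}{r}\cos\alpha + \frac{y}{r}\cos\beta + \frac{z}{r}\cos\gamma.$$

Consulting the Cartesian-coordinate expressions for the spherical harmonics in Table 15.4,
we see that the above expression can be written

$$\frac{\mathbf{r}\cdot\hat{\mathbf{e}}_u}{r} = \sqrt{\frac{8\pi}{3}}\left(\frac{Y_1^{-1} - Y_1^{1}}{2}\right)\cos\alpha + \sqrt{\frac{8\pi}{3}}\left(\frac{-Y_1^{-1} - Y_1^{1}}{2i}\right)\cos\beta + \sqrt{\frac{4\pi}{3}}\,Y_1^0\cos\gamma.$$

This shows that all three $Y_1^m$ are needed to reproduce $Y_1^0$ at an arbitrary orientation. Similar
manipulations can be carried out for other $l$ and $m$ values. ■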
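The identity obtained in Example 15.5.3 can be verified numerically. The sketch below evaluates $(\mathbf{r}\cdot\hat{\mathbf{e}}_u)/r$ once directly from the direction cosines and once from the three $l = 1$ spherical harmonics of Table 15.4; the function names and the sample direction are illustrative assumptions.

```python
import cmath
import math

def y1(m, theta, phi):
    """Y_1^m from Table 15.4 (Condon-Shortley phase), for m = -1, 0, 1."""
    if m == 0:
        return complex(math.sqrt(3 / (4 * math.pi)) * math.cos(theta))
    sign = -1.0 if m == 1 else 1.0
    return sign * math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(1j * m * phi)

def projection_direct(theta, phi, alpha, beta, gamma):
    """(r . e_u)/r from the Cartesian components and direction cosines."""
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    return x * math.cos(alpha) + y * math.cos(beta) + z * math.cos(gamma)

def projection_from_harmonics(theta, phi, alpha, beta, gamma):
    """The same quantity assembled from Y_1^1, Y_1^0, and Y_1^{-1}."""
    yp, y0, ym = y1(1, theta, phi), y1(0, theta, phi), y1(-1, theta, phi)
    k = math.sqrt(8 * math.pi / 3)
    return (k * (ym - yp) / 2 * math.cos(alpha)
            + k * (-ym - yp) / 2j * math.cos(beta)
            + math.sqrt(4 * math.pi / 3) * y0 * math.cos(gamma))

if __name__ == "__main__":
    # Direction cosines of the unit vector (1, 2, 2)/3.
    alpha, beta, gamma = math.acos(1/3), math.acos(2/3), math.acos(2/3)
    print(projection_direct(1.1, 2.3, alpha, beta, gamma))
    print(projection_from_harmonics(1.1, 2.3, alpha, beta, gamma))
```

The second function returns a complex number whose imaginary part should vanish to rounding error, reflecting the fact that the combination of harmonics is real.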
Further Properties

The main properties of the spherical harmonics follow directly from those of the functions
$\Theta_{lm}$ and $\Phi_m$. We summarize briefly:

Special values. At $\theta = 0$, the polar direction in the spherical coordinates, the value of $\varphi$
becomes immaterial, and all $Y_l^m$ that have $\varphi$ dependence must vanish. Using also the fact
that $P_l(1) = 1$, we find in general

$$Y_l^m(0,\varphi) = \sqrt{\frac{2l+1}{4\pi}}\;\delta_{m0}. \tag{15.148}$$

A similar argument for $\theta = \pi$ leads to

$$Y_l^m(\pi,\varphi) = (-1)^l\sqrt{\frac{2l+1}{4\pi}}\;\delta_{m0}. \tag{15.149}$$
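Recurrence formulas. Using the recurrence formulas developed for the associated Legendre
functions, we get for the spherical harmonics with arguments $(\theta,\varphi)$:

$$\cos\theta\,Y_l^m = \left[\frac{(l-m+1)(l+m+1)}{(2l+1)(2l+3)}\right]^{1/2} Y_{l+1}^m + \left[\frac{(l-m)(l+m)}{(2l-1)(2l+1)}\right]^{1/2} Y_{l-1}^m, \tag{15.150}$$

$$e^{\pm i\varphi}\sin\theta\,Y_l^m = \mp\left[\frac{(l\pm m+1)(l\pm m+2)}{(2l+1)(2l+3)}\right]^{1/2} Y_{l+1}^{m\pm 1} \pm \left[\frac{(l\mp m)(l\mp m-1)}{(2l-1)(2l+1)}\right]^{1/2} Y_{l-1}^{m\pm 1}. \tag{15.151}$$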


Some integrals. These recurrence relations permit the ready evaluation of some integrals
of practical importance. Our starting point is the orthonormalization condition,
Eq. (15.138). For example, the matrix elements describing the dominant (electric dipole)
mode of interaction of an electromagnetic field with a charged system in a spherical harmonic
state are proportional to

$$\int \left[Y_{l'}^{m'}\right]^* \cos\theta\; Y_l^m\, d\Omega.$$

Using Eq. (15.150) and invoking the orthonormality of the $Y_l^m$, we find

$$\int \left[Y_{l'}^{m'}\right]^* \cos\theta\; Y_l^m\, d\Omega = \left[\frac{(l-m+1)(l+m+1)}{(2l+1)(2l+3)}\right]^{1/2}\delta_{m'm}\,\delta_{l',l+1} + \left[\frac{(l-m)(l+m)}{(2l-1)(2l+1)}\right]^{1/2}\delta_{m'm}\,\delta_{l',l-1}. \tag{15.152}$$

Equation (15.152) provides a basis for the well-known selection rule for dipole radiation.

Additional formulas involving products of three spherical harmonics and the detailed
behavior of these quantities under coordinate rotations are more appropriately discussed in
connection with a study of angular momentum and are therefore deferred to Chapter 16.
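The recurrence in Eq. (15.150), and the selection rule that follows from Eq. (15.152), can be illustrated with a short numerical check. The sketch below is an assumption-laden illustration rather than a reference implementation: it evaluates $Y_l^m$ from Eq. (15.137) using a standard upward recurrence for $P_l^m$, compares $\cos\theta\,Y_l^m$ with the right-hand side of Eq. (15.150), and encodes the closed-form matrix element. All function names are hypothetical.

```python
import cmath
import math

def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l, with the Condon-Shortley phase."""
    pmm = 1.0
    root = math.sqrt((1.0 - x) * (1.0 + x))
    for k in range(1, m + 1):
        pmm *= -(2*k - 1) * root
    if l == m:
        return pmm
    prev, curr = pmm, x * (2*m + 1) * pmm
    for ll in range(m + 2, l + 1):
        prev, curr = curr, (x * (2*ll - 1) * curr - (ll + m - 1) * prev) / (ll - m)
    return curr

def sph_harm(l, m, theta, phi):
    """Y_l^m(theta, phi) as defined in Eq. (15.137)."""
    if m < 0:
        return (-1) ** (-m) * sph_harm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2*l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)

def cos_theta_rhs(l, m, theta, phi):
    """Right-hand side of Eq. (15.150)."""
    up = math.sqrt((l - m + 1) * (l + m + 1) / ((2*l + 1) * (2*l + 3)))
    rhs = up * sph_harm(l + 1, m, theta, phi)
    if l - 1 >= abs(m):   # otherwise |m| = l and the second coefficient is zero
        down = math.sqrt((l - m) * (l + m) / ((2*l - 1) * (2*l + 1)))
        rhs += down * sph_harm(l - 1, m, theta, phi)
    return rhs

def dipole_matrix_element(lp, mp, l, m):
    """Closed form of Eq. (15.152): nonzero only for m' = m and l' = l +/- 1."""
    if mp != m:
        return 0.0
    if lp == l + 1:
        return math.sqrt((l - m + 1) * (l + m + 1) / ((2*l + 1) * (2*l + 3)))
    if lp == l - 1:
        return math.sqrt((l - m) * (l + m) / ((2*l - 1) * (2*l + 1)))
    return 0.0

if __name__ == "__main__":
    theta, phi = 0.8, 2.1
    print(math.cos(theta) * sph_harm(2, 1, theta, phi), cos_theta_rhs(2, 1, theta, phi))
    print(dipole_matrix_element(2, 0, 1, 0), dipole_matrix_element(3, 0, 1, 0))
```

The vanishing of `dipole_matrix_element` unless $l' = l \pm 1$ and $m' = m$ is the selection rule referred to in the text for this ($z$-polarized) component of the dipole operator.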


Exercises 


15.5.1 Show that the parity of $Y_l^m(\theta,\varphi)$ is $(-1)^l$. Note the disappearance of any $m$ dependence.

Hint. For the parity operation in spherical polar coordinates, see Exercise 3.10.25.

15.5.2 Prove that $Y_l^m(0,\varphi) = \left(\dfrac{2l+1}{4\pi}\right)^{1/2}\delta_{m0}$.

15.5.3 In the theory of Coulomb excitation of nuclei we encounter $Y_l^m(\pi/2, 0)$. Show that

$$Y_l^m\!\left(\frac{\pi}{2}, 0\right) = \begin{cases} \left(\dfrac{2l+1}{4\pi}\right)^{1/2}\dfrac{\left[(l-m)!\,(l+m)!\right]^{1/2}}{(l-m)!!\,(l+m)!!}\,(-1)^{(l+m)/2}, & l+m \text{ even},\\[3mm] 0, & l+m \text{ odd}. \end{cases}$$

15.5.4 The orthogonal azimuthal functions yield a useful representation of the Dirac delta
function. Show that

$$\delta(\varphi_1 - \varphi_2) = \frac{1}{2\pi}\sum_{m=-\infty}^{\infty} e^{im(\varphi_1-\varphi_2)}.$$

Note. This formula assumes that $\varphi_1$ and $\varphi_2$ are restricted to $0 \le \varphi < 2\pi$. Without this
restriction there will be additional delta-function contributions at intervals of $2\pi$ in
$\varphi_1 - \varphi_2$.
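15.5.5 Derive the spherical harmonic closure relation

$$\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[Y_l^m(\theta_1,\varphi_1)\right]^* Y_l^m(\theta_2,\varphi_2) = \frac{1}{\sin\theta_1}\,\delta(\theta_1-\theta_2)\,\delta(\varphi_1-\varphi_2) = \delta(\cos\theta_1-\cos\theta_2)\,\delta(\varphi_1-\varphi_2).$$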
15.5.6 In some circumstances it is desirable to replace the imaginary exponential of our spherical
harmonic by sine or cosine. Morse and Feshbach (see Additional Readings) define

$$Y^e_{mn} = P_n^m(\cos\theta)\cos m\varphi, \quad m \ge 0,$$

$$Y^o_{mn} = P_n^m(\cos\theta)\sin m\varphi, \quad m > 0,$$

and their normalization integrals are

$$\int_0^{2\pi}\!\!\int_0^{\pi}\left[Y^{e\text{ or }o}_{mn}(\theta,\varphi)\right]^2\sin\theta\, d\theta\, d\varphi = \begin{cases} \dfrac{4\pi}{2(2n+1)}\,\dfrac{(n+m)!}{(n-m)!}, & n = 1, 2, \ldots,\\[3mm] 4\pi, & n = 0. \end{cases}$$

These spherical harmonics are often named according to the patterns of their positive
and negative regions on the surface of a sphere: zonal harmonics for $m = 0$, sectoral harmonics
for $m = n$, and tesseral harmonics for $0 < m < n$. For $Y^e_{mn}$, $n = 4$, $m = 0, 2, 4$,
indicate on a diagram of a hemisphere (one diagram for each spherical harmonic) the
regions in which the spherical harmonic is positive.

15.5.7 A function $f(r,\theta,\varphi)$ may be expressed as a Laplace series

$$f(r,\theta,\varphi) = \sum_{l,m} a_{lm}\,r^l\,Y_l^m(\theta,\varphi).$$

Letting $\langle\cdots\rangle_{\text{sphere}}$ denote the average over a sphere centered on the origin, show that

$$\left\langle f(r,\theta,\varphi)\right\rangle_{\text{sphere}} = f(0,0,0).$$


15.6 LEGENDRE FUNCTIONS OF THE SECOND KIND


The Legendre equation, a linear second-order ODE, has two independent solutions.
Writing this equation in the form

$$y'' - \frac{2x}{1-x^2}\,y' + \frac{l(l+1)}{1-x^2}\,y = 0, \tag{15.153}$$

and restricting consideration to integer $l \ge 0$, our objective is to find a second solution
that is linearly independent from the Legendre polynomials $P_l(x)$. Using the procedure of
Section 7.6, and denoting the second solution $Q_l(x)$, we have

$$Q_l(x) = P_l(x)\int^x \frac{\exp\left[\displaystyle\int^x \frac{2x}{1-x^2}\,dx\right]}{\left[P_l(x)\right]^2}\,dx = P_l(x)\int^x \frac{dx}{(1-x^2)\left[P_l(x)\right]^2}. \tag{15.154}$$

Since any linear combination of $P_l$ and the right-hand side of Eq. (15.154) is equally valid
as a second solution of the Legendre ODE, we note that Eq. (15.154) defines both the scale
and the specific functional form of $Q_l$.

Using Eq. (15.154), we can obtain explicit formulas for the $Q_l$. We find (remembering
that $P_0 = 1$ and expanding the denominator in partial fractions):

$$Q_0(x) = \int^x \frac{dx}{1-x^2} = \frac{1}{2}\int^x\left[\frac{1}{1+x} + \frac{1}{1-x}\right]dx = \frac{1}{2}\ln\left(\frac{1+x}{1-x}\right). \tag{15.155}$$

Continuing to $Q_1$, the partial fraction expansion is a bit more involved, but leads to a
simple result. Noting that $P_1(x) = x$, we have

$$Q_1(x) = x\int^x \frac{dx}{x^2(1-x^2)} = \frac{x}{2}\ln\left(\frac{1+x}{1-x}\right) - 1. \tag{15.156}$$

With significantly more work, we can obtain $Q_2$:

$$Q_2(x) = \frac{1}{2}\,P_2(x)\ln\left(\frac{1+x}{1-x}\right) - \frac{3x}{2}. \tag{15.157}$$

This process can in principle be repeated for larger $l$, but it is easier and more instructive to
verify that the forms of $Q_0$, $Q_1$, and $Q_2$ are consistent with the Legendre-function recurrence
relations, and then to obtain $Q_l$ of larger $l$ by recurrence. The recurrence formulas,
originally written for $P_l$ in Eq. (15.18), are

$$(l+1)\,Q_{l+1}(x) - (2l+1)\,x\,Q_l(x) + l\,Q_{l-1}(x) = 0, \tag{15.158}$$

$$(2l+1)\,Q_l(x) = Q'_{l+1}(x) - Q'_{l-1}(x). \tag{15.159}$$

Verification that $Q_0$, $Q_1$, and $Q_2$ satisfy these recurrence formulas is straightforward and
is left as an exercise. Extension to higher $l$ leads to the formula

$$Q_l(x) = \frac{1}{2}\,P_l(x)\ln\left(\frac{1+x}{1-x}\right) - \frac{2l-1}{1\cdot l}\,P_{l-1}(x) - \frac{2l-5}{3(l-1)}\,P_{l-3}(x) - \cdots. \tag{15.160}$$
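The strategy just described, starting from $Q_0$ and $Q_1$ and stepping upward with Eq. (15.158), is straightforward to try numerically on $-1 < x < 1$. The sketch below is illustrative only (the function names are assumptions); it compares the recurrence result for $l = 2$ with the closed form of Eq. (15.157).

```python
import math

def legendre_q(l, x):
    """Q_l(x) for -1 < x < 1, from Q_0, Q_1 and the recurrence of Eq. (15.158)."""
    log_term = 0.5 * math.log((1.0 + x) / (1.0 - x))
    q_prev, q_curr = log_term, x * log_term - 1.0   # Eqs. (15.155) and (15.156)
    if l == 0:
        return q_prev
    for k in range(1, l):
        # (k+1) Q_{k+1} = (2k+1) x Q_k - k Q_{k-1}
        q_prev, q_curr = q_curr, ((2*k + 1) * x * q_curr - k * q_prev) / (k + 1)
    return q_curr

def q2_closed(x):
    """Closed form of Eq. (15.157)."""
    p2 = 0.5 * (3 * x * x - 1)
    return 0.5 * p2 * math.log((1.0 + x) / (1.0 - x)) - 1.5 * x

if __name__ == "__main__":
    for x in (-0.4, 0.0, 0.3, 0.9):
        print(x, legendre_q(2, x), q2_closed(x))
```

Evaluating `legendre_q` at $x = 0$ for odd $l$ also gives a quick check of the values $Q_1(0) = -1$ and $Q_3(0) = 2/3$ that follow from the recurrence.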
Many applications using the functions $Q_l(x)$ involve values of $x$ outside the range
$-1 < x < 1$. If Eq. (15.160) is extended, say, beyond $+1$, then $1 - x$ will become negative
and make a contribution $\pm i\pi$ to the logarithm, thereby making a contribution $\pm i\pi P_l/2$
to $Q_l$. Our solution will still remain a solution if this contribution is removed, and it is
therefore convenient to define the second solution for $x$ outside the range $(-1, +1)$ with

$$\ln\left(\frac{1+x}{1-x}\right) \quad\text{replaced by}\quad \ln\left(\frac{x+1}{x-1}\right).$$

From a complex-variable perspective, the logarithmic term in the solutions $Q_l$ is related to
the singularity in the ODE at $z = \pm 1$, reflecting the fact that to make the solutions single-valued
it will be necessary to make a branch cut, traditionally taken on the real axis from
$-1$ to $+1$. Then the $Q_l$ with the $(1+x)/(1-x)$ logarithm are recovered on $-1 \le x \le 1$
if we average the results from the $(z+1)/(z-1)$ form on the two sides of the branch cut.

The behavior of the $Q_l$ is illustrated by plots for $x < 1$ in Fig. 15.13 and for $x > 1$ in
Fig. 15.14. Note that there is no singularity at $x = 0$ but all the $Q_l$ exhibit a logarithmic
singularity at $x = 1$.











⁵In Section 15.1 we showed that any set of functions that satisfies the recurrence relations reproduced here also satisfies the
Legendre ODE.

FIGURE 15.13 Legendre functions $Q_l(x)$, $0 \le x < 1$.

FIGURE 15.14 Legendre functions $Q_l(x)$, $x > 1$.

Properties

1. An examination of the formulas for $Q_l(x)$ reveals that if $l$ is even, then $Q_l(x)$ is an
odd function of $x$, while $Q_l(x)$ of odd $l$ are even functions of $x$. More succinctly,
$Q_l(-x) = (-1)^{l+1}\,Q_l(x)$.

2. The presence of the logarithmic term causes $Q_l(1) = \infty$ for all $l$.

3. Because $x = 0$ is a regular point of the Legendre ODE, $Q_l(0)$ must for all $l$ be finite.
The symmetry of $Q_l$ causes $Q_l(0)$ to vanish for even $l$; it is shown in the next subsection
that for odd $l$,

$$Q_{2s+1}(0) = (-1)^{s+1}\,\frac{(2s)!!}{(2s+1)!!}. \tag{15.161}$$

4. From the result of Exercise 15.6.3, it can be shown that $Q_l(\infty) = 0$.


Alternate Formulations 





Because the singular points of the Legendre ODE nearest to the origin are at the points $\pm 1$,
it should be possible to describe $Q_l(x)$ as a power series about the origin, with convergence
for $|x| < 1$. Moreover, since the only other singular point of the Legendre equation is a
regular singular point at infinity, it should also be possible to express one of its solutions
as a power series in $1/x$, i.e., a series about the point at infinity, which must converge for
$|x| > 1$.

To obtain a power series about $x = 0$, we return to the discussion of the Legendre ODE
presented in Section 8.3, where we saw that an expansion of the form

$$y = \sum_{j=0}^{\infty} a_j\,x^{s+j} \tag{15.162}$$

led to an indicial equation with solutions $s = 0$ and $s = 1$, and with the $a_j$ satisfying the
recurrence formula, for eigenvalue $l(l+1)$,

$$a_{j+2} = a_j\,\frac{(s+j)(s+j+1) - l(l+1)}{(s+j+1)(s+j+2)}. \tag{15.163}$$

When $l$ is even, we found that $P_l(x)$ was obtained as the solution $y(x)$ from the indicial-equation
solution $s = 0$, and we did not make use (for even $l$) of the solution from $s = 1$
because that solution was not a polynomial and did not converge at $x = 1$. However, we are
now seeking a second solution and are no longer restricting attention to those that converge
at $x = \pm 1$. Thus, a second solution linearly independent of $P_l$ must be that produced (again,
for even $l$) as the series obtained when $s = 1$. This second solution will have odd parity,
and therefore must be proportional to $Q_l(x)$.

Continuing, for even $l$, with $s = 1$, Eq. (15.163) becomes

$$a_{j+2} = -a_j\,\frac{(l+j+2)(l-j-1)}{(j+2)(j+3)},$$

corresponding to

$$Q_l(x) = b_l\left[x - \frac{(l-1)(l+2)}{3!}\,x^3 + \frac{(l-3)(l-1)(l+2)(l+4)}{5!}\,x^5 - \cdots\right]. \tag{15.164}$$

Here $b_l$ is the value of the coefficient of the expansion needed to give the formula for $Q_l$
the proper scaling. For odd $l$, the corresponding formula, with $s = 0$, is an even function
of $x$, and must therefore be proportional to $Q_l$:

$$Q_l(x) = b_l\left[1 - \frac{l(l+1)}{2!}\,x^2 + \frac{(l-2)\,l\,(l+1)(l+3)}{4!}\,x^4 - \cdots\right]. \tag{15.165}$$

To find the values of the scale factors $b_l$, we turn now to the explicit forms for $Q_0$ and
$Q_1$, Eqs. (15.155) and (15.156). Expanding the logarithm, we find (again keeping only the
lowest-order terms)

$$Q_0(x) = x + \cdots, \qquad Q_1(x) = -1 + \cdots.$$

From the recurrence formula, Eq. (15.158), keeping only the lowest-order contributions,
we find

$$\begin{aligned}
2Q_2 &= 3xQ_1 - Q_0 &&\longrightarrow\quad Q_2 = -2x + \cdots,\\
3Q_3 &= 5xQ_2 - 2Q_1 &&\longrightarrow\quad Q_3 = 2/3 + \cdots,\\
4Q_4 &= 7xQ_3 - 3Q_2 &&\longrightarrow\quad Q_4 = 8x/3 + \cdots.
\end{aligned}$$

These results generalize to

$$b_l = \begin{cases} (-1)^p\,\dfrac{(2p)!!}{(2p-1)!!}, & l \text{ even},\ l = 2p,\\[3mm] (-1)^{p+1}\,\dfrac{(2p)!!}{(2p+1)!!}, & l \text{ odd},\ l = 2p+1. \end{cases} \tag{15.166}$$

One may now combine the values of the coefficients $b_l$ with the expansions in Eqs. (15.164)
and (15.165) to obtain entirely explicit series expansions of $Q_l(x)$ about $x = 0$. This is the
topic of Exercise 15.6.2.
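To see the pieces fit together, the series about $x = 0$ can be generated directly from the recurrence of Eq. (15.163), started with $a_0 = b_l$ from Eq. (15.166), and compared with the closed forms of Eqs. (15.155) to (15.157). The sketch below is a rough illustration; the function names and the number of terms retained are arbitrary assumptions.

```python
import math

def double_factorial(n):
    """n!! with the conventions 0!! = (-1)!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def scale_factor(l):
    """b_l of Eq. (15.166)."""
    p = l // 2
    if l % 2 == 0:
        return (-1) ** p * double_factorial(2*p) / double_factorial(2*p - 1)
    return (-1) ** (p + 1) * double_factorial(2*p) / double_factorial(2*p + 1)

def q_series(l, x, terms=200):
    """Q_l(x) for |x| < 1: s = 1 for even l, s = 0 for odd l, per Eqs. (15.164)-(15.165)."""
    s = 1 if l % 2 == 0 else 0
    a = scale_factor(l)                    # leading coefficient a_0 = b_l
    total = 0.0
    for j in range(0, 2 * terms, 2):
        total += a * x ** (s + j)
        # Eq. (15.163): a_{j+2} in terms of a_j
        a *= ((s + j) * (s + j + 1) - l * (l + 1)) / ((s + j + 1) * (s + j + 2))
    return total

if __name__ == "__main__":
    x = 0.3
    log_term = 0.5 * math.log((1 + x) / (1 - x))
    print(q_series(0, x), log_term)
    print(q_series(1, x), x * log_term - 1)
    print(q_series(2, x), 0.5 * (3*x*x - 1) * log_term - 1.5 * x)
```

Because the series converges only for $|x| < 1$, and slowly near the endpoints, a fixed number of terms is adequate only for moderate $|x|$.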

As mentioned earlier, the point $x = \infty$ is a regular singular point, and expansion about
this point yields an expansion of $Q_l(x)$ in inverse powers of $x$. That expansion is considered
in Exercise 15.6.3.


Exercises 


15.6.1 Show that if $l$ is even, $Q_l(-x) = -Q_l(x)$, and that if $l$ is odd, $Q_l(-x) = Q_l(x)$.

15.6.2 Show that

$$\text{(a)}\quad Q_{2p}(x) = (-1)^p\,2^{2p}\sum_{s=0}^{p}(-1)^s\,\frac{(p+s)!\,(p-s)!}{(2s+1)!\,(2p-2s)!}\,x^{2s+1} + 2^{2p}\sum_{s=p+1}^{\infty}\frac{(p+s)!\,(2s-2p)!}{(2s+1)!\,(s-p)!}\,x^{2s+1}, \quad |x| < 1,$$

$$\text{(b)}\quad Q_{2p+1}(x) = (-1)^{p+1}\,2^{2p}\sum_{s=0}^{p}(-1)^s\,\frac{(p+s)!\,(p-s)!}{(2s)!\,(2p-2s+1)!}\,x^{2s} + 2^{2p+1}\sum_{s=p+1}^{\infty}\frac{(p+s)!\,(2s-2p-2)!}{(2s)!\,(s-p-1)!}\,x^{2s}, \quad |x| < 1.$$

15.6.3 (a) Starting with the assumed form

$$Q_l(x) = \sum_{j=0}^{\infty} b_{lj}\,x^{k-j},$$

show that

$$Q_l(x) = b_{l0}\,x^{-l-1}\sum_{s=0}^{\infty}\frac{(l+s)!\,(l+2s)!\,(2l+1)!}{s!\,(l!)^2\,(2l+2s+1)!}\,x^{-2s}.$$

(b) The standard choice of $b_{l0}$ is

$$b_{l0} = \frac{2^l\,(l!)^2}{(2l+1)!},$$

leading to the final result

$$Q_l(x) = x^{-l-1}\sum_{s=0}^{\infty}\frac{(l+2s)!}{(2s)!!\,(2l+2s+1)!!}\,x^{-2s}.$$

Show that this choice of $b_{l0}$ brings this negative power-series form of $Q_l(x)$ into
agreement with the closed-form solutions.

15.6.4 (a) Using the recurrence relations, prove (independent of the Wronskian relation) that

$$n\left[P_n(x)\,Q_{n-1}(x) - P_{n-1}(x)\,Q_n(x)\right] = P_1(x)\,Q_0(x) - P_0(x)\,Q_1(x).$$

(b) By direct substitution show that the right-hand side of this equation equals 1.


Additional Readings 


Abramowitz, M., and I. A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables (AMS-55). Washington, DC: National Bureau of Standards (1972), reprinted, Dover
(1974).

Hobson, E. W., The Theory of Spherical and Ellipsoidal Harmonics. New York: Chelsea (1955). This is a very
complete reference, which is the classic text on Legendre polynomials and all related functions.

Jackson, J. D., Classical Electrodynamics, 3rd ed. New York: Wiley (1999).

Margenau, H., and G. M. Murphy, The Mathematics of Physics and Chemistry, 2nd ed. Princeton, NJ: Van 
Nostrand (1956). 

Morse, P. M., and H. Feshbach, Methods of Theoretical Physics, 2 vols. New York: McGraw-Hill (1953). This 
work is detailed but at a rather advanced level. 


Smythe, W. R., Static and Dynamic Electricity, 3rd ed. New York: McGraw-Hill (1968), reprinted, Taylor & 
Francis (1989), paperback. Advanced, detailed, and difficult. Includes use of elliptic integrals to obtain closed 
formulas. 

Whittaker, E. T., and G. N. Watson, A Course of Modern Analysis, 4th ed. Cambridge, UK: Cambridge University 
Press (1962), paperback. 


CHAPTER 16 


ANGULAR MOMENTUM 


The traditional quantum mechanical treatment of central force problems starts from
solutions to the time-independent Schrödinger equation, which, for a single particle of
mass $m$ moving subject to a potential $V(\mathbf{r})$, is an eigenvalue problem of the general form

$$-\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\,\psi(\mathbf{r}) = E\,\psi(\mathbf{r}). \tag{16.1}$$

Here $\hbar$ is Planck’s constant divided by $2\pi$, in SI units approximately $1.05 \times 10^{-34}$ J·s
(joule-seconds); the very small value of this constant causes quantum behavior to be perceptible
under most circumstances only at small distances and for particles of small mass;
the relevant ranges are typically at atomic scales of mass and length.

The basic interpretation of the Schrödinger equation is that if the energy $E$ of the particle
is measured, the result will be one of the eigenvalues of Eq. (16.1), and (subsequent to the
measurement) the location of the particle will be described by a probability distribution

$$P(\mathbf{r})\,d^3r = |\psi(\mathbf{r})|^2\,d^3r,$$

where $\psi(\mathbf{r})$ is an eigenfunction corresponding to $E$. As we have seen in Chapters 9 and 15,
$\psi$ will in general have angular as well as radial dependence, and its angular part can be
written in terms of the spherical harmonics $Y_l^m(\theta, \varphi)$.

A more detailed interpretation of Eq. (16.1) is to identify it as an operator equation in
which the momentum $\mathbf{p}$ is identified with the operator $-i\hbar\nabla$, while functions of position,
such as the potential energy $V(\mathbf{r})$, are identified as multiplicative operators. Viewed in this
way, the operator $-(\hbar^2/2m)\nabla^2$ is seen to represent $p^2/2m$ (i.e., the kinetic energy $T$), and
Eq. (16.1) then becomes equivalent to

$$H\psi = (T + V)\psi = E\psi, \tag{16.2}$$

where $H$, the Hamiltonian, is an operator whose eigenvalues are the possible values of the
total energy.

The Hamiltonian $H$ is a special operator in quantum mechanics because its eigenfunctions
yield stationary probability distributions (they do not evolve into different distributions
over time). However, $H$ is just like any other quantum operator $K$ representing a
dynamical quantity (with eigenvalues $k$ that can be the result of measurement of $K$). If
$\psi$ is simultaneously an eigenfunction of $H$ and of $K$, then we can have definite values of
both $E$ and $k$ that will not evolve as a function of time, and the measuring of either will not
disturb the definite value of the other. This state of affairs can only be achieved if $H$ and
$K$ commute, because (see Section 6.4) $[H, K] = 0$ is a necessary and sufficient condition
that $H$ and $K$ have a set of simultaneous eigenfunctions.

In earlier chapters, we examined commutators such as $[x, p_x] = i$ (here and except when
noted, we use a unit system with $\hbar$ set to unity to avoid unnecessary notational complexity).
The nonzero commutator of $x$ and $p_x$ tells us that we cannot simultaneously obtain
unambiguous measurement of both these quantities (i.e., we do not have a complete set of
states that are simultaneously eigenfunctions of $x$ and $p_x$). This is the mathematical basis
of the Heisenberg uncertainty principle in quantum mechanics.

The notion of simultaneous eigenfunctions and therefore commutation plays a key role 
in the study of angular momentum in quantum mechanics. Angular momentum is con- 
served in the classical central force problem, and one of the focal points of the present chap- 
ter is to understand the properties of angular momentum operators in quantum mechanics. 


16.1 ANGULAR MOMENTUM OPERATORS


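In classical physics, the kinetic energy of a particle of mass $\mu$ can be written in terms
of its momentum $\mathbf{p}$ as $T_{\rm class} = p^2/2\mu$. Note that we are using $\mu$ for the particle mass
to avoid confusion with the usual notation of the azimuthal wave functions $\varphi_m$. Most of
the literature uses $m$ for both quantities. Introducing spherical polar coordinates, $T_{\rm class}$
can be divided into radial and angular parts, with the angular kinetic energy of the form
$L_{\rm class}^2/2\mu r^2$. Here $\mathbf{L}_{\rm class}$ is the angular momentum, defined as $\mathbf{L}_{\rm class} = \mathbf{r} \times \mathbf{p}$. Following the
usual Schrödinger representation of quantum mechanics, the classical linear momentum $\mathbf{p}$
is replaced (in a unit system with $\hbar = 1$) by the operator $-i\nabla$. The quantum-mechanical
kinetic energy operator is $T_{\rm QM} = -\nabla^2/2\mu$, which in spherical polar coordinates can be
written

$$T_{\rm QM} = -\frac{1}{2\mu}\left[\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}\right]
- \frac{1}{2\mu r^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right)
+ \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right]. \tag{16.3}$$

Like the classical kinetic energy, $T_{\rm QM}$ can also be divided into radial and angular parts,
with the angular part identified in terms of the angular momentum:

$$T_{\rm QM} = T_{\rm radial,QM} + \frac{1}{2\mu r^2}\,L_{\rm QM}^2, \tag{16.4}$$

$$T_{\rm radial,QM} = -\frac{1}{2\mu}\left[\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}\right], \tag{16.5}$$

$$L_{\rm QM}^2 = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right)
- \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}. \tag{16.6}$$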

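Since our focus here is on the quantum-mechanical operators, we drop the notation “QM”
from now on.

The notation $L^2$ in Eq. (16.6) is only really appropriate if it is consistent with the definition
of the quantum-mechanical angular momentum operator, which must have the form

$$\mathbf{L} = \mathbf{r} \times \mathbf{p} = -i\,\mathbf{r} \times \nabla. \tag{16.7}$$

One way to confirm Eq. (16.6) is to start from the expression for $\mathbf{L}$ in spherical polar
coordinates, which can be deduced by applying the operator $\mathbf{r} \times \mathbf{p}$ to an arbitrary function $\psi$:

$$\mathbf{L}\psi = -i\,\mathbf{r} \times \nabla\psi = -i\,r\,\hat{\mathbf{e}}_r \times
\left[\hat{\mathbf{e}}_r\frac{\partial\psi}{\partial r} + \hat{\mathbf{e}}_\theta\,\frac{1}{r}\frac{\partial\psi}{\partial\theta}
+ \hat{\mathbf{e}}_\varphi\,\frac{1}{r\sin\theta}\frac{\partial\psi}{\partial\varphi}\right],$$

from which we extract the formula

$$\mathbf{L} = i\left(\hat{\mathbf{e}}_\theta\,\frac{1}{\sin\theta}\frac{\partial}{\partial\varphi}
- \hat{\mathbf{e}}_\varphi\,\frac{\partial}{\partial\theta}\right). \tag{16.8}$$

We then rewrite $\mathbf{L}$ in Cartesian components $L_x$, $L_y$, $L_z$ (but still expressed in polar
coordinates) and evaluate

$$L^2 = \mathbf{L} \cdot \mathbf{L} = L_x^2 + L_y^2 + L_z^2. \tag{16.9}$$

This process is the topic of Exercise 3.10.32, and leads, as expected, to Eq. (16.6).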

In Section 15.5 we identified the solutions of the angular part of the Laplace and
Schrödinger equations for central force problems as the spherical harmonics, denoted
$Y_l^m(\theta, \varphi)$. Now that we have also written the angular part of these equations in terms of
$L^2$, we see that the $Y_l^m$ can be identified as eigenfunctions of $L^2$, i.e., that they are angular
momentum eigenfunctions, satisfying an eigenvalue equation of the form

$$L^2\,Y_l^m(\theta, \varphi) = l(l+1)\,Y_l^m(\theta, \varphi). \tag{16.10}$$

Summarizing the discussion to this point, and drawing on previously established properties
of the spherical harmonics:

The spherical harmonics $Y_l^m$ are eigenfunctions of $L^2$ with eigenvalue $l(l+1)$. The
eigenfunctions for a given $l$ are $(2l+1)$-fold degenerate and can be indexed by their $m$ values,
which range in unit steps from $-l$ to $l$.
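The eigenvalue statement can be spot-checked numerically by applying the differential form of $L^2$ from Eq. (16.6) to a known spherical harmonic with central differences. The sketch below assumes the Condon-Shortley form $Y_2^1 = -\sqrt{15/8\pi}\,\sin\theta\cos\theta\,e^{i\varphi}$; the step size, the sample point, and the function names are arbitrary illustrative choices.

```python
import cmath
import math


def y_2_1(theta, phi):
    """Spherical harmonic Y_2^1 (Condon-Shortley phase assumed)."""
    return (-math.sqrt(15 / (8 * math.pi))
            * math.sin(theta) * math.cos(theta) * cmath.exp(1j * phi))


def apply_L2(f, theta, phi, h=1e-4):
    """Apply L^2 of Eq. (16.6) to f at (theta, phi) using central differences."""

    def flux(t):
        # sin(theta) * df/dtheta, evaluated with a half-step central difference
        return math.sin(t) * (f(t + h / 2, phi) - f(t - h / 2, phi)) / h

    theta_term = (flux(theta + h / 2) - flux(theta - h / 2)) / h / math.sin(theta)
    phi_term = ((f(theta, phi + h) - 2 * f(theta, phi) + f(theta, phi - h))
                / h ** 2 / math.sin(theta) ** 2)
    return -theta_term - phi_term


def eigenvalue_residual(theta=0.7, phi=1.3, l=2):
    """|L^2 Y - l(l+1) Y| at one sample point; should be near zero."""
    return abs(apply_L2(y_2_1, theta, phi) - l * (l + 1) * y_2_1(theta, phi))
```

A single sample point is of course not a proof; it only confirms that the operator of Eq. (16.6) and the eigenvalue $l(l+1) = 6$ are mutually consistent for this function, up to finite-difference error.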


We now strive for a deeper understanding of the role of angular momentum. The solutions
to the time-independent Schrödinger equation are the eigenfunctions of its total
energy operator, the Hamiltonian $H$. We have just observed that for central force problems
the angular solutions are eigenfunctions of the angular momentum operator $L^2$. In
order for these two statements to be mutually consistent, it is necessary that $H$ and $L^2$
commute. For the systems under consideration here, this is clearly true, since we have
assumed that $H$ is of the form $T + V(r)$, so

$$H = T_{\rm radial}(r) + \frac{1}{2\mu r^2}\,L^2(\theta, \varphi) + V(r).$$

Since the only angle-dependent quantity in $H$ is the operator $L^2$, and since $L^2$ obviously
commutes with itself and is independent of $r$, we have

$$[H, L^2] = 0.$$

The fact that $H$ and $L^2$ have simultaneous eigenfunctions in central force problems
means that the stationary states of such systems can be characterized by definite values
of both the energy and the angular momentum quantum number $l$. States of different
$l$ were ultimately identified with series of lines in the emission and absorption spectra
of the hydrogen atom that had previously been labeled “sharp,” “diffuse,” “principal,”
and “fundamental.” This identification caused physicists to use the initial letters of these
names as synonyms for $l$ values; hence it has become essential to know that the code
letters for $l = 0, 1, 2$, and 3 are respectively s, p, d, and f. For $l > 3$, the code letters run
alphabetically: g, h, ....

Turning now to the components of $\mathbf{L}$, we have (cf. Exercise 3.10.31)

$$[L_j, L_k] = i\,\varepsilon_{jkn}L_n \quad\text{and}\quad [L^2, L_j] = 0, \tag{16.11}$$

where $j, k, n$ are different members of the set (1, 2, 3) and $\varepsilon_{jkn}$ is a Levi-Civita symbol.
Although the $L_j$ do not commute with each other, all commute with $L^2$ and hence also
with $H$, so $H$, $L^2$, and any one component of $\mathbf{L}$ mutually commute. We conclude that there
exists a set of simultaneous eigenfunctions of $H$, $L^2$, and any one component of $\mathbf{L}$. For
this purpose we usually pick $L_z$, motivated by the fact that, in spherical polar coordinates,
it is, as found in Exercise 3.10.29,

$$L_z = -i\,\frac{\partial}{\partial\varphi}. \tag{16.12}$$

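For reference, we copy here the far more complicated results for $L_x$ and $L_y$, obtained from
Exercise 3.10.30:

$$\begin{aligned}
L_x &= i\sin\varphi\,\frac{\partial}{\partial\theta} + i\cot\theta\cos\varphi\,\frac{\partial}{\partial\varphi},\\
L_y &= -i\cos\varphi\,\frac{\partial}{\partial\theta} + i\cot\theta\sin\varphi\,\frac{\partial}{\partial\varphi}.
\end{aligned} \tag{16.13}$$

The spherical harmonics are, in fact, eigenfunctions of $L_z$. Since

$$L_z\,e^{im\varphi} = -i\,\frac{\partial}{\partial\varphi}\,e^{im\varphi} = m\,e^{im\varphi}, \tag{16.14}$$

we see that $Y_l^m$ is an eigenfunction of $L_z$ with eigenvalue $m$. This is one of the reasons
why the complex exponentials, rather than the trigonometric functions, were chosen in the
definitions of the spherical harmonics. It is obvious that $\cos m\varphi$ is not an eigenfunction of
$L_z$: $-i(\partial/\partial\varphi)\cos m\varphi = im\sin m\varphi$. Note, however, that $\exp(\pm im\varphi)$, $\cos m\varphi$, and $\sin m\varphi$
are all eigenfunctions of the operator $L_z^2 = -\partial^2/\partial\varphi^2$ with eigenvalue $m^2$.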





Ladder Operators 


The commutators of the angular momentum components permit the development of some
useful algebraic relationships. While these relationships can be found from the specific
forms of the operators (cf. Exercise 3.10.30), more general and valuable results are
obtained by derivations based only on the commutators given in Eq. (16.11). We define
the operators

$$L_+ = L_x + iL_y, \qquad L_- = L_x - iL_y, \tag{16.15}$$

and consider the commutators

$$[L_z, L_+] = [L_z, L_x] + i[L_z, L_y] = iL_y + i(-iL_x) = L_+, \tag{16.16}$$

$$[L_z, L_-] = [L_z, L_x] - i[L_z, L_y] = iL_y - i(-iL_x) = -L_-. \tag{16.17}$$

We start by applying Eq. (16.16) to a function $\psi_l^m$, which is assumed to be a normalized
simultaneous eigenfunction of $L^2$, with eigenvalue $\lambda_l$, and of $L_z$, with eigenvalue $m$; the
form of $\psi_l^m$ (and even the space within which it resides) need not be specified to carry
out the present discussion. Moreover, at this point we introduce no information about the
possible values of $\lambda_l$ and $m$. However, to visualize what we are doing, the reader can keep
in mind that one possible interpretation of $\psi_l^m$ is the spherical harmonic $Y_l^m$. We have

$$[L_z, L_+]\,\psi_l^m = L_zL_+\psi_l^m - L_+L_z\psi_l^m = L_+\psi_l^m.$$

Since $L_z\psi_l^m = m\,\psi_l^m$, we can rewrite the central and right-hand members of the above
equation as

$$L_z(L_+\psi_l^m) - m(L_+\psi_l^m) = (L_+\psi_l^m),$$

which rearranges to

$$L_z(L_+\psi_l^m) = (m+1)(L_+\psi_l^m). \tag{16.18}$$


This tells us that if $L_+\psi_l^m$ is nonzero, it is an eigenfunction of $L_z$ with eigenvalue $m+1$;
for that reason $L_+$ can be called a raising operator. By itself, this analysis tells us nothing
about the value(s) of $m$, but only that $L_+$ increases $m$ in unit steps. A similar development
shows that $L_-$ is a lowering operator, corresponding to the equation

$$L_z(L_-\psi_l^m) = (m-1)(L_-\psi_l^m). \tag{16.19}$$

Raising and lowering operators are collectively referred to as ladder operators.

Next, we recall that $[L^2, L_j] = 0$ for all components $L_j$. This means also that
$[L^2, L_\pm] = 0$, so

$$L^2(L_\pm\psi_l^m) = L_\pm L^2\psi_l^m = \lambda_l(L_\pm\psi_l^m),$$

showing that $(L_\pm\psi_l^m)$ is still an eigenfunction of $L^2$ with the same eigenvalue, $\lambda_l$, as $\psi_l^m$.
Note that we did not need to know the value of $\lambda_l$ to draw this conclusion. Summarizing,
the operators $L_\pm$ convert $\psi_l^m$ into quantities proportional to $\psi_l^{m\pm1}$, with the conversion
failing only if $L_\pm\psi_l^m = 0$.

While Eqs. (16.18) and (16.19) tell us that $L_\pm$ are ladder operators, they do not tell
us whether the quantities $L_\pm\psi_l^m$ are normalized. To address this problem, we write the
normalization expression for $L_+\psi_l^m$ in the form

$$\langle L_+\psi_l^m | L_+\psi_l^m \rangle = \langle \psi_l^m | L_-L_+ | \psi_l^m \rangle,$$

where we have used the fact that, because $L_x$ and $L_y$ are Hermitian, $(L_+)^\dagger = L_-$.

To obtain more information about $L_-L_+$ we rearrange $L^2$ as follows:

$$L^2 = \tfrac{1}{2}(L_+L_- + L_-L_+) + L_z^2 = L_-L_+ + \tfrac{1}{2}[L_+, L_-] + L_z^2, \tag{16.20}$$

a result that can be easily verified by expanding $L_+$, $L_-$, and the commutator. Then we
introduce

$$[L_+, L_-] = [L_x + iL_y,\, L_x - iL_y] = -i[L_x, L_y] + i[L_y, L_x] = 2L_z, \tag{16.21}$$

and solve Eq. (16.20) for $L_-L_+$, obtaining

$$L_-L_+ = L^2 - L_z^2 - L_z. \tag{16.22}$$

Using the fact that $\psi_l^m$ is a normalized eigenfunction of both $L^2$ and $L_z$, we can now
perform the evaluation

$$\langle L_+\psi_l^m | L_+\psi_l^m \rangle = \langle \psi_l^m | L_-L_+ | \psi_l^m \rangle
= \langle \psi_l^m | L^2 - L_z^2 - L_z | \psi_l^m \rangle = \lambda_l - m^2 - m. \tag{16.23}$$

A parallel analysis leads to the companion result

$$\langle L_-\psi_l^m | L_-\psi_l^m \rangle = \langle \psi_l^m | L_+L_- | \psi_l^m \rangle = \lambda_l - m^2 + m. \tag{16.24}$$


If we use the expressions in Eqs. (16.23) and (16.24) to account for the scale factors
generated by the ladder operators, we can summarize their action as

$$\begin{aligned}
L_+\psi_l^m &= \sqrt{\lambda_l - m(m+1)}\;\psi_l^{m+1},\\
L_-\psi_l^m &= \sqrt{\lambda_l - m(m-1)}\;\psi_l^{m-1},
\end{aligned} \tag{16.25}$$

where, the reader may recall, $\lambda_l$ is the eigenvalue of $L^2$ corresponding to quantum number
$l$; the current analysis has not yet determined its value. The expressions in Eq. (16.25) have
also incorporated the assumption that the signs of the $\psi_l^m$ are related as shown. That is a
matter of definition, and when the $\psi_l^m$ are taken to be the spherical harmonics $Y_l^m$, the
Condon-Shortley phase assignment was deliberately designed to make Eq. (16.25) consistent
with the signs given the $Y_l^m$ in Table 15.4.

Next, we return to Eq. (16.23) and note that since it describes a normalization integral,
it is inherently nonnegative, and can be zero only if $L_+\psi_l^m$ is identically zero. The
right-hand side of Eq. (16.23), however, will become negative if $m$ is permitted to get too
large, so for any fixed $l$ (and therefore a fixed $\lambda_l$), there must be some largest $m$, which we
call $m_{\max}$, for which there exists a $\psi_l^{m_{\max}}$. But if we use Eq. (16.23) to evaluate $L_+\psi_l^{m_{\max}}$,
we will, unless $\lambda_l - m_{\max}(m_{\max}+1) = 0$, generate a function with $m = m_{\max} + 1$, thereby
creating an inconsistency. Giving $m_{\max}$ the name $l$ (permitted because within the current
derivation we have not yet assigned a meaning to $l$), what we have found so far is that
$\lambda_l = l(l+1)$ and that the maximum value of $m$ is $m = l$. Remember that we still know
nothing about possible values for $l$.

Turning now to Eq. (16.24), and inserting $l(l+1)$ for $\lambda_l$, we note that if $m$ is permitted
to become too negative we will again have an inconsistent situation, and that it is necessary
that for some $m_{\min}$ the right-hand side of Eq. (16.24) must vanish. Thus, we require
$l(l+1) - m_{\min}(m_{\min}-1) = 0$, an equation that is satisfied for $m_{\min} = l+1$ (which is clearly
irrelevant), and for $m_{\min} = -l$ (the solution we want).

Finally, we observe that, starting from some $\psi_l^m$ with $m = l$, we have the severe limitation
that application of the lowering operator $L_-$ will decrease the $m$ value in unit steps,
but must ultimately reach $m = -l$ to avoid the generation of an inconsistency. This state
of affairs is possible if $l$ is a nonnegative integer, in which case there are $2l+1$ possible
$m$ values, ranging in unit steps from $l$ to $-l$. However, it is also possible to assign $l$ a
half-integer value, as $m = l$ and $m = -l$ are then still connected by a series of unit steps. In
this case, also, there will be $2l+1$ different $m$ values. This quantity, $2l+1$, is sometimes
called the multiplicity of the angular momentum states.
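The algebra above can be made concrete by building finite matrices from the ladder coefficients of Eq. (16.25) with $\lambda_l = l(l+1)$ and checking that they reproduce the commutators from which the coefficients were derived. This is only a sketch under stated assumptions: the basis ordering $m = j, j-1, \ldots, -j$, the helper names, and the use of floating-point square roots are all illustrative choices, not part of the text.

```python
from fractions import Fraction
import math


def ladder_matrices(j):
    """Matrices of J_z, J_+, J_- in the basis m = j, j-1, ..., -j (hbar = 1)."""
    j = Fraction(j)
    dim = int(2 * j + 1)
    ms = [j - k for k in range(dim)]
    lam = j * (j + 1)
    jz = [[float(ms[r]) if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    jp = [[0.0] * dim for _ in range(dim)]
    jm = [[0.0] * dim for _ in range(dim)]
    for c, m in enumerate(ms):
        if c > 0:          # J_+ maps m to m + 1, which sits one row higher
            jp[c - 1][c] = math.sqrt(lam - m * (m + 1))
        if c < dim - 1:    # J_- maps m to m - 1, which sits one row lower
            jm[c + 1][c] = math.sqrt(lam - m * (m - 1))
    return jz, jp, jm


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def check_algebra(j):
    """True if the matrices satisfy Eqs. (16.16), (16.21), and (16.22)."""
    jz, jp, jm = ladder_matrices(j)
    dim = len(jz)
    lam = float(Fraction(j) * (Fraction(j) + 1))
    lam_identity = [[lam if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    two_jz = [[2 * x for x in row] for row in jz]
    # J^2 rebuilt from Eq. (16.22): J^2 = J_- J_+ + J_z^2 + J_z
    j_squared = [[a + b + c for a, b, c in zip(r1, r2, r3)]
                 for r1, r2, r3 in zip(matmul(jm, jp), matmul(jz, jz), jz)]
    return (close(commutator(jz, jp), jp)
            and close(commutator(jp, jm), two_jz)
            and close(j_squared, lam_identity))
```

Calling `check_algebra` with integer or half-integer arguments (for instance `1` or `Fraction(3, 2)`) exercises both families of multiplets; in each case the matrices have dimension $2j+1$, the multiplicity discussed above.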

The fact that it is mathematically possible to have a series (multiplet) of states corresponding
to either integral or half-integral $l$ and satisfying the angular momentum commutation
rules does not prove that such states are realizable in a particular algebraic system
(such as that describing ordinary three-dimensional [3-D] space), or that such states are
indeed relevant for physics. However, by solving Laplace’s equation, we have already
found that the angular momentum states of integral $l$ can be described in ordinary space
and that they can be identified as states of ordinary (so-called orbital) angular momentum.
It is not possible to describe states of half-integral $l$ as ordinary functions in 3-D space, so
orbital angular momentum will only involve integral $l$.


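Example 16.1.1 SPHERICAL HARMONICS LADDER

From Exercise 3.10.30, or alternatively by combining the formulas for $L_x$ and $L_y$ from
Eq. (16.13), the orbital angular momentum ladder operator $L_+$ is found to be

$$L_+ = e^{i\varphi}\left(\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right).$$

Starting from $Y_1^0(\theta, \varphi) = \sqrt{3/4\pi}\,\cos\theta$, we can apply Eq. (16.25):

$$L_+Y_1^0(\theta, \varphi) = e^{i\varphi}\sqrt{\frac{3}{4\pi}}\,\frac{d\cos\theta}{d\theta}
= -\sqrt{\frac{3}{4\pi}}\,e^{i\varphi}\sin\theta = \sqrt{2}\,Y_1^1(\theta, \varphi), \tag{16.26}$$

which when solved for $Y_1^1$ gives (with proper scale and sign) the value tabulated in
Table 15.4. The reader can verify that the application of $L_+$ to $Y_1^1$ gives zero. ■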
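The two statements of this example can be spot-checked numerically by applying the differential form of $L_+$ with central differences. The sketch below assumes the Condon-Shortley forms $Y_1^0 = \sqrt{3/4\pi}\cos\theta$ and $Y_1^1 = -\sqrt{3/8\pi}\sin\theta\,e^{i\varphi}$; the step size and sample point are arbitrary, and the function names are illustrative.

```python
import cmath
import math


def y_1_0(theta, phi):
    """Y_1^0 = sqrt(3/(4 pi)) cos(theta)."""
    return math.sqrt(3 / (4 * math.pi)) * math.cos(theta)


def y_1_1(theta, phi):
    """Y_1^1 = -sqrt(3/(8 pi)) sin(theta) exp(i phi), Condon-Shortley phase assumed."""
    return -math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(1j * phi)


def apply_L_plus(f, theta, phi, h=1e-6):
    """Apply L_+ = exp(i phi) (d/dtheta + i cot(theta) d/dphi) by central differences."""
    d_theta = (f(theta + h, phi) - f(theta - h, phi)) / (2 * h)
    d_phi = (f(theta, phi + h) - f(theta, phi - h)) / (2 * h)
    return cmath.exp(1j * phi) * (d_theta + 1j * d_phi / math.tan(theta))


def raising_residual(theta=0.9, phi=0.4):
    """|L_+ Y_1^0 - sqrt(2) Y_1^1| at one sample point."""
    return abs(apply_L_plus(y_1_0, theta, phi) - math.sqrt(2) * y_1_1(theta, phi))


def top_state_residual(theta=0.9, phi=0.4):
    """|L_+ Y_1^1| at one sample point; the ladder should terminate here."""
    return abs(apply_L_plus(y_1_1, theta, phi))
```

Both residuals should vanish up to finite-difference error, the first confirming the factor $\sqrt{2}$ and the sign, the second confirming that the ladder stops at $m = l$.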





Spinors 


It turns out that half-integral angular momentum states are needed to describe the intrinsic
angular momentum of the electron and many other particles. Since these particles also have
magnetic moments, an intuitive interpretation is that their charge distributions are spinning
about some axis; hence the term spin. It is now understood that the spin phenomena cannot
be explained consistently by describing these particles as ordinary charge distributions
undergoing rotational motion, but are better treated by assigning these particles to states in
an abstract space that, for the electron, has the $l$ value $\frac{1}{2}$ (but in this context, we normally
use $s$ and write $s = \frac{1}{2}$), which means that the possible $m$ values (often written $m_s$) are
$m_s = +\frac{1}{2}$ and $m_s = -\frac{1}{2}$. It is not productive to try to think of this situation in terms of
ordinary functions, but to accept an abstract formulation in which spin states are represented
by symbols; popular choices are $\alpha$ or $|\!\uparrow\rangle$ for the state $m = +\frac{1}{2}$ and $\beta$ or $|\!\downarrow\rangle$ for
that with $m = -\frac{1}{2}$. These spin states can also be represented by two-component column
vectors, with the angular momentum operators given in terms of the Pauli matrices as $\frac{1}{2}\sigma_i$.

The quantities forming a basis for the multiplets for half-integer angular momentum
are called spinors. In addition to their manipulation using ladder operators, they have
rotational properties that are discussed in more detail in Chapter 17.


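Example 16.1.2 SPINOR LADDER

Calling the angular momentum operator $\mathbf{S}$, we write $S_x$, $S_y$, $S_z$ as the $2 \times 2$ matrices $\frac{1}{2}\sigma_i$,
where $\sigma_i$ are defined in Eq. (2.28):

$$S_x = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
S_y = \frac{1}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
S_z = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{16.27}$$

By carrying out matrix operations we can verify that these matrices satisfy the angular
momentum commutation rules. For example,

$$[S_x, S_y] = \frac{1}{4}\left[\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}
- \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right]
= \frac{i}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = iS_z.$$

We also find that

$$S_x^2 = S_y^2 = S_z^2 = \frac{1}{4}\,\mathbf{1}_2,$$

and we therefore have

$$S^2 = S_x^2 + S_y^2 + S_z^2 = \frac{3}{4}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{3}{4}\,\mathbf{1}_2.$$

Note that 3/4 is $S(S+1)$ for $S = 1/2$.

The interpretation of these matrix relationships is that we have an abstract space spanned
by the two functions

$$\psi_{1/2}^{1/2} = \alpha = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
\psi_{1/2}^{-1/2} = \beta = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

and that the operators $S^2$ and $S_z$ operate on these functions as follows:

$$\begin{aligned}
S^2\alpha &= S^2\psi_{1/2}^{1/2} = \frac{3}{4}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \frac{3}{4}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{3}{4}\,\psi_{1/2}^{1/2} = \frac{3}{4}\,\alpha,\\
S_z\alpha &= S_z\psi_{1/2}^{1/2} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{2}\,\psi_{1/2}^{1/2} = \frac{1}{2}\,\alpha,\\
S^2\beta &= S^2\psi_{1/2}^{-1/2} = \frac{3}{4}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \frac{3}{4}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{3}{4}\,\psi_{1/2}^{-1/2} = \frac{3}{4}\,\beta,\\
S_z\beta &= S_z\psi_{1/2}^{-1/2} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} 0 \\ -1 \end{pmatrix} = -\frac{1}{2}\,\psi_{1/2}^{-1/2} = -\frac{1}{2}\,\beta.
\end{aligned}$$

The above formulas show that $\psi_{1/2}^{\pm1/2}$ (also denoted $\alpha$ and $\beta$) are simultaneous
eigenfunctions of $S^2$ and $S_z$. To illustrate that they are not also eigenfunctions of $S_x$ or $S_y$, we
compute

$$S_x\alpha = S_x\psi_{1/2}^{1/2} = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{2}\,\psi_{1/2}^{-1/2} = \frac{1}{2}\,\beta.$$

To make ladders, we now form

$$S_+ = S_x + iS_y = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
S_- = S_x - iS_y = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

Applying these operators to $\alpha = \psi_{1/2}^{1/2}$,

$$S_+\alpha = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 0, \qquad
S_-\alpha = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \beta.$$

These results are in agreement with Eq. (16.25), for which, with the current parameters
$\lambda = 3/4$, $m = 1/2$, its coefficients are

$$\sqrt{\lambda - m(m+1)} = 0, \qquad \sqrt{\lambda - m(m-1)} = 1.$$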
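The matrix manipulations of this example are easy to reproduce by machine. The following sketch uses plain nested lists and Python complex numbers, with the spin matrices taken as one half of the Pauli matrices ($\hbar = 1$); the helper names are illustrative.

```python
# Spin-1/2 matrices S_i = sigma_i / 2 (hbar = 1).
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]

ALPHA = [1, 0]  # m = +1/2 column vector
BETA = [0, 1]   # m = -1/2 column vector


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))


def apply(a, v):
    """Matrix acting on a column vector."""
    return [sum(a[r][k] * v[k] for k in range(len(v))) for r in range(len(a))]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# S^2 = S_x^2 + S_y^2 + S_z^2 and the ladder operators S_+ and S_-.
S_SQUARED = add(add(matmul(SX, SX), matmul(SY, SY)), matmul(SZ, SZ))
S_PLUS = add(SX, scale(1j, SY))
S_MINUS = add(SX, scale(-1j, SY))
```

With these definitions, `commutator(SX, SY)` reproduces $iS_z$, `S_SQUARED` is $\frac{3}{4}$ times the unit matrix, and `apply(S_MINUS, ALPHA)` returns the components of $\beta$.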


Summary, Angular Momentum Formulas 


The analysis of the preceding subsection applies to any system of operators satisfying the
angular momentum commutation rules. Possible areas of application include orbital angular
momentum (for which the eigenfunctions are the spherical harmonics), the intrinsic
(spin) angular momentum we now know is associated with most fundamental particles,
and even the overall angular momenta that result either from considering both the orbital
and spin angular momenta of the same particle, or the total angular momentum of a collection
of particles (as in a many-electron atom or even a nucleus).

It is useful to summarize the key results; we do so giving the operators the name $\mathbf{J}$,
to emphasize the fact that the results are not restricted to orbital angular momentum (for
which the symbol $\mathbf{L}$ is nearly universally used), or to spin angular momentum (traditionally
denoted $\mathbf{S}$). Thus:

1. We assume that there exists a Hermitian operator $\mathbf{J}$ with components $J_x$, $J_y$, $J_z$ such
that $J_x^2 + J_y^2 + J_z^2 = J^2$ and that these quantities satisfy the commutation relations

$$[J_k, J_l] = i\,\varepsilon_{kln}J_n, \tag{16.28}$$

where $k, l, n$ are $x, y, z$ in any order and $\varepsilon_{kln}$ is a Levi-Civita symbol. Other than the
requirement of Eq. (16.28), $\mathbf{J}$ is arbitrary.

2. Because the operators $J^2$ and $J_z$ commute, there can exist functions (in some abstract
space), generically denoted $\psi_J^M$, with $\psi_J^M$ simultaneously a normalized eigenfunction
of $J_z$ with eigenvalue $M$ and an eigenfunction of $J^2$ with eigenvalue $J(J+1)$:

$$J_z\psi_J^M = M\,\psi_J^M, \qquad J^2\psi_J^M = J(J+1)\,\psi_J^M, \qquad \langle \psi_J^M | \psi_J^M \rangle = 1. \tag{16.29}$$

3. Operators satisfying the above conditions can be called angular momentum operators;
those which were used as examples of angular momentum in ordinary space
(orbital angular momentum) are clearly relevant for physics; similar operators in
more abstract spaces are relevant only to the extent that they can be identified with
physical phenomena.

We have already seen that these assumptions are sufficient to enable the introduction of
ladder operators, and to reach the following conclusions:

1. The possible values of $J$ are integral and half-integral; in ordinary 3-D space only
functions of integral $J$ can be realized.

2. For a given $J$, the possible values of $M$ range in unit steps from $M = J$ to $M = -J$;
this produces $2J+1$ different $M$ values.

3. Given any one $\psi_J^M$, we can generate others by use of the operators

$$J_+ = J_x + iJ_y, \qquad J_- = J_x - iJ_y.$$

The result of applying these operators to $\psi_J^M$ is, see Eq. (16.25),

$$J_+\psi_J^M = \sqrt{(J-M)(J+M+1)}\;\psi_J^{M+1}, \tag{16.30}$$

$$J_-\psi_J^M = \sqrt{(J+M)(J-M+1)}\;\psi_J^{M-1}. \tag{16.31}$$

These formulas give zero results when $J_+$ is applied to $\psi_J^J$ and when $J_-$ is applied
to $\psi_J^{-J}$.


Exercises

16.1.1





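The quantum mechanical angular momentum operators $L_x \pm iL_y$ in 3-D physical space
are given by

$$L_x + iL_y = e^{i\varphi}\left(\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right),$$

$$L_x - iL_y = -e^{-i\varphi}\left(\frac{\partial}{\partial\theta} - i\cot\theta\,\frac{\partial}{\partial\varphi}\right).$$

Show that

(a) $(L_x + iL_y)\,Y_L^M(\theta, \varphi) = \sqrt{(L-M)(L+M+1)}\;Y_L^{M+1}(\theta, \varphi)$,

(b) $(L_x - iL_y)\,Y_L^M(\theta, \varphi) = \sqrt{(L+M)(L-M+1)}\;Y_L^{M-1}(\theta, \varphi)$.

16.1.2 With $L_\pm$ given by

$$L_\pm = L_x \pm iL_y = \pm e^{\pm i\varphi}\left[\frac{\partial}{\partial\theta} \pm i\cot\theta\,\frac{\partial}{\partial\varphi}\right],$$

show that

(a) $$Y_l^m = \sqrt{\frac{(l+m)!}{(2l)!\,(l-m)!}}\;(L_-)^{l-m}\,Y_l^l,$$

(b) $$Y_l^m = \sqrt{\frac{(l-m)!}{(2l)!\,(l+m)!}}\;(L_+)^{l+m}\,Y_l^{-l}.$$

16.1.3 Using the known forms of $L_+$ and $L_-$ (Exercise 16.1.2), show that

$$\int \left[Y_L^M\right]^* L_-\left(L_+Y_L^M\right) d\Omega = \int \left(L_+Y_L^M\right)^* \left(L_+Y_L^M\right) d\Omega.$$

Here $d\Omega$ is the element of solid angle ($\sin\theta\,d\theta\,d\varphi$), and the integration is over the entire
angular space.

16.1.4 (a) Show that $J^2 = \frac{1}{2}\left(J_+J_- + J_-J_+\right) + J_z^2$.

(b) Use the result from part (a) and the explicit formulas for $L_+$ and $L_-$ from
Exercise 16.1.2 to verify that all the spherical harmonics with $l = 2$ are eigenfunctions
of $L^2$ with eigenvalue $l(l+1) = 6$.

16.1.5 Derive the following relations without assuming anything about $\psi_L^M$ other than that
they are angular momentum eigenfunctions:

(a) $$\psi_L^M(\theta, \varphi) = \sqrt{\frac{(L+M)!}{(2L)!\,(L-M)!}}\;(L_-)^{L-M}\,\psi_L^L(\theta, \varphi),$$

(b) $$\psi_L^M(\theta, \varphi) = \sqrt{\frac{(L-M)!}{(2L)!\,(L+M)!}}\;(L_+)^{L+M}\,\psi_L^{-L}(\theta, \varphi).$$

16.1.6 Derive the operator equations

$$(L_+)^n\,Y_L^M(\theta, \varphi) = (-1)^n e^{in\varphi} \sin^{n+M}\theta\;
\frac{d^n\left[\sin^{-M}\theta\;Y_L^M(\theta, \varphi)\right]}{(d\cos\theta)^n},$$

$$(L_-)^n\,Y_L^M(\theta, \varphi) = e^{-in\varphi} \sin^{n-M}\theta\;
\frac{d^n\left[\sin^{M}\theta\;Y_L^M(\theta, \varphi)\right]}{(d\cos\theta)^n}.$$

Hint. Try mathematical induction (Section 1.4).

16.1.7 Show, using $(L_-)^n$, that

$$Y_L^{-M}(\theta, \varphi) = (-1)^M \left[Y_L^M(\theta, \varphi)\right]^*.$$

16.1.8 Verify by explicit calculation that

(a) $L_+Y_1^0(\theta, \varphi) = -\sqrt{\dfrac{3}{4\pi}}\,\sin\theta\,e^{i\varphi} = \sqrt{2}\,Y_1^1(\theta, \varphi)$,

(b) $L_-Y_1^0(\theta, \varphi) = +\sqrt{\dfrac{3}{4\pi}}\,\sin\theta\,e^{-i\varphi} = \sqrt{2}\,Y_1^{-1}(\theta, \varphi)$.

The signs have the indicated values because the spherical harmonics were defined to
be consistent with the results obtained using the ladder operators $L_+$ and $L_-$
(Condon-Shortley phase).


16.2 ANGULAR MOMENTUM COUPLING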


An important application of ladder operators is to systems in which a resultant angular
momentum is the sum of two individual angular momenta. Because the angular momenta
have directional properties, we anticipate a result that has some properties in common
with vector addition, but because these are quantum mechanical quantities involving
noncommuting operators, we need to study the problem in more detail.

If $\mathbf{j}_1$ and $\mathbf{j}_2$ are two individual angular momentum operators that act on different
coordinate sets (as, e.g., the coordinates of two different particles), then they are unrelated
and all components of each must commute with every component of the other. This will
enable us to carry out a detailed analysis of operators of the combined system, for which
the total angular momentum operator is $\mathbf{J} = \mathbf{j}_1 + \mathbf{j}_2$, with components $J_x = j_{1x} + j_{2x}$,
$J_y = j_{1y} + j_{2y}$, $J_z = j_{1z} + j_{2z}$, with the overall operator $J^2 = J_x^2 + J_y^2 + J_z^2$.

To discuss the problem, we will need the commutators

$$[j_{1k}, j_{1l}] = i\,\varepsilon_{kln}\,j_{1n}, \qquad [j_{2k}, j_{2l}] = i\,\varepsilon_{kln}\,j_{2n}, \qquad [j_{1k}, j_{2l}] = 0. \tag{16.32}$$

For the first two commutators, $k, l, n$ are $x, y, z$ in any order; the third commutator vanishes
for all $k, l$ including $k = l$. From the commutators in Eq. (16.32), it is easily established
that the overall angular momentum components obey the commutation rules

$$[J_x, J_y] = iJ_z, \qquad [J_y, J_z] = iJ_x, \qquad [J_z, J_x] = iJ_y, \tag{16.33}$$

so these overall components satisfy the generic angular momentum commutation relations,
meaning also that

$$[J^2, J_i] = 0. \tag{16.34}$$

In addition,

$$[J^2, j_1^2] = [J^2, j_2^2] = 0. \tag{16.35}$$

However, it is not true that the components of $\mathbf{j}_1$ or $\mathbf{j}_2$, namely $j_{1i}$ or $j_{2i}$, commute with
$J^2$, even though $J^2$ and the sum $j_{1i} + j_{2i}$ do commute.


Example 16.2.1 COMMUTATION RULES FOR J COMPONENTS

To find the commutator $[J^2, j_{1z}]$, write

$$\begin{aligned}
J^2 &= (\mathbf{j}_1 + \mathbf{j}_2)^2 = j_1^2 + j_2^2 + 2\,\mathbf{j}_1 \cdot \mathbf{j}_2\\
&= j_1^2 + j_2^2 + 2\left(j_{1x}j_{2x} + j_{1y}j_{2y} + j_{1z}j_{2z}\right),
\end{aligned}$$

so we have

$$\begin{aligned}
[J^2, j_{1z}] &= [j_1^2, j_{1z}] + [j_2^2, j_{1z}]
+ 2\left([j_{1x}j_{2x}, j_{1z}] + [j_{1y}j_{2y}, j_{1z}] + [j_{1z}j_{2z}, j_{1z}]\right)\\
&= 2\left(j_{2x}[j_{1x}, j_{1z}] + j_{2y}[j_{1y}, j_{1z}]\right) = 2i\left(j_{1x}j_{2y} - j_{1y}j_{2x}\right),
\end{aligned} \tag{16.36}$$

where we have dropped terms in which the commutators involve different particles and
those, e.g., $[j_1^2, j_{1z}]$, which vanish because the individual-particle operators are angular
momenta.

Equation (16.36) clearly shows that $[J^2, j_{1z}]$ is nonzero. However, its contributions are
equal and opposite to those of $[J^2, j_{2z}]$, explaining why $[J^2, J_z]$ does vanish.

Consider next $[J^2, j_1^2]$. Again expanding $J^2$, we get

$$[J^2, j_1^2] = [j_1^2, j_1^2] + [j_2^2, j_1^2]
+ 2\left([j_{1x}j_{2x}, j_1^2] + [j_{1y}j_{2y}, j_1^2] + [j_{1z}j_{2z}, j_1^2]\right).$$

Every term of this equation vanishes, so $J^2$ and $j_1^2$ commute. ■
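The commutators of this example can be checked on the smallest nontrivial system, two spin-$\frac{1}{2}$ particles, by representing each single-particle operator as a Kronecker product with a unit matrix. The sketch below assumes $j_1 = j_2 = \frac{1}{2}$ purely for illustration (so $j_1^2$ is a multiple of the unit matrix and its commutator vanishes trivially); the helper names are not from the text.

```python
# Spin-1/2 matrices (hbar = 1), used here for both particles as an illustrative choice.
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]
I2 = [[1, 0], [0, 1]]


def kron(a, b):
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))


def is_zero(a, tol=1e-12):
    return all(abs(x) < tol for row in a for x in row)


def square_sum(components):
    """Sum of squares of the x, y, z component matrices."""
    total = [[0] * len(components[0]) for _ in components[0]]
    for c in components:
        total = add(total, matmul(c, c))
    return total


# Particle 1 acts on the first factor, particle 2 on the second.
J1 = [kron(s, I2) for s in (SX, SY, SZ)]
J2 = [kron(I2, s) for s in (SX, SY, SZ)]
JTOT = [add(a, b) for a, b in zip(J1, J2)]
J_SQUARED = square_sum(JTOT)

# Left- and right-hand sides of Eq. (16.36).
LHS = commutator(J_SQUARED, J1[2])
RHS = scale(2j, add(matmul(J1[0], J2[1]), scale(-1, matmul(J1[1], J2[0]))))
```

Comparing `LHS` with `RHS` tests Eq. (16.36) directly, while `commutator(J_SQUARED, JTOT[2])` illustrates the cancellation between the two particles' contributions.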


We have noted that $j_1^2$, $j_2^2$, and $J_z$ all commute with each other and with $J^2$, $j_{1z}$, and
$j_{2z}$, but that the last three of these operators do not all commute with each other. There
are therefore different ways of selecting maximal sets of mutually commuting operators
for which we can construct simultaneous eigenfunctions. One possibility is to select $j_1^2$,
$j_2^2$, $j_{1z}$, $j_{2z}$, and $J_z$, which has the advantage that the simultaneous eigenfunctions are just
products of the eigenstates for individual $\mathbf{j}_i$, but has the disadvantage that we will not
have states of definite total angular momentum $J^2$. This is a big disadvantage, because in
reality different angular momenta in the same system actually interact to some extent. If
we add to the Hamiltonian of our problem a small term (a perturbation) that causes the
individual angular momenta not quite to be independent, our system will still strictly have
conservation of $J^2$ (i.e., $H$ and $J^2$ will still commute), but the perturbation added to the
Hamiltonian will not commute with $\mathbf{j}_1$ and $\mathbf{j}_2$.

Alternatively, and for most purposes better, we could choose the mutually commuting
operator set $J^2$, $j_1^2$, $j_2^2$, and $J_z$, which would describe states of definite total angular
momentum, but these states would be mixtures of the individual angular-momentum states
and would not have definite values of $j_{1z}$ or $j_{2z}$. It is the purpose of this section to relate
these two descriptions by finding the equations that connect (i.e., couple) the individual
angular momenta to form states of definite $J^2$.

To simplify future discussion, we can refer to the product basis of the preceding paragraph
as the $m_1, m_2$ basis, and call the alternative basis of definite $J$ the $J, M$ basis. The
$m_1, m_2$ basis members also have definite values of $M$, but not $J$; most members of the
$J, M$ basis will not have definite values of either $m_1$ or $m_2$. If we stick with problems in
which $j_1$ and $j_2$ are fixed, all members of both bases will have the same definite values of
these quantum numbers.

Before getting into the details, let’s make two observations. First, since we have raising 
and lowering operators that we can apply to the J, M basis, the functions in this basis must 
include all the M values for any J that is present at all. Second, since both bases have 
definite values of M, the transition from one basis to the other cannot mix functions of 
different M. 


Vector Model 


We begin with some qualitative observations. Since $J_z = j_{1z} + j_{2z}$ (with eigenvalues we
call $M$) is part of both our commuting operator sets, we can conclude, from looking at
the $m_1, m_2$ basis, that the maximum eigenvalue $M_{\max}$ of $J_z$ will occur when $m_1 = j_1$ and
$m_2 = j_2$, so $M_{\max} = j_1 + j_2$. Moving now to the $J, M$ basis, which of course spans the
same function space, we see that because $M_{\max}$ is the maximum $M$ value, it must be a
member of a multiplet with $J = M_{\max}$, and this must be the largest possible $J$. Thus,
$J_{\max} = j_1 + j_2$.

To establish the minimum value possible for the quantum number $J$ is a little trickier,
and we will come back to that shortly. The result, which is simple, is that $J_{\min} = |j_1 - j_2|$.
These maximum and minimum values of $J$ correspond to the notion that the classical vector
sum $\mathbf{j}_1 + \mathbf{j}_2$ has a maximum length equal to the sum of the lengths of these vectors and a
minimum length equal to the absolute value of their difference; the quantum analog of this
notion is not quantitatively exact because the magnitude of each $\mathbf{j}$ is actually $\sqrt{j(j+1)}$.

Further qualitative observations follow if we tabulate the various possible m1, m2
functions of various M values. The concept can be understood from a simple example. Suppose
j1 = 2, j2 = 1. Then the members of the m1, m2 basis can be grouped as shown here. The
kets in the table are labeled in more detail than usual to avoid potential confusion; those
labeled m1 have j value j1, those labeled m2 have j = j2.















































M = +3   |m1 = +2⟩|m2 = +1⟩
M = +2   |m1 = +2⟩|m2 =  0⟩   |m1 = +1⟩|m2 = +1⟩
M = +1   |m1 = +2⟩|m2 = −1⟩   |m1 = +1⟩|m2 =  0⟩   |m1 =  0⟩|m2 = +1⟩
M =  0   |m1 = +1⟩|m2 = −1⟩   |m1 =  0⟩|m2 =  0⟩   |m1 = −1⟩|m2 = +1⟩
M = −1   |m1 = −2⟩|m2 = +1⟩   |m1 = −1⟩|m2 =  0⟩   |m1 =  0⟩|m2 = −1⟩
M = −2   |m1 = −2⟩|m2 =  0⟩   |m1 = −1⟩|m2 = −1⟩
M = −3   |m1 = −2⟩|m2 = −1⟩

Because the basis transformations we are discussing only mix basis functions of the
same M, a transition to the J, M basis will have the same number of functions of each M
as are in the row of our table for that M. If there is only one function in the row, it must
(without change) be a member of the J, M basis, and we may use it as a starting point for
getting all the other members of the multiplet for the same J by application of the ladder
operators. So in our current example, we can start from |m1 = +2⟩|m2 = +1⟩, and make
one member of the multiplet for each M value in the table.

Once this has been done, we will have constructed as many J, M functions as there
are entries in the first column of the table (but remember that in most cases they will not
be the specific functions sitting in that column). But that observation does tell us that the
numbers of functions that are still unused (but not their exact forms) will correspond with
the numbers of functions in the remainder of the table. In particular, we see that there will
in our example be one function left over with M = 2. Because it cannot have a J = 3
component, it must be a |J = 2, M = 2⟩ eigenfunction and therefore must be orthogonal
to the |J = 3, M = 2⟩ function we have already found. That means that we can obtain it by
Gram-Schmidt orthogonalization within the function space for M = +2.

From the |J = 2, M = 2⟩ function, we can apply a ladder operator to find |J = 2, M⟩
basis members with other M values, the number of which will correspond to the number
of entries in the second column of our table. To continue to a third column, we would need
to find an M = +1 function orthogonal to both the |J = 3, M = +1⟩ and |J = 2, M = +1⟩
functions. This process can be continued until the m1, m2 basis has been exhausted.

Taking now a further look at our table, we see that the number of columns with entries
increases as we decrease M from its maximum value until M has reached |j1 − j2|; for
smaller |M| than that, the number of columns in use stays constant, because of limitations
in the way the individual m values can be chosen to add up to M. That gives us a graphical
indication that the smallest J value will be |j1 − j2|.

A more algebraic way of determining the smallest resultant J is based on a computa- 
tion of the total number of J, M states generated if the possible J values run from an as 
yet undetermined value Jmin to our previously determined maximum value Jmax. Since 
the number of states for each J is 2J + 1, the total number of J, M states we will have 
produced is 


\[
\begin{aligned}
\sum_{J=J_{\min}}^{J_{\max}} (2J+1) &= (J_{\max} - J_{\min} + 1)(J_{\max} + J_{\min} + 1) \\
  &= (2j_1+1)(2j_2+1),
\end{aligned}
\tag{16.37}
\]

where the second line of this equation reflects the fact that the total number of states is
readily counted in the m1, m2 basis. Inserting the value Jmax = j1 + j2 and solving for
Jmin, we find

\[
J_{\min} = |j_1 - j_2| .
\]

Another way of stating this result is to observe that the possible values of J satisfy a
triangle rule, meaning that they occur in unit steps from a maximum of j1 + j2 to a
minimum of |j1 − j2|.
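
The counting argument of the vector model can be tried out numerically. The following is a minimal Python sketch, not part of the original text: the helper names `m_values` and `multiplets` are illustrative choices. It tabulates how many m1, m2 products occur for each M and then peels off one multiplet at a time, starting from the largest unused M, which is the bookkeeping described above for the table.

```python
from collections import Counter
from fractions import Fraction


def m_values(j):
    """Allowed m values j, j-1, ..., -j, kept exact so half-integers work."""
    j = Fraction(j)
    return [j - k for k in range(int(2 * j) + 1)]


def multiplets(j1, j2):
    """Peel multiplets off the M-count table, largest J first."""
    # Number of product states |m1>|m2> in each row M = m1 + m2 of the table.
    counts = Counter(m1 + m2 for m1 in m_values(j1) for m2 in m_values(j2))
    found = []
    while any(n > 0 for n in counts.values()):
        J = max(M for M, n in counts.items() if n > 0)  # largest unused M
        found.append(J)
        for M in m_values(J):  # this multiplet uses one state in each row
            counts[M] -= 1
    return found


if __name__ == "__main__":
    for j1, j2 in [(2, 1), (Fraction(1, 2), Fraction(1, 2)), (1, Fraction(1, 2))]:
        Js = multiplets(j1, j2)
        n_states = sum(2 * J + 1 for J in Js)
        print(j1, j2, [str(J) for J in Js], n_states)
```

For j1 = 2, j2 = 1 the sketch finds J = 3, 2, 1, in agreement with the triangle rule, and the multiplet sizes add up to (2j1 + 1)(2j2 + 1) = 15.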
Ladder Operator Construction


To develop a quantitative description of angular momentum coupling, we consider the
case of general j1 and j2, and start from the lone member of the m1, m2 basis with
M = j1 + j2. In line with our earlier discussion, this m1, m2 basis member must also be a
function of the definite J value Jmax = j1 + j2. Using a notation in which the lower entry
in each ket is its J value and the upper entry gives the value of M, we indicate this by
writing

\[
\left|{J_{\max} \atop J_{\max}}\right\rangle
  = \left|{j_1 \atop j_1}\right\rangle \left|{j_2 \atop j_2}\right\rangle .
\tag{16.38}
\]

We now generate additional states of the same J but with different M by applying the
lowering operator J− to Eq. (16.38); when we apply it to the right-hand side, we do so in
the form J− = j1− + j2−. The result, for the left side of Eq. (16.38), is

\[
J_- \left|{J_{\max} \atop J_{\max}}\right\rangle
  = \sqrt{2J_{\max}}\, \left|{J_{\max}-1 \atop J_{\max}}\right\rangle .
\tag{16.39}
\]

The coefficient √(2Jmax) is that given by Eq. (16.31) for J = M = Jmax. For the right side
of Eq. (16.38), we get

\[
(j_{1-} + j_{2-}) \left|{j_1 \atop j_1}\right\rangle \left|{j_2 \atop j_2}\right\rangle
  = \sqrt{2j_1}\, \left|{j_1-1 \atop j_1}\right\rangle \left|{j_2 \atop j_2}\right\rangle
  + \sqrt{2j_2}\, \left|{j_1 \atop j_1}\right\rangle \left|{j_2-1 \atop j_2}\right\rangle ,
\tag{16.40}
\]

where we have again obtained the coefficients from Eq. (16.31), but now evaluating them
for the first term with (J, M) = (j1, j1) and for the second term with (J, M) = (j2, j2).
Combining these results, and solving for the state with J = Jmax and M = Jmax − 1, we get

\[
\left|{J_{\max}-1 \atop J_{\max}}\right\rangle
  = \sqrt{\frac{j_1}{J_{\max}}}\, \left|{j_1-1 \atop j_1}\right\rangle \left|{j_2 \atop j_2}\right\rangle
  + \sqrt{\frac{j_2}{J_{\max}}}\, \left|{j_1 \atop j_1}\right\rangle \left|{j_2-1 \atop j_2}\right\rangle .
\tag{16.41}
\]

With escalating complexity, we could continue this process to smaller values of M.
As indicated in our earlier, more qualitative discussion, we can reach functions with
J = Jmax − 1 by starting from the unused member of the set of two functions

\[
\left|{j_1-1 \atop j_1}\right\rangle \left|{j_2 \atop j_2}\right\rangle ,
\qquad
\left|{j_1 \atop j_1}\right\rangle \left|{j_2-1 \atop j_2}\right\rangle .
\]

The quantity we seek,

\[
\left|{J_{\max}-1 \atop J_{\max}-1}\right\rangle ,
\]

will be the function in the above-defined subspace that is orthogonal to

\[
\left|{J_{\max}-1 \atop J_{\max}}\right\rangle
\]

as given in Eq. (16.41), and therefore will be

\[
\left|{J_{\max}-1 \atop J_{\max}-1}\right\rangle
  = -\sqrt{\frac{j_2}{J_{\max}}}\, \left|{j_1-1 \atop j_1}\right\rangle \left|{j_2 \atop j_2}\right\rangle
  + \sqrt{\frac{j_1}{J_{\max}}}\, \left|{j_1 \atop j_1}\right\rangle \left|{j_2-1 \atop j_2}\right\rangle .
\tag{16.42}
\]

At this point we note that the function produced by Eq. (16.42) could have been written
with all its signs changed, as the orthogonalization process does not determine the sign of
the orthogonal function. This only matters if we wish to correlate the signs of our J, M
constructions with work by others. Irrespective of our choice of signs, we can apply J− to
reach the full set of M values, and then continue to states of smaller J until the m1, m2
space is exhausted.

The general result of the above-described processes is to obtain each J, M eigenstate
as a linear combination of m1, m2 states of the same M, in a fashion summarized by the
following equation (written in a less cumbersome notation now that the need for detail has
disappeared):

\[
|J, M\rangle = \sum_{m_1, m_2} C(j_1, j_2, J \,|\, m_1, m_2, M)\, |j_1, m_1; j_2, m_2\rangle .
\tag{16.43}
\]


Here |j1, m1; j2, m2⟩ stands for |j1, m1⟩|j2, m2⟩ and we have given over to the
coefficient C(j1, j2, J|m1, m2, M) the responsibility to vanish when m1 + m2 ≠ M. Thus, the
apparent double summation in Eq. (16.43) is actually a single sum. The coefficients in
Eq. (16.43) are called Clebsch-Gordan coefficients. To resolve the sign ambiguity
resulting from the orthogonalization processes, they are defined to have signs specified by the
Condon-Shortley phase convention.
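
The lowering step that generates the J = Jmax multiplet can be sketched in a few lines of Python. This is an illustrative sketch only, under the following assumptions: a coupled state is stored as a dictionary mapping (m1, m2) to a real coefficient, the function names `lower`, `normalize`, and `top_multiplet` are hypothetical, and the ladder coefficient √((j + m)(j − m + 1)) is the standard form of the lowering-operator matrix element that the text cites as Eq. (16.31). Normalizing after each application of J− = j1− + j2− yields the positive-phase member of the multiplet with M reduced by one.

```python
import math
from fractions import Fraction


def lower(state, j1, j2):
    """Apply J_- = j1_- + j2_- to a state stored as {(m1, m2): coefficient}."""
    out = {}
    for (m1, m2), c in state.items():
        # First entry lowers m1 (particle 1), second lowers m2 (particle 2).
        for key, j, m in (((m1 - 1, m2), j1, m1), ((m1, m2 - 1), j2, m2)):
            factor = math.sqrt((j + m) * (j - m + 1))  # ladder coefficient
            if factor > 0:
                out[key] = out.get(key, 0.0) + c * factor
    return out


def normalize(state):
    """Rescale the coefficients so that their squares sum to one."""
    norm = math.sqrt(sum(c * c for c in state.values()))
    return {key: c / norm for key, c in state.items()}


def top_multiplet(j1, j2):
    """Return {M: state} for J = j1 + j2, starting from |j1, j1>|j2, j2>."""
    j1, j2 = Fraction(j1), Fraction(j2)
    M = j1 + j2
    state = {(j1, j2): 1.0}
    states = {M: state}
    while M > -(j1 + j2):
        state = normalize(lower(state, j1, j2))
        M -= 1
        states[M] = state
    return states


if __name__ == "__main__":
    for M, state in top_multiplet(1, 2).items():
        print(M, {(str(a), str(b)): round(c, 4) for (a, b), c in state.items()})
```

For j1 = j2 = 1/2 the M = 0 member comes out with both coefficients equal to 1/√2, the coefficients √(j1/Jmax) and √(j2/Jmax) of the first lowering step. States of smaller J would additionally require the orthogonalization step, which this sketch leaves out.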

It is important to realize that all the results of this section remain valid irrespective of
whether j1, j2, or both are integral or half-integral. For example, if j1 = 1 and j2 = 1/2
(corresponding to the coupling of the orbital and spin angular momenta of an electron),
the possible J, M states will be a quartet for J = 3/2 (with M values +3/2, +1/2, −1/2,
−3/2), and a doublet for J = 1/2 (with M values +1/2 and −1/2).

A second way to look at the Clebsch-Gordan coefficients is to identify them as the scalar 
products 


\[
C(j_1, j_2, J \,|\, m_1, m_2, M) = \langle J, M \,|\, j_1, m_1; j_2, m_2 \rangle .
\tag{16.44}
\]

Because of the method used for the construction of the |J, M⟩, we can make one additional
observation: The Clebsch-Gordan coefficients will all be real, even if the |j1, m1⟩ and
|j2, m2⟩ used for their construction are not.

The Clebsch-Gordan expansion can be interpreted in yet another way. We can view the
Clebsch-Gordan coefficients as the elements of a transformation matrix converting
functions of the m1, m2 basis into those of the J, M basis; since both basis sets are orthonormal,
the transformation must be unitary (and because it is real, orthogonal). This means that the
inverse transformation, (J, M) → (m1, m2), must have a transformation matrix that is the
transpose of that for the forward transformation (m1, m2) → (J, M). That means that we
also have the equation

\[
|j_1, m_1; j_2, m_2\rangle = \sum_{J, M} C(j_1, j_2, J \,|\, m_1, m_2, M)\, |J, M\rangle .
\tag{16.45}
\]


This equation is correct and corresponds to our discussion. Note that instead of reversing 
the index order of the transformation matrix we have interchanged the index sets identify- 
ing the functions. 

In passing, we make one further comment. While the Clebsch-Gordan coefficients can be
identified as forming a transformation matrix, note that their row/column indexing differs
from the pattern to which we are accustomed, since, instead of labels running from 1 to n
(the dimension of the transformation), we are using in one dimension the compound index
(m1, m2), and in the other dimension the compound quantity (J, M). This Clebsch-Gordan
matrix will be somewhat sparse (containing many zero elements). The zeros occur because
the coefficients vanish unless M = m1 + m2.

There is a significant literature on the practical computation of Clebsch-Gordan
coefficients,¹ but to make the present discussion complete we simply give here a closed general
formula:

\[
C(j_1, j_2, J \,|\, m_1, m_2, M) = F_1 F_2 F_3 ,
\tag{16.46}
\]

where

\[
F_1 = \sqrt{\frac{(j_1 + j_2 - J)!\,(J + j_1 - j_2)!\,(J + j_2 - j_1)!\,(2J + 1)}{(j_1 + j_2 + J + 1)!}} ,
\]

\[
F_2 = \sqrt{(J + M)!\,(J - M)!\,(j_1 + m_1)!\,(j_1 - m_1)!\,(j_2 + m_2)!\,(j_2 - m_2)!} ,
\]

\[
F_3 = \sum_s \frac{(-1)^s}{(j_1 - m_1 - s)!\,(j_2 + m_2 - s)!\,(J - j_2 + m_1 + s)!}
      \times \frac{1}{(J - j_1 - m_2 + s)!\,(j_1 + j_2 - J - s)!\,s!} .
\]

The F3 summation is over all integer values of s for which the factorials all have
nonnegative arguments (which will be integral). The sum is therefore finite in extent and F3
is a closed form. Equation (16.46) is only to be used for parameter values that satisfy the
angular momentum and coupling conditions: j1, j2, J must satisfy the triangle condition,
mi is to be from the sequence ji, ji − 1, ..., −ji (i = 1, 2), M to be from J, J − 1, ..., −J,
and M = m1 + m2.
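
As an illustration of how Eq. (16.46) might be evaluated in practice, here is a short Python sketch using only the standard library. The function name `clebsch_gordan` and the choice to return 0.0 for arguments outside the stated conditions are assumptions made for this example, not part of the text. The factorial arguments are kept exact with `Fraction` until the final conversion to floating point.

```python
from fractions import Fraction
from math import factorial, sqrt


def clebsch_gordan(j1, j2, J, m1, m2, M):
    """C(j1, j2, J | m1, m2, M) evaluated as F1 * F2 * F3, Eq. (16.46)."""
    j1, j2, J, m1, m2, M = (Fraction(x) for x in (j1, j2, J, m1, m2, M))
    # Return zero outside the range in which the formula applies.
    integral = all(x.denominator == 1
                   for x in (j1 + j2 - J, j1 - m1, j2 - m2, J - M))
    if not integral or m1 + m2 != M or not abs(j1 - j2) <= J <= j1 + j2:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(M) > J:
        return 0.0

    def f(x):
        return factorial(int(x))  # x is a non-negative integer here

    F1 = sqrt(Fraction(
        f(j1 + j2 - J) * f(J + j1 - j2) * f(J + j2 - j1) * int(2 * J + 1),
        f(j1 + j2 + J + 1)))
    F2 = sqrt(f(J + M) * f(J - M) * f(j1 + m1) * f(j1 - m1)
              * f(j2 + m2) * f(j2 - m2))
    # s runs over the integers that keep every factorial argument >= 0.
    s_min = int(max(0, j2 - J - m1, j1 - J + m2))
    s_max = int(min(j1 - m1, j2 + m2, j1 + j2 - J))
    F3 = Fraction(0)
    for s in range(s_min, s_max + 1):
        denom = (f(j1 - m1 - s) * f(j2 + m2 - s) * f(J - j2 + m1 + s)
                 * f(J - j1 - m2 + s) * f(j1 + j2 - J - s) * f(s))
        F3 += Fraction((-1) ** s, denom)
    return F1 * F2 * float(F3)


if __name__ == "__main__":
    print(clebsch_gordan(0.5, 0.5, 1, 0.5, -0.5, 0))
    print(clebsch_gordan(0.5, 0.5, 0, -0.5, 0.5, 0))
```

For two spin-1/2 particles the sketch gives coefficients of magnitude 1/√2, with a negative sign only for C(1/2, 1/2, 0 | −1/2, 1/2, 0), as the Condon-Shortley convention requires. For large quantum numbers the factorials grow quickly, so a production code would more likely rely on recursion relations or a tested library.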

Finally, we call attention to the fact that Clebsch-Gordan coefficients have symmetries
that are not obvious from the foregoing development. To expose the symmetries, it is
convenient to convert them to the Wigner 3j-symbols, defined as

\[
\begin{pmatrix} j_1 & j_2 & j_3 \\ m_1 & m_2 & m_3 \end{pmatrix}
  = \frac{(-1)^{j_1 - j_2 - m_3}}{(2j_3 + 1)^{1/2}}\, C(j_1, j_2, j_3 \,|\, m_1, m_2, -m_3) .
\tag{16.47}
\]

¹ See Biedenharn and Louck, Brink and Satchler, Edmonds, Rose, and Wigner in Additional Readings. Clebsch-Gordan
coefficients are also tabulated in many places, and can easily be found online by a Web search.
Extensive discussion of 3j-symbols and related quantities is beyond the scope of this book.
This important, but advanced, topic is presented in most of the sources listed under
Additional Readings.

The 3j-symbols are invariant under even permutations of the indices (1, 2, 3), but under
odd permutations (1, 2, 3) → (k, l, n) transform as follows:

\[
\begin{pmatrix} j_1 & j_2 & j_3 \\ m_1 & m_2 & m_3 \end{pmatrix}
  = (-1)^{j_1 + j_2 + j_3}
    \begin{pmatrix} j_k & j_l & j_n \\ m_k & m_l & m_n \end{pmatrix} .
\tag{16.48}
\]

They also have the following symmetry under change of sign of their lower indices:

\[
\begin{pmatrix} j_1 & j_2 & j_3 \\ m_1 & m_2 & m_3 \end{pmatrix}
  = (-1)^{j_1 + j_2 + j_3}
    \begin{pmatrix} j_1 & j_2 & j_3 \\ -m_1 & -m_2 & -m_3 \end{pmatrix} .
\tag{16.49}
\]

Even though some of the ji may be half-integral, remember that j3 must be equal to j1 + j2
or differ therefrom by an integer. This fact causes the powers of −1 in Eqs. (16.47) through
(16.49) to be integral, so these factors are not multiple-valued and the sign assignments of
the 3j-symbols are unambiguous. These symmetry relations make a table of 3j-symbols
more compact than one of Clebsch-Gordan coefficients; such tables can be found in the
literature,² and a short list is included here, as Table 16.1.

We close this section with two examples.


Table 16.1 Wigner 3j-Symbols

² See, for example, M. Rotenberg, R. Bivins, N. Metropolis, and J. K. Wooten, Jr., The 3j- and 6j-Symbols. Cambridge, MA:
Massachusetts Institute of Technology Press (1959).
Example 16.2.2. Two Spinors


This example describes a problem that exists entirely in an abstract space, namely the
coupling of two spin-1/2 particles (e.g., electrons) to form combined states of definite J. Letting
α stand for a normalized single-particle state with j = 1/2, m = +1/2, with β a normalized
state with j = 1/2, m = −1/2, we have the following four states in the m1, m2 basis:

M = 1:    αα
M = 0:    αβ    βα
M = −1:   ββ

For all these two-particle states, the first symbol refers to particle 1, the second to particle
2. From Eq. (16.31) and Example 16.1.2, we have j−α = β, j−β = 0, and we can use
the following rearrangement of Eq. (16.31) to deal with the |J, M⟩ states. Again we use a
notation in which the lower entry in the ket is J; the upper entry is M:

\[
\left|{M-1 \atop J}\right\rangle
  = \frac{1}{\sqrt{(J+M)(J-M+1)}}\; J_- \left|{M \atop J}\right\rangle .
\tag{16.50}
\]

The maximum M value in this system is M = +1, so the one state of this M value must
have J = 1, showing that the ket with J = 1, M = 1 is αα. Starting from it, we lower M:

\[
\begin{aligned}
\left|{1 \atop 1}\right\rangle &= \alpha\alpha , \\
\left|{0 \atop 1}\right\rangle &= \frac{1}{\sqrt2}\, J_- \left|{1 \atop 1}\right\rangle
   = \frac{1}{\sqrt2}\,(\alpha\beta + \beta\alpha) , \\
\left|{-1 \atop 1}\right\rangle &= \frac{1}{\sqrt2}\, J_- \left|{0 \atop 1}\right\rangle
   = \frac{1}{\sqrt2}\left(\frac{1}{\sqrt2} + \frac{1}{\sqrt2}\right)\beta\beta = \beta\beta .
\end{aligned}
\]

These are the well-known members of the S = 1 spin multiplet, which is known as a
triplet. At M = 0, where there were two m1, m2 states, the state orthogonal to the
J = 1, M = 0 state must be the J = 0, M = 0 state.

Even though we do not have an entirely explicit representation of the states α and β, we
do know that they are normalized eigenstates of a Hermitian operator (Jz) with different
eigenvalues, and therefore they must be orthogonal. Thus, we can apply the Gram-Schmidt
process to the M = 0 subspace, using the relations

\[
\langle\alpha|\alpha\rangle = \langle\beta|\beta\rangle = 1 , \qquad \langle\alpha|\beta\rangle = 0 .
\]

We easily find that the normalized function orthogonal to (αβ + βα)/√2 is (αβ − βα)/√2.
It is the only member of the S = 0 multiplet, and is therefore known as a singlet. Note that
we didn’t have to know anything specific about spin operators to carry out this analysis.
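Our tableau of states can now be written in the J, M basis:

           J = 1              J = 0
M = 1      αα
M = 0      (αβ + βα)/√2       (αβ − βα)/√2
M = −1     ββ

From the J, M tableau, we can read out the Clebsch-Gordan coefficients:

\[
\begin{aligned}
C(\tfrac12, \tfrac12, 1 \,|\, \tfrac12, \tfrac12, 1) &= 1 , \\
C(\tfrac12, \tfrac12, 1 \,|\, \tfrac12, -\tfrac12, 0) &= C(\tfrac12, \tfrac12, 1 \,|\, -\tfrac12, \tfrac12, 0) = \frac{1}{\sqrt2} , \\
C(\tfrac12, \tfrac12, 0 \,|\, \tfrac12, -\tfrac12, 0) &= -C(\tfrac12, \tfrac12, 0 \,|\, -\tfrac12, \tfrac12, 0) = \frac{1}{\sqrt2} , \\
C(\tfrac12, \tfrac12, 1 \,|\, -\tfrac12, -\tfrac12, -1) &= 1 .
\end{aligned}
\]

These coefficients can also be obtained from our table of 3j-symbols. Using Eq. (16.47),
we find the coefficients for |J = 1, M = 0⟩ to be

\[
C(\tfrac12, \tfrac12, 1 \,|\, \tfrac12, -\tfrac12, 0)
  = \sqrt3 \begin{pmatrix} \tfrac12 & \tfrac12 & 1 \\ \tfrac12 & -\tfrac12 & 0 \end{pmatrix} ,
\qquad
C(\tfrac12, \tfrac12, 1 \,|\, -\tfrac12, \tfrac12, 0)
  = \sqrt3 \begin{pmatrix} \tfrac12 & \tfrac12 & 1 \\ -\tfrac12 & \tfrac12 & 0 \end{pmatrix} .
\]

Both these 3j-symbols correspond to the same entry in Table 16.1, and the symmetry rules
give each the value +1/√6. Therefore, both these Clebsch-Gordan coefficients evaluate
to √3/√6, or, as expected, 1/√2.

For |J = 0, M = 0⟩, we have, again calling on Eq. (16.47),

\[
C(\tfrac12, \tfrac12, 0 \,|\, \tfrac12, -\tfrac12, 0)
  = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 \\ \tfrac12 & -\tfrac12 & 0 \end{pmatrix} ,
\qquad
C(\tfrac12, \tfrac12, 0 \,|\, -\tfrac12, \tfrac12, 0)
  = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 \\ -\tfrac12 & \tfrac12 & 0 \end{pmatrix} .
\]

Again these 3j-symbols both correspond to the same tabulated entry (with value 1/√2),
but this time the symmetry rules cause them to have the respective values +1/√2 and
−1/√2, in agreement with our explicit evaluation.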


Example 16.2.3. Coupling of p and d Electrons


As most physics students know, a p state is an angular momentum eigenstate with l = 1 (so
m can be 1, 0, or −1). The three normalized functions constituting its multiplet are often
denoted p+, p0, and p−. A d state has l = 2; we denote the five normalized members of its
multiplet d+2, d+, d0, d−, and d−2. The m1, m2 basis has 15 members; grouped according
to their M values, they consist of

M = +3   p+d+2
M = +2   p+d+    p0d+2
M = +1   p+d0    p0d+    p−d+2
M =  0   p+d−    p0d0    p−d+
M = −1   p+d−2   p0d−    p−d0
M = −2   p0d−2   p−d−
M = −3   p−d−2

This is the same coupling of angular momenta j = 1 and j = 2 that was introduced at
the beginning of the subsection entitled Vector Model, but we are now illustrating how to
carry out the coupling computations using Clebsch-Gordan coefficients and 3j-symbols.
From this diagram, we expect one multiplet with J = 3, which in atomic spectroscopy is
denoted F (multiparticle orbital angular momentum states are designated using upper-case
letters); one with J = 2 (called D), and one with J = 1 (called P). Our plan is to construct
these using the 3j-symbols given in Table 16.1.

We start by writing, in the notation |J, M⟩, the members of the F multiplet with M ≥ 1
in terms of Clebsch-Gordan coefficients (those for M < 1 do not raise important new
points):


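\[
\begin{aligned}
|3, 3\rangle &= C(1, 2, 3 \,|\, 1, 2, 3)\, p_+ d_{+2} , \\
|3, 2\rangle &= C(1, 2, 3 \,|\, 1, 1, 2)\, p_+ d_+ + C(1, 2, 3 \,|\, 0, 2, 2)\, p_0 d_{+2} , \\
|3, 1\rangle &= C(1, 2, 3 \,|\, 1, 0, 1)\, p_+ d_0 + C(1, 2, 3 \,|\, 0, 1, 1)\, p_0 d_+ + C(1, 2, 3 \,|\, {-1}, 2, 1)\, p_- d_{+2} .
\end{aligned}
\]

The D and P multiplet members for M ≥ 1 are

\[
\begin{aligned}
|2, 2\rangle &= C(1, 2, 2 \,|\, 1, 1, 2)\, p_+ d_+ + C(1, 2, 2 \,|\, 0, 2, 2)\, p_0 d_{+2} , \\
|2, 1\rangle &= C(1, 2, 2 \,|\, 1, 0, 1)\, p_+ d_0 + C(1, 2, 2 \,|\, 0, 1, 1)\, p_0 d_+ + C(1, 2, 2 \,|\, {-1}, 2, 1)\, p_- d_{+2} , \\
|1, 1\rangle &= C(1, 2, 1 \,|\, 1, 0, 1)\, p_+ d_0 + C(1, 2, 1 \,|\, 0, 1, 1)\, p_0 d_+ + C(1, 2, 1 \,|\, {-1}, 2, 1)\, p_- d_{+2} .
\end{aligned}
\]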











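We then express the Clebsch-Gordan coefficients in terms of 3j-symbols. Doing just a
representative few, using Eq. (16.47) and then the symmetry rules, Eqs. (16.48) and (16.49),

\[
\begin{aligned}
C(1, 2, 3 \,|\, 1, 2, 3) &= \sqrt7 \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & -3 \end{pmatrix} = 1 , \\
C(1, 2, 2 \,|\, 1, 1, 2) &= -\sqrt5 \begin{pmatrix} 1 & 2 & 2 \\ 1 & 1 & -2 \end{pmatrix}
   = \sqrt5 \begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 1 \end{pmatrix} = \sqrt{\frac13} , \\
C(1, 2, 1 \,|\, {-1}, 2, 1) &= \sqrt3 \begin{pmatrix} 1 & 2 & 1 \\ -1 & 2 & -1 \end{pmatrix}
   = \sqrt3 \begin{pmatrix} 1 & 1 & 2 \\ -1 & -1 & 2 \end{pmatrix} = \sqrt{\frac35} .
\end{aligned}
\]

Substituting these and other Clebsch-Gordan coefficients into the formulas for |J, M⟩, we
obtain the final results:

\[
\begin{aligned}
|3, 3\rangle &= p_+ d_{+2} , \\
|3, 2\rangle &= \sqrt{\tfrac13}\, p_0 d_{+2} + \sqrt{\tfrac23}\, p_+ d_+ , \\
|3, 1\rangle &= \sqrt{\tfrac1{15}}\, p_- d_{+2} + \sqrt{\tfrac8{15}}\, p_0 d_+ + \sqrt{\tfrac25}\, p_+ d_0 , \\
|2, 2\rangle &= -\sqrt{\tfrac23}\, p_0 d_{+2} + \sqrt{\tfrac13}\, p_+ d_+ , \\
|2, 1\rangle &= -\sqrt{\tfrac13}\, p_- d_{+2} - \sqrt{\tfrac16}\, p_0 d_+ + \sqrt{\tfrac12}\, p_+ d_0 , \\
|1, 1\rangle &= \sqrt{\tfrac35}\, p_- d_{+2} - \sqrt{\tfrac3{10}}\, p_0 d_+ + \sqrt{\tfrac1{10}}\, p_+ d_0 .
\end{aligned}
\]

The reader may verify that states of the same M but different J have the required
orthogonality. It is also easy to check that all these |J, M⟩ states are normalized.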
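
The orthogonality and normalization checks suggested at the end of the example can be carried out mechanically. The sketch below is illustrative only: it stores the coefficients of the M = 2 and M = 1 states as plain lists, in the basis orders (p0d+2, p+d+) and (p−d+2, p0d+, p+d0), respectively, and the dictionary name `STATES` and function `overlap` are hypothetical. The coefficient values are those of the final results of this example as they follow from the ladder-operator construction.

```python
from math import isclose, sqrt

# Coefficients keyed by (J, M).
# Basis order for M = 2: (p0 d+2, p+ d+); for M = 1: (p- d+2, p0 d+, p+ d0).
STATES = {
    (3, 2): [sqrt(1 / 3), sqrt(2 / 3)],
    (2, 2): [-sqrt(2 / 3), sqrt(1 / 3)],
    (3, 1): [sqrt(1 / 15), sqrt(8 / 15), sqrt(2 / 5)],
    (2, 1): [-sqrt(1 / 3), -sqrt(1 / 6), sqrt(1 / 2)],
    (1, 1): [sqrt(3 / 5), -sqrt(3 / 10), sqrt(1 / 10)],
}


def overlap(a, b):
    """Scalar product of two real coefficient vectors in an orthonormal basis."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    for (Ja, Ma), a in STATES.items():
        for (Jb, Mb), b in STATES.items():
            if Ma == Mb:  # only states of the same M share a basis
                print((Ja, Ma), (Jb, Mb), round(overlap(a, b), 12))
```

Every overlap between states of equal M comes out as 1 when the J values agree and 0 otherwise, which is the orthonormality claimed in the text.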








Exercises 

16.2.1 Derive recursion relations for Clebsch-Gordan coefficients. Use them to calculate
C(11J | m1 m2 M) for J = 0, 1, 2.

Hint. Use the known matrix elements of J+ = J1+ + J2+, Ji+, and J² = (J1 + J2)², etc.

16.2.2 Defining (Y_l χ)_J^M by the formula

\[
(Y_l \chi)_J^M = \sum C(l \tfrac12 J \,|\, m_l m_s M)\, Y_l^{m_l} \chi_{m_s} ,
\]

where χ±1/2 are the spin up and down eigenfunctions of σ3 = σz, show that (Y_l χ)_J^M is
a J, M eigenfunction.

16.2.3 Find the (j, m) states of a p electron (l = 1), in which the orbital angular
momentum of the electron is coupled to its spin angular momentum (s = 1/2) to form states
whose conventional labelings are ²P1/2 and ²P3/2. The notation is of the general form
^(2s+1)(symbol)_j, where “symbol” is that indicating the l value (i.e., s, p, ...).

16.2.4 Repeat Exercise 16.2.3 for l = 1, s = 3/2. Apply the conventional labels to the j, m
states.

16.2.5 A deuterium atom consists of a proton, a neutron, and an electron. Each of these par- 


ticles has spin 1/2. The coupling of these three spins can produce J values of 3/2 and 
1/2. We consider here only states with no orbital angular momentum. 


(a) Show that these J, M states consist of one quartet (J = 3/2) and two linearly 
independent doublets (J = 1/2). 
Hint. Make a vector-model diagram.


(b) One way to analyze this problem is to couple the spins of the proton and neutron
to form a nuclear triplet or singlet, and then to couple the resultant nuclear spin
to the electron spin. Find the states that are obtained in this way (designate the
single-particle states pα, pβ, nα, nβ, eα, eβ).


(c) Another way to analyze this problem is to couple the spins of the proton and elec- 
tron to form an atomic triplet or singlet, and then to couple that resultant to the 
neutron spin. Find the states that result from this coupling scheme. 


(d) Show that the coupling schemes of parts (b) and (c) span the same Hilbert space. 


Note. The actual interaction energies among these angular momenta cause the scheme 
of part (b) to be the better way of treating this problem (the triplet nuclear state is 
substantially the more stable), and the system actually looks like a spin-1 deuterium 
nucleus plus an electron. 


SPHERICAL TENSORS 


We have already seen that the set of spherical harmonics of given l transforms within itself
under rotations. We now pursue this idea more formally. In Chapter 3 we saw that rota- 
tions could be characterized by the 3 x 3 unitary transformation matrices that transform a 
set of coordinates (their basis) into the new set corresponding to the rotation. These matri- 
ces could be viewed as second-rank tensors, but because they are restricted to rotational 
transformations, they are also known as spherical tensors. 

We now wish to consider spherical tensors that transform more general sets of objects 
under rotation, and in particular those spherical tensors that have spherical harmonics as 
bases. Our new spherical tensors will then have dimensions other than 3 x 3; in fact, they 
must exist at all the sizes that correspond to sets of angular momentum eigenfunctions. 
Because we have already observed that a set of angular momentum eigenfunctions of a 
given J cannot be decomposed into subsets that transform only among themselves under 
rotation, we go one step further and call our spherical tensors irreducible. 

Continuing for general angular momentum eigenfunctions |L, M⟩, which we assume are
representable in 3-D space as spherical harmonics or objects built from them by angular
momentum coupling, we write the following defining equation for the spherical tensor
describing the effect of a coordinate rotation R on |L, M⟩:

\[
R\,|L, M\rangle = \sum_{M'} D^L_{M'M}(R)\, |L, M'\rangle .
\tag{16.51}
\]

If the |L, M⟩ are actually spherical harmonics (and not more complicated objects that
resulted from angular momentum coupling), Eq. (16.51) can also be written as

\[
Y_l^m(R\Omega) = \sum_{m'} D^l_{m'm}(R)\, Y_l^{m'}(\Omega) .
\tag{16.52}
\]

Because we do not need to become embroiled in the details of the action of R on the
coordinates, we have simply replaced (θ, φ) by the generic symbol Ω and have written
RΩ to indicate the coordinates (θ′, φ′) that describe the point that was labeled (θ, φ) in
the unrotated system. For any given l, D^l_{m′m}(R) can be regarded as an element of a square
matrix of dimension 2l + 1 with rows and columns labeled by indices m′ and m whose
ranges are (−l, ..., +l), not the more customary sequence starting from 1. The D^l(R)
are unitary, since they describe a transformation between two orthonormal sets. Because
of their early exploitation by Eugene Wigner, they are sometimes called Wigner matrices.
There is an extensive literature (see Additional Readings) on relationships satisfied by the
D^l_{m′m}(R) and on formulas for their evaluation. A related topic included in this book is the
formula, Eq. (3.37), giving the transformation of the basis x, y, z by a rotation through
Euler angles α, β, γ.


Addition Theorem 


Equation (16.52) can be used to establish important rotational invariance properties. For 
example, consider a quantity A defined as 


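\[
A = \sum_m Y_l^m(\Omega_1)^*\, Y_l^m(\Omega_2) ,
\tag{16.53}
\]

where Ω1 and Ω2 are two unrelated sets of angular coordinates. We apply a rotation R to
the coordinate system, denoting the result RA, and evaluating the right-hand side using
Eq. (16.52):

\[
RA = \sum_m \Big( \sum_\mu D^l_{\mu m}(R)\, Y_l^\mu(\Omega_1) \Big)^{\!*}
            \Big( \sum_\nu D^l_{\nu m}(R)\, Y_l^\nu(\Omega_2) \Big) .
\tag{16.54}
\]

We now reorder the summations in Eq. (16.54), and, in the second line of Eq. (16.55),
use the fact that D is unitary to change D* to the transpose of D⁻¹, thereby leading to the
simplification in the third line. We have

\[
\begin{aligned}
RA &= \sum_{\mu\nu} \Big( \sum_m D^l_{\mu m}(R)^*\, D^l_{\nu m}(R) \Big)\, Y_l^\mu(\Omega_1)^*\, Y_l^\nu(\Omega_2) \\
   &= \sum_{\mu\nu} \Big( \sum_m \big[D^l(R)^{-1}\big]_{m\mu} \big[D^l(R)\big]_{\nu m} \Big)\, Y_l^\mu(\Omega_1)^*\, Y_l^\nu(\Omega_2) \\
   &= \sum_{\mu\nu} \delta_{\mu\nu}\, Y_l^\mu(\Omega_1)^*\, Y_l^\nu(\Omega_2)
    = \sum_\mu Y_l^\mu(\Omega_1)^*\, Y_l^\mu(\Omega_2) = A .
\end{aligned}
\tag{16.55}
\]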
This shows that A is rotationally invariant, and is the starting point for an explanation of
why a totally occupied atomic subshell (particles occupying all m values for a given l)
leads to a spherically symmetric overall distribution.

The rotational invariance of A makes it easier for us to actually evaluate it, because
we can choose to do so at a coordinate orientation for which the computation is relatively
simple. Let’s rotate the coordinates to place Ω1 in the polar direction (so now θ1 = 0),
and the θ value of Ω2 in the rotated coordinates will be equal to the angle χ between the
Ω1 and Ω2 directions, which is not affected by a coordinate rotation. In this new set of
coordinates, Y_l^m(Ω1) is Y_l^m(0, φ) and is given, according to Eq. (15.148), as

\[
Y_l^m(\Omega_1) = \sqrt{\frac{2l+1}{4\pi}}\; \delta_{m0} .
\]

The summation in Eq. (16.53) therefore reduces to its m = 0 term, and the only Ω2
contribution we need is Y_l^0(χ, φ2). But because m = 0, this Y does not actually depend on φ2,
and has the unambiguous value, from Eq. (15.137),

\[
Y_l^0(\chi, \varphi_2) = \sqrt{\frac{2l+1}{4\pi}}\; P_l(\cos\chi) .
\]

These results enable us to obtain

\[
A = \frac{2l+1}{4\pi}\, P_l(\cos\chi) ,
\tag{16.56}
\]

which, because of the rotational invariance, remains true whether or not the coordinate
system was rotated. Inserting the original formula for A, and solving Eq. (16.56) for P_l(cos χ),
we obtain the spherical harmonic addition theorem,

\[
P_l(\cos\chi) = \frac{4\pi}{2l+1} \sum_m Y_l^m(\Omega_1)^*\, Y_l^m(\Omega_2) ,
\tag{16.57}
\]

where χ is the angle between the directions Ω1 and Ω2.


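Example 16.3.1. Angle Between Two Vectors


A useful special case of the addition theorem is for l = 1, for which P1(cos χ) = cos χ.
Then, writing Ωi = (θi, φi), and evaluating all the spherical harmonics on the right-hand side
of Eq. (16.57), we have

\[
\begin{aligned}
\cos\chi &= \tfrac12 \big(\sin\theta_1\, e^{-i\varphi_1}\big)^{*} \big(\sin\theta_2\, e^{-i\varphi_2}\big)
            + \cos\theta_1 \cos\theta_2
            + \tfrac12 \big(-\sin\theta_1\, e^{i\varphi_1}\big)^{*} \big(-\sin\theta_2\, e^{i\varphi_2}\big) \\
         &= \cos\theta_1 \cos\theta_2
            + \tfrac12 \sin\theta_1 \sin\theta_2 \big( e^{i(\varphi_1 - \varphi_2)} + e^{i(\varphi_2 - \varphi_1)} \big) .
\end{aligned}
\tag{16.58}
\]

This reduces to the standard formula for the angle χ between directions (θ1, φ1) and
(θ2, φ2):

\[
\cos\chi = \cos\theta_1 \cos\theta_2 + \sin\theta_1 \sin\theta_2 \cos(\varphi_2 - \varphi_1) .
\tag{16.59}
\]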
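
The l = 1 case of the addition theorem is easy to test numerically. In the following Python sketch (the function names are illustrative, and the explicit forms of the l = 1 spherical harmonics with the Condon-Shortley phase are the standard ones, assumed here rather than quoted from this section), the sum over m is compared with the dot product of the two unit vectors.

```python
import cmath
import math


def Y1(m, theta, phi):
    """Y_1^m(theta, phi) with the Condon-Shortley phase, for m = -1, 0, 1."""
    if m == 0:
        return complex(math.sqrt(3 / (4 * math.pi)) * math.cos(theta))
    return (-m * math.sqrt(3 / (8 * math.pi)) * math.sin(theta)
            * cmath.exp(1j * m * phi))


def cos_chi_from_addition_theorem(theta1, phi1, theta2, phi2):
    """Right-hand side of the addition theorem for l = 1."""
    total = sum(Y1(m, theta1, phi1).conjugate() * Y1(m, theta2, phi2)
                for m in (-1, 0, 1))
    return (4 * math.pi / 3) * total


def cos_chi_from_dot_product(theta1, phi1, theta2, phi2):
    """Cosine of the angle between the two directions, from unit vectors."""
    def unit(theta, phi):
        return (math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta))
    return sum(a * b for a, b in zip(unit(theta1, phi1), unit(theta2, phi2)))


if __name__ == "__main__":
    angles = (0.7, 1.9, 2.1, -0.4)  # arbitrary test directions, in radians
    print(cos_chi_from_addition_theorem(*angles))
    print(cos_chi_from_dot_product(*angles))
```

The two results agree to rounding error, and the imaginary part of the spherical-harmonic sum vanishes, as expected for a quantity equal to cos χ.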


Spherical Wave Expansion 


An important application of the addition theorem is the spherical wave expansion, which
states

\[
\begin{aligned}
e^{i\mathbf{k}\cdot\mathbf{r}}
  &= 4\pi \sum_{l=0}^{\infty} \sum_{m=-l}^{l} i^l\, j_l(kr)\, Y_l^m(\Omega_k)^*\, Y_l^m(\Omega_r)
     && (16.60) \\
  &= 4\pi \sum_{l=0}^{\infty} \sum_{m=-l}^{l} i^l\, j_l(kr)\, Y_l^m(\Omega_k)\, Y_l^m(\Omega_r)^* .
     && (16.61)
\end{aligned}
\]

Here k and r are the magnitudes of the vectors k and r, and Ωk, Ωr denote their respective
angular coordinates. The two forms shown are equivalent because a change in the sign of m
changes each harmonic to its complex conjugate (possibly with both harmonics
undergoing a sign change). The quantity j_l(kr) is a spherical Bessel function. This formula is
particularly useful because it expresses the plane wave on its left-hand side as a series
of spherical waves. This conversion is useful in scattering problems in which a plane
wave, incident upon a scattering center, produces outgoing spherical waves with different
spherical-harmonic (called partial-wave) components.

To establish Eq. (16.61), we write k · r as kr cos χ, where χ is the angle between k and
r, and then expand exp(ikr cos χ) as a series of Legendre polynomials:

\[
e^{ikr\cos\chi} = \sum_{l=0}^{\infty} c_l\, P_l(\cos\chi) ,
\tag{16.62}
\]

with the coefficients c_l given by

\[
c_l = \frac{2l+1}{2} \int_{-1}^{1} e^{ikrt}\, P_l(t)\, dt .
\tag{16.63}
\]

We now recognize the integral in Eq. (16.63) as proportional to an integral representation
of j_l that was the topic of Exercise 15.2.26 and which we repeat here:

\[
j_l(x) = \frac{i^{-l}}{2} \int_{-1}^{1} e^{ixt}\, P_l(t)\, dt .
\tag{16.64}
\]

This permits us to evaluate c_l, obtaining

\[
c_l = (2l+1)\, i^l\, j_l(kr) .
\]

Inserting this expression for c_l into Eq. (16.62) and replacing P_l(cos χ) in that equation by
its equivalent as given by the addition theorem, Eq. (16.57), we have the desired
verification of Eq. (16.61).
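
The Legendre form of the expansion, with c_l = (2l + 1) i^l j_l(kr), can be checked numerically with a few lines of Python. This is a sketch under stated assumptions: the spherical Bessel function is computed from its ascending power series, j_l(x) = x^l Σ_k (−x²/2)^k / [k! (2l + 2k + 1)!!], which is a standard result not taken from this section and is adequate only for moderate x; the function names and the truncation values `terms` and `lmax` are illustrative choices.

```python
import cmath
import math


def legendre_values(lmax, x):
    """Return [P_0(x), ..., P_lmax(x)] from the three-term recurrence."""
    values = [1.0, x]
    for l in range(1, lmax):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values[: lmax + 1]


def spherical_bessel_j(l, x, terms=80):
    """j_l(x) from its ascending power series (adequate for moderate x)."""
    term = x ** l / math.prod(range(1, 2 * l + 2, 2))  # x^l / (2l+1)!!
    total = 0.0
    for k in range(terms):
        total += term
        term *= -(x * x / 2) / ((k + 1) * (2 * l + 2 * k + 3))
    return total


def plane_wave_series(kr, cos_chi, lmax=40):
    """Partial-wave sum of (2l+1) i^l j_l(kr) P_l(cos chi) up to l = lmax."""
    P = legendre_values(lmax, cos_chi)
    return sum((2 * l + 1) * 1j ** l * spherical_bessel_j(l, kr) * P[l]
               for l in range(lmax + 1))


if __name__ == "__main__":
    kr, cos_chi = 3.0, 0.3
    print(plane_wave_series(kr, cos_chi))
    print(cmath.exp(1j * kr * cos_chi))
```

For kr = 3 the partial-wave sum reproduces exp(ikr cos χ) to rounding error well before l = 40, which reflects the rapid decay of j_l(kr) once l exceeds kr.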


Laplace Spherical Harmonic Expansion 


Another application of the addition theorem is to the Laplace expansion, where in 
Chapter 15 we found that the inverse distance between points r; and r2 could be expanded 
in Legendre polynomials: 


\[
\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} = \sum_{l=0}^{\infty} \frac{r_<^{\,l}}{r_>^{\,l+1}}\, P_l(\cos\chi) .
\tag{16.65}
\]

Here r1 and r2 are measured from a common origin, with respective magnitudes r1 and
r2; χ is the angle between r1 and r2. We define r> and r< as, respectively, the larger and
the smaller of r1 and r2. If we now insert the addition theorem, we bring this expansion to
the form

\[
\frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}
  = \sum_{l=0}^{\infty} \frac{4\pi}{2l+1}\, \frac{r_<^{\,l}}{r_>^{\,l+1}}
    \sum_{m=-l}^{l} Y_l^m(\Omega_1)^*\, Y_l^m(\Omega_2) ,
\tag{16.66}
\]

where Ω1 and Ω2 are the angular coordinates of r1 and r2 in a coordinate system of
arbitrary orientation.
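
A quick numerical check of the Legendre form of the Laplace expansion is sketched below in Python. The function names and the truncation `lmax` are illustrative assumptions; the Legendre polynomials are generated by the usual three-term recurrence, and the closed form used for comparison is the law of cosines for |r1 − r2|.

```python
import math


def inverse_distance_series(r1, r2, cos_chi, lmax=200):
    """Sum of (r_<^l / r_>^(l+1)) P_l(cos chi) for l = 0, ..., lmax."""
    r_lt, r_gt = min(r1, r2), max(r1, r2)
    ratio = r_lt / r_gt
    p_prev, p_curr = 1.0, cos_chi  # P_0 and P_1
    total = (1.0 + ratio * cos_chi) / r_gt  # l = 0 and l = 1 terms
    for l in range(1, lmax):
        # Recurrence: (l+1) P_{l+1} = (2l+1) x P_l - l P_{l-1}
        p_next = ((2 * l + 1) * cos_chi * p_curr - l * p_prev) / (l + 1)
        total += ratio ** (l + 1) * p_next / r_gt
        p_prev, p_curr = p_curr, p_next
    return total


def inverse_distance_direct(r1, r2, cos_chi):
    """1 / |r1 - r2| from the law of cosines."""
    return 1.0 / math.sqrt(r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * cos_chi)


if __name__ == "__main__":
    print(inverse_distance_series(1.0, 2.5, -0.4))
    print(inverse_distance_direct(1.0, 2.5, -0.4))
```

The series converges geometrically in r</r>, so it becomes slow when the two radii are nearly equal; the default truncation is meant only for well-separated radii.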


Example 16.3.2. Spherical Green’s Function


An explicit expansion of the Green’s function for the 3-D Laplace equation may be
obtained by considering its defining equation

\[
\nabla_1^2\, G(\mathbf{r}_1, \mathbf{r}_2) = \delta(\mathbf{r}_1 - \mathbf{r}_2)
  = \delta(r_1 - r_2)\, \frac{\delta(\Omega_1 - \Omega_2)}{r_1^2} ,
\tag{16.67}
\]

where we have written ∇1 to remind the reader that it acts only on r1. Also, note that on the
right-hand side the factor 1/r1² is inserted to adjust the angular delta function to unit scale;
it could equally well have been written 1/r2² because of the presence also of δ(r1 − r2).
We now insert into Eq. (16.67) the following general expansion for G(r1, r2):

\[
G(\mathbf{r}_1, \mathbf{r}_2) = \sum_{lm} \sum_{l'm'} g_{lml'm'}(r_1, r_2)\, Y_l^m(\Omega_1)\, Y_{l'}^{m'}(\Omega_2)^* ,
\]

and the expansion of Exercise 16.3.9 for the angular delta function:

\[
\delta(\Omega_1 - \Omega_2) = \sum_{lm} Y_l^m(\Omega_1)\, Y_l^m(\Omega_2)^* .
\]

We also write the Laplacian in the form

\[
\nabla_1^2 = \frac{\partial^2}{\partial r_1^2} + \frac{2}{r_1} \frac{\partial}{\partial r_1} - \frac{\mathbf{L}_1^2}{r_1^2} ,
\]

where L1 operates only on functions of Ω1.

We next take scalar products of the resulting expanded equation with all possible
spherical harmonics of both Ω1 and Ω2, in addition taking note that Y_l^m(Ω1) is an eigenfunction
of L1² with eigenvalue l(l + 1). We find that many terms cancel, so the scalar products lead,
for each l and m, to the following result:

\[
\left[ \frac{\partial^2}{\partial r_1^2} + \frac{2}{r_1} \frac{\partial}{\partial r_1} - \frac{l(l+1)}{r_1^2} \right] g_l(r_1, r_2)
  = \frac{\delta(r_1 - r_2)}{r_1^2} .
\tag{16.68}
\]

We have collapsed the original four indices of g_{lml′m′}(r1, r2) into the single index l
because all instances of Eq. (16.68) with l ≠ l′ or m ≠ m′ vanish, and g has the same value
for all m.

Equation (16.68) is for each l an ODE which, with boundary conditions g = 0 at r = 0
and r = ∞, defines the spherical Green’s functions we identified in Section 10.2. Since
the homogeneous equation corresponding to Eq. (16.68) has solutions r^l and r^(−l−1), its
Green’s function must have the form

\[
g_l(r_1, r_2) = A_l\, \frac{r_<^{\,l}}{r_>^{\,l+1}} ,
\tag{16.69}
\]

with A_l = −1/(2l + 1), a result that can be obtained by application of Eq. (10.19).
Comparing Eq. (16.66) with the result for G(r1, r2) obtained by using Eq. (16.69), we
now have yet another way of verifying the result that is familiar from Coulomb’s law:

\[
\begin{aligned}
G(\mathbf{r}_1, \mathbf{r}_2)
  &= -\sum_{l=0}^{\infty} \frac{1}{2l+1}\, \frac{r_<^{\,l}}{r_>^{\,l+1}} \sum_{m=-l}^{l} Y_l^m(\Omega_1)^*\, Y_l^m(\Omega_2)
     && (16.70) \\
  &= -\frac{1}{4\pi}\, \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} .
     && (16.71)
\end{aligned}
\]

General Multipoles 


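We are now ready to return to the multipole expansion. Given a set of charges $q_i$ at
respective points $\mathbf r_i$, all located within a sphere of radius $a$ centered at the origin of a
spherical polar coordinate system, we now consider the calculation of the electrostatic potential
$\psi(\mathbf r)$ at points outside the sphere, i.e., at points $\mathbf r$ such that $r > a$. Our starting point is
the Laplace expansion of $1/|\mathbf r_1 - \mathbf r_2|$ in the form presented as Eq. (16.66). Since for all
$\mathbf r_i$ we have $r_i < r$, we can write

\[ \psi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\sum_i q_i \sum_{l=0}^{\infty}\frac{4\pi}{2l+1}\,\frac{r_i^l}{r^{l+1}}\sum_{m=-l}^{l} Y_l^m(\theta_i,\varphi_i)^*\,Y_l^m(\theta,\varphi) \]

\[ \phantom{\psi(\mathbf r)} = \frac{1}{4\pi\varepsilon_0}\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[\frac{4\pi}{2l+1}\sum_i q_i\, r_i^l\, Y_l^m(\theta_i,\varphi_i)^*\right]\frac{Y_l^m(\theta,\varphi)}{r^{l+1}}. \tag{16.72} \]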


We see that this substitution has caused the entire effect of the charges $q_i$ to be localized
into the expressions

\[ M_l^m = \frac{4\pi}{2l+1}\sum_i q_i\, r_i^l\, Y_l^m(\theta_i,\varphi_i)^*, \tag{16.73} \]

so that the potential due to the $q_i$, for points farther from $r = 0$ than all the charges, assumes
the compact form,

\[ \psi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\sum_{l=0}^{\infty}\sum_{m=-l}^{l} M_l^m\,\frac{Y_l^m(\theta,\varphi)}{r^{l+1}}. \tag{16.74} \]

Equation (16.74) is called the multipole expansion, and the $M_l^m$ are known as the
multipole moments of the charge distribution. At this point we note that different authors define
the multipole moments with different scalings, making up the difference by the inclusion
of an appropriate factor in their formulas corresponding to Eq. (16.74). One reason for the
variety of notations is that $M_l^m$ as defined in Eq. (16.73), which leads to the simplest
formulas, does not yield the low-order moments at their “traditional” scalings. For example,
the monopole moment, $M_0^0$, evaluates to $(4\pi)^{1/2}$ times the total charge, while $M_1^0$, the
z-component of the dipole moment, comes out as $(4\pi/3)^{1/2}\sum_i q_i z_i$.
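As a numerical illustration of Eqs. (16.73) and (16.74), the following sketch computes the moments through $l = 2$ for a few point charges and compares the truncated multipole sum with the exact Coulomb sum at a distant point. Everything named here is an assumption made for the example: the helper `sph_harm_low` hard-codes the $l \le 2$ spherical harmonics with the Condon-Shortley phase, the charges and positions are arbitrary, and units are chosen so that $4\pi\varepsilon_0 = 1$.

```python
import numpy as np


def sph_harm_low(l, m, theta, phi):
    """Y_l^m for l <= 2 (Condon-Shortley phase), from explicit formulas."""
    s, c = np.sin(theta), np.cos(theta)
    table = {
        (0, 0): 0.5 * np.sqrt(1.0 / np.pi),
        (1, 0): np.sqrt(3.0 / (4.0 * np.pi)) * c,
        (1, 1): -np.sqrt(3.0 / (8.0 * np.pi)) * s * np.exp(1j * phi),
        (2, 0): np.sqrt(5.0 / (16.0 * np.pi)) * (3.0 * c**2 - 1.0),
        (2, 1): -np.sqrt(15.0 / (8.0 * np.pi)) * s * c * np.exp(1j * phi),
        (2, 2): np.sqrt(15.0 / (32.0 * np.pi)) * s**2 * np.exp(2j * phi),
    }
    if m < 0:
        # Y_l^{-|m|} = (-1)^{|m|} [Y_l^{|m|}]^*
        return (-1) ** (-m) * np.conj(table[(l, -m)])
    return table[(l, m)]


def to_spherical(p):
    x, y, z = p
    r = np.sqrt(x * x + y * y + z * z)
    return r, np.arccos(z / r), np.arctan2(y, x)


def multipole_moments(charges, positions, lmax=2):
    """M_l^m = 4 pi / (2l+1) * sum_i q_i r_i^l conj(Y_l^m(theta_i, phi_i))."""
    moments = {}
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            total = 0.0
            for q, p in zip(charges, positions):
                r, th, ph = to_spherical(p)
                total += q * r**l * np.conj(sph_harm_low(l, m, th, ph))
            moments[(l, m)] = 4.0 * np.pi / (2 * l + 1) * total
    return moments


def potential(moments, point):
    """Truncated multipole sum for the potential, with 4 pi eps0 = 1."""
    r, th, ph = to_spherical(point)
    return sum(M * sph_harm_low(l, m, th, ph) / r**(l + 1)
               for (l, m), M in moments.items())


# Hypothetical charge set, all inside a sphere of radius ~0.3
charges = [1.0, -0.5, 0.7]
positions = [(0.1, 0.2, -0.1), (-0.2, 0.05, 0.15), (0.0, -0.1, 0.25)]
field_point = (6.0, -5.0, 7.0)

M = multipole_moments(charges, positions)
psi_multipole = potential(M, field_point).real
psi_exact = sum(q / np.linalg.norm(np.subtract(field_point, p))
                for q, p in zip(charges, positions))
print(psi_multipole, psi_exact)
```

The first neglected terms are of relative order $(a/r)^3$, so the agreement improves rapidly as the field point moves away from the charges.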

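Of more fundamental interest is the relation between the multipole moments and the
Cartesian forms that can represent them. We proceed by considering the $M_l^m$ that result
from a unit charge placed at $(x, y, z)$. Using the Cartesian representations of the spherical
harmonics given in Table 15.4, together with the complex conjugation called for by
Eq. (16.73), the first few $M_l^m$ have the forms given here:

\[ M_0^0 = (4\pi)^{1/2}, \]

\[ M_1^{\pm 1} = \mp\left(\frac{2\pi}{3}\right)^{1/2}(x \mp iy), \qquad M_1^0 = \left(\frac{4\pi}{3}\right)^{1/2} z, \]

\[ M_2^{\pm 2} = \left(\frac{3\pi}{10}\right)^{1/2}(x^2 - y^2 \mp 2ixy), \qquad M_2^{\pm 1} = \mp\left(\frac{6\pi}{5}\right)^{1/2} z\,(x \mp iy), \qquad M_2^0 = \left(\frac{4\pi}{5}\right)^{1/2}\left(\frac{3}{2}z^2 - \frac{1}{2}r^2\right). \]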


The first point to note is that for any $l$ value, the Cartesian representation of each $M_l^m$
involves a homogeneous polynomial of combined degree $l$ in $x$, $y$, and $z$. It is obviously
necessary that the $M_l^m$ of different $m$ be linearly independent, and we see that for $l = 0$ and
$l = 1$, the number of independent monomials is equal to $2l + 1$, the number of $m$ values.
Specifically, for $l = 0$ we have only the monomial 1, while for $l = 1$ we have $x$, $y$, and $z$.
For $l = 2$, however, there are six independent monomials ($x^2$, $y^2$, $z^2$, $xy$, $xz$, $yz$), but only five
values of $m$. The discrepancy is resolved by observing that one linear combination of these
monomials, namely $r^2 = x^2 + y^2 + z^2$, remains invariant under all rotations of the
coordinates, and it therefore has different symmetry properties than the five-dimensional space
orthogonal to $r^2$. In fact, $r^2$ has the same symmetry as $M_0^0$, but has the wrong $r$
dependence to contribute to a solution to the Laplace equation (and therefore to the potential of
a charge distribution).

If we were to continue to $l = 3$, we would find that there are 10 linearly independent
monomials of degree 3, but they divide into a group of seven functions (the space spanned
by $M_3^m$) with an orthogonal complement (functions orthogonal to the first seven) of
dimension 3. These three remaining functions have a rotational symmetry similar to $M_1^m$, but
again with the wrong $r$ dependence to contribute to the potential. This type of pattern
continues to higher $l$, making logical the observation that a multipole moment of degree $l$ (a
“$2^l$ moment”) has only $2l + 1$ components, despite the fact that in general the space of
homogeneous polynomials of degree $l$ has a larger dimension.
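The counting in the last two paragraphs is easy to tabulate. The short sketch below (the helper `monomials` is a name introduced only for this illustration) lists, for each $l$, the number of monomials $x^a y^b z^c$ of degree $l$, the number $2l+1$ of $m$ values, and their difference. On the assumption that the surplus functions are $r^2$ times polynomials of degree $l-2$, as the $l = 2$ and $l = 3$ cases discussed above suggest, the difference should equal the number of monomials of degree $l - 2$.

```python
def monomials(degree):
    """Exponent triples (a, b, c) with a + b + c = degree, i.e. x^a y^b z^c."""
    return [(a, b, degree - a - b)
            for a in range(degree + 1)
            for b in range(degree + 1 - a)]


for l in range(6):
    n_monomials = len(monomials(l))
    n_m_values = 2 * l + 1
    n_lower = len(monomials(l - 2)) if l >= 2 else 0
    # columns: l, monomials of degree l, 2l+1, surplus, monomials of degree l-2
    print(l, n_monomials, n_m_values, n_monomials - n_m_values, n_lower)
```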

The multipole expansion is useful for continuous distributions of charge in addition
to the discrete charge sets we have considered up to this point. The generalization of
Eq. (16.73) is

\[ M_l^m = \frac{4\pi}{2l+1}\int\!\!\int\!\!\int \rho(\mathbf r')\,(r')^{l+2}\,Y_l^m(\theta',\varphi')^*\,\sin\theta'\,dr'\,d\theta'\,d\varphi', \tag{16.75} \]

where $\rho(\mathbf r)$ is the charge density. This expression will yield valid results when $\psi(\mathbf r)$ is
computed via Eq. (16.74) for $r$ values greater than the largest $r'$ for which $\rho(\mathbf r')$ is nonzero.


Integrals of Three Spherical Harmonics 


Our final spherical tensor application is to the integrals of three spherical harmonics (all 
of the same argument). These integrals arise in the evaluation of matrix elements of angle- 
dependent operators which themselves can be written in terms of spherical harmonics. 
While it is possible to evaluate some such integrals using the techniques illustrated in 
Eq. (15.152), a more general result is available. Note that this is not an angular momentum 
coupling problem of the type we considered in Section 16.2, because that section treated 
angular momenta with independent arguments that depended on different variables. Here 
we have a different and more specialized situation in which all three angular momentum 
functions have the same argument. 

The formula we seek is most easily derived if we have access to values of some of the 
rotation coefficients $D^l_{m'm}$ (a.k.a. Wigner matrices) defined in Eq. (16.52). The coefficients
we need can be easily deduced with the aid of the spherical harmonic addition theorem, so 
we start by establishing the following lemma (a lemma is a mathematical result needed to 
prove something else): 


Lemma: Evaluation of $D^l_{m0}(R)$:
Writing first the spherical harmonic addition theorem, Eq. (16.57),

\[ P_l(\cos\chi) = \frac{4\pi}{2l+1}\sum_m Y_l^m(\Omega_1)^*\,Y_l^m(\Omega_2), \]

where $\chi$ is the angle between the directions $\Omega_1$ and $\Omega_2$, we replace its left-hand side by
the equivalent form $\sqrt{4\pi/(2l+1)}\;Y_l^0(\chi, 0)$, thereby reaching

\[ Y_l^0(\chi, 0) = \sqrt{\frac{4\pi}{2l+1}}\,\sum_m Y_l^m(\Omega_1)^*\,Y_l^m(\Omega_2). \tag{16.76} \]

We now compare this expression with Eq. (16.52), which we write here in a notation
designed to make the comparison more obvious:

\[ Y_l^0(R\,\Omega_2) = \sum_m D^l_{m0}(R)\,Y_l^m(\Omega_2). \tag{16.77} \]

If we select $R$ to be a rotation that converts $\Omega_1$ to the polar direction, then $R\,\Omega_2$ will be
$(\chi, 0)$; note that $Y_l^0(R\,\Omega_2)$ is independent of $\varphi$ so we can set its $\varphi$ coordinate to zero. Thus,
the comparison of Eqs. (16.76) and (16.77) yields

\[ D^l_{m0}(R) = \sqrt{\frac{4\pi}{2l+1}}\;Y_l^m(\Omega_1)^*. \tag{16.78} \]

We remind the reader that $R$ is a rotation that converts $\Omega_1$ to the polar direction.

Equation (16.78) has been derived under the assumption that the quantities being 
rotationally transformed are spherical harmonics (and not more complicated angular- 
momentum functions such as might be obtained via angular-momentum coupling). How- 
ever, it is possible to show that the result generalizes, without change, to any angular 
momentum functions of integer $l$.


We now continue toward the goal of this subsection, namely the evaluation of integrals 
involving three spherical harmonics. The result we seek involves products of spherical 
harmonics with the same argument, but our method of obtaining that result proceeds by 
considering the rotational behavior of an angular momentum coupling formula (i.e., a prod- 
uct involving spherical harmonics of different arguments). So we now look at a special case 
of Eq. (16.45), 


\[ Y_{l_1}^0(\Omega_1)\,Y_{l_2}^0(\Omega_2) = \sum_L C(l_1, l_2, L|0, 0, 0)\,|L, 0\rangle, \tag{16.79} \]

where $|j_1, m_1; j_2, m_2\rangle$ of Eq. (16.45) is the product of spherical harmonics with $m_1 =
m_2 = 0$ shown on the left-hand side of Eq. (16.79); the $|J, M\rangle$ state of Eq. (16.45) is now
$|L, 0\rangle$. We next apply a rotation $R$ to Eq. (16.79), using Eqs. (16.51) and (16.52) to get


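\[ \sum_{m_1 m_2} D^{l_1}_{m_1 0}(R)\,D^{l_2}_{m_2 0}(R)\,Y_{l_1}^{m_1}(\Omega_1)\,Y_{l_2}^{m_2}(\Omega_2) = \sum_{L,\sigma} C(l_1, l_2, L|0, 0, 0)\,D^{L}_{\sigma 0}(R)\,|L, \sigma\rangle. \tag{16.80} \]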


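Finally, we convert $|L, \sigma\rangle$ back to the $m_1$, $m_2$ basis, using Eq. (16.43):

\[ \sum_{m_1 m_2} D^{l_1}_{m_1 0}(R)\,D^{l_2}_{m_2 0}(R)\,Y_{l_1}^{m_1}(\Omega_1)\,Y_{l_2}^{m_2}(\Omega_2) = \sum_{L,\sigma} C(l_1, l_2, L|0, 0, 0)\,D^{L}_{\sigma 0}(R) \]

\[ \qquad\times \sum_{m_1 m_2} C(l_1, l_2, L|m_1, m_2, \sigma)\,Y_{l_1}^{m_1}(\Omega_1)\,Y_{l_2}^{m_2}(\Omega_2). \tag{16.81} \]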


This relatively complicated equation must be satisfied for all values of $\Omega_1$ and $\Omega_2$, which
will only be possible if its two sides are equal for each set of $m_1$, $m_2$ values. We therefore
have the set of simpler equations,

\[ D^{l_1}_{m_1 0}(R)\,D^{l_2}_{m_2 0}(R) = \sum_{L,\sigma} C(l_1, l_2, L|0, 0, 0)\,C(l_1, l_2, L|m_1, m_2, \sigma)\,D^{L}_{\sigma 0}(R), \tag{16.82} \]

satisfied separately for all values of the free parameters.

We are now ready to replace all the $D^l$ in Eq. (16.82) by the result obtained in our
lemma, Eq. (16.78). Since the rotation $R$ is arbitrary, both in the lemma and in the present
work, our use of Eq. (16.78) will produce some angular coordinates $\Omega$ that have nothing
to do with the $\Omega_i$ we were previously using; the point that is important here is that because
the same $R$ occurs throughout Eq. (16.82), the application of Eq. (16.78) will everywhere
produce the same $\Omega$. Substitution of the lemma result yields

\[ \frac{4\pi}{\sqrt{(2l_1+1)(2l_2+1)}}\;Y_{l_1}^{m_1}(\Omega)^*\,Y_{l_2}^{m_2}(\Omega)^* = \sum_{L,\sigma} C(l_1, l_2, L|0, 0, 0)\,C(l_1, l_2, L|m_1, m_2, \sigma)\,\sqrt{\frac{4\pi}{2L+1}}\;Y_L^{\sigma}(\Omega)^*. \]


Since the $Y_l^m$ are the only potentially complex quantities appearing here, we may remove
the complex conjugate signs by complex conjugating the entire equation. After other
minor rearrangements and recognition of the fact that the only contributing $\sigma$ value is
$\sigma = m_1 + m_2$, we reach the final form

\[ Y_{l_1}^{m_1}(\Omega)\,Y_{l_2}^{m_2}(\Omega) = \sum_L \sqrt{\frac{(2l_1+1)(2l_2+1)}{4\pi(2L+1)}}\; C(l_1, l_2, L|0, 0, 0)\,C(l_1, l_2, L|m_1, m_2, m_1+m_2)\,Y_L^{m_1+m_2}(\Omega). \tag{16.83} \]


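At last we can meet the objective of this subsection. Multiplying both sides of
Eq. (16.83) by some $Y_{l_3}^{m_3}(\Omega)^*$ and integrating in $\Omega$ over the angular space, we get

\[ \left\langle Y_{l_3}^{m_3}\,\middle|\,Y_{l_1}^{m_1}\,\middle|\,Y_{l_2}^{m_2}\right\rangle = \int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\,d\theta\;Y_{l_3}^{m_3}(\theta,\varphi)^*\,Y_{l_1}^{m_1}(\theta,\varphi)\,Y_{l_2}^{m_2}(\theta,\varphi) \]

\[ \qquad = \sqrt{\frac{(2l_1+1)(2l_2+1)}{4\pi(2l_3+1)}}\; C(l_1, l_2, l_3|0, 0, 0)\,C(l_1, l_2, l_3|m_1, m_2, m_3). \tag{16.84} \]

We do not have to include a Kronecker delta because the condition $m_3 = m_1 + m_2$ is taken
care of by the fact that the Clebsch-Gordan coefficients vanish in the absence of this or any
other condition needed for a nonzero result.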

Some further insight can be obtained by considering the special case $m_1 = m_2 = m_3 = 0$
and writing the spherical harmonics in terms of Legendre polynomials. This brings us (after
the substitution $t = \cos\theta$) to

\[ \int_{-1}^{1} P_{l_1}(t)\,P_{l_2}(t)\,P_{l_3}(t)\,dt = \frac{2}{2l_3+1}\,C(l_1, l_2, l_3|0, 0, 0)^2. \tag{16.85} \]

Since we know that the Legendre polynomial $P_l(t)$ of even $l$ is an even function of $t$, while
that of odd $l$ is odd in $t$, we see from Eq. (16.85) that unless $l_1 + l_2 + l_3$ is even, the integral
will vanish, telling us that $C(l_1, l_2, l_3|0, 0, 0)$ will only be nonzero if $l_1 + l_2 + l_3$ is even.
In addition, if the product of any two of the $P_l(t)$ does not contain a power of $t$ as large as
the index of the third $P_l$, the integral will vanish due to the orthogonality of the Legendre
functions. This observation translates into a triangle condition, namely that the integral
will vanish unless $|l_1 - l_2| \le l_3 \le l_1 + l_2$. Since these are conditions on the Clebsch-Gordan
coefficient $C(l_1, l_2, l_3|0, 0, 0)$, they apply also to the general integral formula, Eq. (16.84).

Summarizing, integrals of products of three spherical harmonics, evaluated in
Eq. (16.84), will only be nonzero if the three following conditions are satisfied:

1. The $l$ values satisfy the triangle condition $|l_1 - l_2| \le l_3 \le l_1 + l_2$,
2. The $m$ values satisfy the condition $m_3 = m_1 + m_2$,
3. The sum of the $l$ values, $l_1 + l_2 + l_3$, is even.
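The first and third conditions can be seen at work in the Legendre integral of Eq. (16.85). The sketch below (the function name `triple_legendre_integral` is introduced only for this illustration) evaluates the integral by Gauss-Legendre quadrature, which is exact for polynomial integrands when enough nodes are used. The integral should vanish whenever $l_1 + l_2 + l_3$ is odd or the triangle condition fails, and it gives the elementary value $\int_{-1}^{1} t^2\,\tfrac12(3t^2-1)\,dt = 4/15$ for $(l_1, l_2, l_3) = (1, 1, 2)$.

```python
import numpy as np
from numpy.polynomial import legendre


def triple_legendre_integral(l1, l2, l3):
    """Integral over [-1, 1] of P_l1(t) P_l2(t) P_l3(t) dt."""
    # n-point Gauss-Legendre quadrature is exact up to polynomial degree 2n - 1
    n = (l1 + l2 + l3) // 2 + 1
    t, w = legendre.leggauss(n)

    def P(l):
        coeffs = np.zeros(l + 1)
        coeffs[l] = 1.0          # selects the single polynomial P_l
        return legendre.legval(t, coeffs)

    return float(np.sum(w * P(l1) * P(l2) * P(l3)))


print(triple_legendre_integral(1, 1, 2))   # parity and triangle conditions met
print(triple_legendre_integral(1, 2, 2))   # l1 + l2 + l3 odd
print(triple_legendre_integral(1, 1, 4))   # triangle condition violated
```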


Exercises 


16.3.1 For $l = 1$, Eq. (16.52) becomes

\[ Y_1^m(\theta', \varphi') = \sum_{m'=-1}^{1} D^1_{m'm}(\alpha, \beta, \gamma)\,Y_1^{m'}(\theta, \varphi). \]

Rewrite these spherical harmonics in Cartesian form. Show that the resulting Cartesian
coordinate equations are equivalent to the Euler rotation matrix $A(\alpha, \beta, \gamma)$, Eq. (3.37).


16.3.2 In proving the addition theorem, we assumed that $Y_l^k(\theta_1, \varphi_1)$ could be expanded in a
series of $Y_l^m(\theta_2, \varphi_2)$, in which $m$ varied from $-l$ to $+l$ but $l$ was held fixed. What
arguments can you develop to justify summing only over the upper index, $m$, and not over
the lower index, $l$?

Hints. One possibility is to examine the homogeneity of the $Y_l^m$, that is, $Y_l^m$ may be
expressed entirely in terms of the form $\cos^{l-p}\theta\,\sin^p\theta$, or $x^{l-p-s}y^p z^s/r^l$. Another
possibility is to examine the behavior of the Legendre equation under rotation of the
coordinate system.


16.3.3 An atomic electron with angular momentum $l$ and magnetic quantum number $m$ has a
wave function

\[ \psi(r, \theta, \varphi) = f(r)\,Y_l^m(\theta, \varphi). \]

Show that the sum of the electron densities in a given complete shell is spherically
symmetric; that is, $\sum_{m=-l}^{l} \psi^*(r, \theta, \varphi)\,\psi(r, \theta, \varphi)$ is independent of $\theta$ and $\varphi$.


16.3.4 The potential of an electron at point $\mathbf r_e$ in the field of $Z$ protons at points $\mathbf r_p$ is

\[ \Phi = -\frac{e^2}{4\pi\varepsilon_0}\sum_{p=1}^{Z}\frac{1}{|\mathbf r_e - \mathbf r_p|}. \]

Show that for $r_e$ larger than all $r_p$, this may be written as

\[ \Phi = -\frac{e^2}{4\pi\varepsilon_0\, r_e}\sum_{p=1}^{Z}\sum_{L,M}\left(\frac{r_p}{r_e}\right)^{L}\frac{4\pi}{2L+1}\,Y_L^M(\theta_p, \varphi_p)^*\,Y_L^M(\theta_e, \varphi_e). \]

How should $\Phi$ be written for $r_e < r_p$?


16.3.5 Two protons are uniformly distributed within the same spherical volume. If the
coordinates of one element of charge are $(r_1, \theta_1, \varphi_1)$ and the coordinates of the other are
$(r_2, \theta_2, \varphi_2)$ and $r_{12}$ is the distance between them, the element of repulsion energy will
be given by

\[ d\psi = \rho^2\,\frac{d\tau_1\,d\tau_2}{r_{12}} = \rho^2\,\frac{r_1^2\,dr_1\,\sin\theta_1\,d\theta_1\,d\varphi_1\;r_2^2\,dr_2\,\sin\theta_2\,d\theta_2\,d\varphi_2}{r_{12}}, \]

where

\[ \rho = \frac{\text{charge}}{\text{volume}} = \frac{3e}{4\pi R^3} \quad\text{and}\quad r_{12}^2 = r_1^2 + r_2^2 - 2r_1 r_2\cos\gamma. \]

Here $\rho$ is the charge density and $\gamma$ is the angle between $\mathbf r_1$ and $\mathbf r_2$. Calculate the
total electrostatic energy (of repulsion) of the two protons. This calculation is used in
accounting for the mass difference in “mirror” nuclei, such as O$^{15}$ and N$^{15}$.

ANS. $\dfrac{6}{5}\,\dfrac{e^2}{R}$.


16.3.6 Each of the two 1s electrons in helium may be described by a hydrogenic wave function

\[ \psi(\mathbf r) = \left(\frac{Z^3}{\pi a_0^3}\right)^{1/2} e^{-Zr/a_0} \]

in the absence of the other electron. Here $Z$, the atomic number, is 2. The symbol $a_0$
is the Bohr radius, $\hbar^2/me^2$. Find the mutual potential energy of the two electrons,
given by

\[ \int \psi^*(\mathbf r_1)\,\psi^*(\mathbf r_2)\,\frac{e^2}{|\mathbf r_1 - \mathbf r_2|}\,\psi(\mathbf r_1)\,\psi(\mathbf r_2)\,d^3r_1\,d^3r_2. \]

ANS. $\dfrac{5e^2 Z}{8a_0}$.


16.3.7 The probability of finding a 1s hydrogen electron in a volume element $r^2\,dr\,\sin\theta\,d\theta\,d\varphi$ is

\[ \frac{1}{\pi a_0^3}\,e^{-2r/a_0}\,r^2\,dr\,\sin\theta\,d\theta\,d\varphi, \]

where $r$ is the distance of the electron from the nucleus. Find the electrostatic potential
of this charge distribution at points $\mathbf r_1$, where you may not assume that $\mathbf r_1$ is on the polar
axis of your coordinate system. Calculate the potential from

\[ V(\mathbf r_1) = \frac{q}{4\pi\varepsilon_0}\int \frac{\rho(\mathbf r_2)}{r_{12}}\,d^3r_2, \]

where $r_{12} = |\mathbf r_1 - \mathbf r_2|$. Expand $r_{12}$. Apply the Legendre polynomial addition theorem
and show that the angular dependence of $V(\mathbf r_1)$ drops out.

ANS. $V(\mathbf r_1) = \dfrac{q}{4\pi\varepsilon_0}\left\{\dfrac{1}{2r_1}\,\gamma\!\left(3, \dfrac{2r_1}{a_0}\right) + \dfrac{1}{a_0}\,\Gamma\!\left(2, \dfrac{2r_1}{a_0}\right)\right\}$, where
$\gamma$ and $\Gamma$ are incomplete gamma functions, Eq. (13.73).


16.3.8 A hydrogen electron in a 2p orbital has a charge distribution

\[ \rho = \frac{q}{64\pi a_0^5}\,r^2 e^{-r/a_0}\sin^2\theta, \]

where $a_0 = \hbar^2/me^2$ is the Bohr radius, and $r$ is the distance between the electron and
the nucleus. Find the electrostatic potential energy for this atomic state.
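16.3.9 (a) As a Laplace series and as an example of Eq. (5.27), show that

\[ \delta(\Omega_1 - \Omega_2) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} Y_l^m(\theta_2, \varphi_2)^*\,Y_l^m(\theta_1, \varphi_1). \]

(b) Show also that this same representation of the Dirac delta function may be
written as

\[ \delta(\Omega_1 - \Omega_2) = \sum_{l=0}^{\infty}\frac{2l+1}{4\pi}\,P_l(\cos\gamma), \]

and identify $\gamma$. Now, if you can justify equating the summations over $l$ term by term,
you have an alternate derivation of the spherical harmonic addition theorem.


16.3.10 Verify

(a) $\displaystyle\int Y_L^M(\theta,\varphi)\,Y_0^0(\theta,\varphi)\,Y_L^{M*}(\theta,\varphi)\,d\Omega = \frac{1}{\sqrt{4\pi}}$,

(b) $\displaystyle\int Y_L^M\,Y_1^0\,Y_{L+1}^{M*}\,d\Omega = \sqrt{\frac{3}{4\pi}}\,\sqrt{\frac{(L+M+1)(L-M+1)}{(2L+1)(2L+3)}}$,

(c) $\displaystyle\int Y_L^M\,Y_1^1\,Y_{L+1}^{M+1\,*}\,d\Omega = \sqrt{\frac{3}{8\pi}}\,\sqrt{\frac{(L+M+1)(L+M+2)}{(2L+1)(2L+3)}}$,

(d) $\displaystyle\int Y_L^M\,Y_1^1\,Y_{L-1}^{M+1\,*}\,d\Omega = -\sqrt{\frac{3}{8\pi}}\,\sqrt{\frac{(L-M)(L-M-1)}{(2L-1)(2L+1)}}$.

These integrals were used in an investigation of the angular correlation of internal
conversion electrons.


16.3.11 Show that

(a) $\displaystyle\int_{-1}^{1} x\,P_L(x)\,P_N(x)\,dx = \begin{cases} \dfrac{2(L+1)}{(2L+1)(2L+3)}, & N = L+1, \\[2ex] \dfrac{2L}{(2L-1)(2L+1)}, & N = L-1, \end{cases}$

(b) $\displaystyle\int_{-1}^{1} x^2\,P_L(x)\,P_N(x)\,dx = \begin{cases} \dfrac{2(L+1)(L+2)}{(2L+1)(2L+3)(2L+5)}, & N = L+2, \\[2ex] \dfrac{2(2L^2+2L-1)}{(2L-1)(2L+1)(2L+3)}, & N = L, \\[2ex] \dfrac{2L(L-1)}{(2L-3)(2L-1)(2L+1)}, & N = L-2. \end{cases}$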
16.3.12 Since $x P_n(x)$ is a polynomial (of degree $n + 1$), it may be represented by the Legendre
series

\[ x P_n(x) = \sum_{s=0}^{\infty} a_s P_s(x). \]

(a) Show that $a_s = 0$ for $s < n - 1$ and $s > n + 1$.

(b) Calculate $a_{n-1}$, $a_n$, and $a_{n+1}$ and show that you have reproduced the recurrence
relation, Eq. (15.18).

Note. This argument may be put in a general form to demonstrate the existence of a
three-term recurrence relation for any of our complete sets of orthogonal polynomials:

\[ x\varphi_n = a_{n+1}\varphi_{n+1} + a_n\varphi_n + a_{n-1}\varphi_{n-1}. \]


16.4 VECTOR SPHERICAL HARMONICS


Maxwell’s equations lead naturally to applications involving a vector Helmholtz equation 
for the vector potential A, and various classical and quantum-mechanical problems in this 
area are usefully attacked by introducing vector spherical harmonics. Our first step in 
this direction will be to recognize that a set of unit vectors can be thought of as a spherical 
tensor of rank 1 and can be discussed in terms of the angular momentum formalism. We
will later (in Chapter 17) pursue rotational symmetry in greater depth; for our present 
purposes it suffices to confirm the relationship between rotations in 3-D space and angular 
momentum operators. 


A Spherical Tensor 


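We consider here vectors in 3-D space, of the form $\mathbf u = u_x\hat{\mathbf e}_x + u_y\hat{\mathbf e}_y + u_z\hat{\mathbf e}_z$, but, unlike
our practice in Chapter 3, we will permit the $u_i$ to be complex, and use the complex scalar
product $\langle \mathbf u|\mathbf u\rangle^{1/2}$ as a measure of the magnitude of $\mathbf u$. If we restrict the vectors $\mathbf u$ to be of
unit length, they satisfy the conditions necessary to be identified as spherical tensors of
rank 1.

We now introduce operators $K_i$ defined by the following matrices:

\[ K_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \qquad K_2 = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \qquad K_3 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{16.86} \]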


The reader can easily verify that these matrices satisfy the angular momentum
commutation rules, and in fact describe the result of applying the angular momentum operator
$\mathbf L = \mathbf r \times \mathbf p$, where $\mathbf p = -i\nabla$, to the basis $x$, $y$, $z$. We next calculate

\[ K^2 = K_1^2 + K_2^2 + K_3^2 = 2\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]

showing that all members of the basis are eigenvectors of $K^2$, with eigenvalue 2, which is
$k(k + 1)$ with $k = 1$. All members of our basis therefore have one unit of some abstract sort
of angular momentum (often referred to as spin), and we can obtain a set of eigenvectors
with values of an index $m$ that can have values $+1$, $0$, and $-1$. By diagonalizing the matrix
$K_3$, we find its eigenvectors to be

\[ \hat{\mathbf e}_{+1} = \begin{pmatrix} -1/\sqrt2 \\ -i/\sqrt2 \\ 0 \end{pmatrix}, \qquad \hat{\mathbf e}_{0} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad \hat{\mathbf e}_{-1} = \begin{pmatrix} 1/\sqrt2 \\ -i/\sqrt2 \\ 0 \end{pmatrix}. \tag{16.87} \]


While in principle the signs of these eigenvectors are arbitrary, they have been chosen here 
to agree with the Condon-Shortley phase convention. 
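The properties claimed for the $K_i$ can be confirmed with a few lines of linear algebra. The sketch below assumes the matrices have elements $(K_j)_{kl} = -i\varepsilon_{jkl}$ and that the eigenvectors are $\hat{\mathbf e}_{\pm1} = \mp(\hat{\mathbf e}_x \pm i\hat{\mathbf e}_y)/\sqrt2$ and $\hat{\mathbf e}_0 = \hat{\mathbf e}_z$; the variable names are chosen only for this illustration.

```python
import numpy as np

# Assumed matrix elements: (K_j)_{kl} = -i * epsilon_{jkl}
K1 = np.array([[0, 0, 0], [0, 0, -1j], [0, 1j, 0]])
K2 = np.array([[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]])
K3 = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]])


def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return a @ b - b @ a


# Angular momentum commutation rules [K_i, K_j] = i K_k (cyclic)
rules_ok = (np.allclose(comm(K1, K2), 1j * K3)
            and np.allclose(comm(K2, K3), 1j * K1)
            and np.allclose(comm(K3, K1), 1j * K2))

# K^2 should be k(k+1) = 2 times the unit matrix
K_squared = K1 @ K1 + K2 @ K2 + K3 @ K3

# Candidate eigenvectors of K3 with eigenvalues m = +1, 0, -1
e_plus = np.array([-1, -1j, 0]) / np.sqrt(2)
e_zero = np.array([0, 0, 1], dtype=complex)
e_minus = np.array([1, -1j, 0]) / np.sqrt(2)

print(rules_ok, np.allclose(K_squared, 2 * np.eye(3)))
for m, e in ((1, e_plus), (0, e_zero), (-1, e_minus)):
    print(m, np.allclose(K3 @ e, m * e))
```

Note that an overall phase on each eigenvector would pass the same eigenvalue check; the specific signs reflect the Condon-Shortley convention and are not fixed by the diagonalization itself.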


Vector Coupling 


The vector spherical harmonics are now defined as the quantities that result from the
coupling of ordinary spherical harmonics and the vectors $\hat{\mathbf e}_m$ to form states of definite $J$ (the
resultant of the orbital angular momentum of the spherical harmonic and the one unit
possessed by the $\hat{\mathbf e}_m$). It is customary to label the vector spherical harmonics to show both
the $L$ value from the ordinary (scalar) harmonic and the $M$ value (the eigenvalue of $J_z$).
Thus, the vector spherical harmonic will have three indices: $J$, $L$, and $M$. From the general
formula for angular-momentum coupling, Eq. (16.43), we have

\[ \mathbf Y_{JLM}(\theta, \varphi) = \sum_{m m'} C(L, 1, J|m, m', M)\,Y_L^m(\theta, \varphi)\,\hat{\mathbf e}_{m'}. \tag{16.88} \]

Remember that $M$ is $M_J$, not the $m$ value of $Y_L^m$, and that the $\hat{\mathbf e}_{m'}$ are the angular momentum
eigenfunctions given in Eq. (16.87).

Because Eq. (16.88) couples an angular momentum $L$ with one of magnitude $k = 1$, the
$L$ values in a vector spherical harmonic of given $J$ are restricted to $J + 1$, $J$, and $J - 1$,
a condition enforced by the values of the Clebsch-Gordan coefficients. Moreover, because
the Clebsch-Gordan coefficients describe a unitary transformation, the obvious
orthogonality of the states in the $m$, $m'$ basis ($Y_L^m\hat{\mathbf e}_{m'}$) will cause the vector spherical harmonics
also to be orthonormal:

\[ \int \mathbf Y_{JLM}^*(\theta, \varphi)\cdot\mathbf Y_{J'L'M'}(\theta, \varphi)\,d\Omega = \delta_{JJ'}\,\delta_{LL'}\,\delta_{MM'}. \tag{16.89} \]

In addition, we can invert Eq. (16.88) using Eq. (16.45), reaching

\[ Y_L^m(\theta, \varphi)\,\hat{\mathbf e}_{m'} = \sum_{J M} C(L, 1, J|m, m', M)\,\mathbf Y_{JLM}(\theta, \varphi). \tag{16.90} \]
JM 


The manipulation of expressions involving the vector spherical harmonics depends
crucially on a few identities, of which perhaps the most important is the formula

\[ \hat{\mathbf r}\,Y_L^M(\theta, \varphi) = -\left[\frac{L+1}{2L+1}\right]^{1/2}\mathbf Y_{L,L+1,M} + \left[\frac{L}{2L+1}\right]^{1/2}\mathbf Y_{L,L-1,M}. \tag{16.91} \]
To establish this formula, and at the same time to make its meaning more obvious, we start
by noting that $\hat{\mathbf r}$ has a form that depends on the angular coordinates; specifically, it is

\[ \hat{\mathbf r} = \sin\theta\cos\varphi\,\hat{\mathbf e}_x + \sin\theta\sin\varphi\,\hat{\mathbf e}_y + \cos\theta\,\hat{\mathbf e}_z. \]

For our present purposes, it is more convenient to rearrange this to the form

\[ \hat{\mathbf r} = \sin\theta\left(\frac{e^{i\varphi}\,\hat{\mathbf e}_{-1} - e^{-i\varphi}\,\hat{\mathbf e}_{+1}}{\sqrt2}\right) + \cos\theta\,\hat{\mathbf e}_0. \tag{16.92} \]

It is now clear that in order to prove Eq. (16.91) we must show that each $\hat{\mathbf e}_m$ has the same
coefficient on both sides of the equation. Taking first the coefficient of $\hat{\mathbf e}_0$, the left-hand
side of Eq. (16.91) yields, after use of Eq. (16.92),





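\[ \cos\theta\;Y_L^M(\theta, \varphi) = \left[\frac{(L-M+1)(L+M+1)}{(2L+1)(2L+3)}\right]^{1/2} Y_{L+1}^M + \left[\frac{(L-M)(L+M)}{(2L-1)(2L+1)}\right]^{1/2} Y_{L-1}^M, \tag{16.93} \]

a result previously exhibited as Eq. (15.150). The $\hat{\mathbf e}_0$ terms from the right-hand side of
Eq. (16.91) consist of

\[ -\left[\frac{L+1}{2L+1}\right]^{1/2} C(L+1, 1, L|M, 0, M)\,Y_{L+1}^M\,\hat{\mathbf e}_0 + \left[\frac{L}{2L+1}\right]^{1/2} C(L-1, 1, L|M, 0, M)\,Y_{L-1}^M\,\hat{\mathbf e}_0. \]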


The Clebsch-Gordan coefficients appearing here have the values

\[ C(L+1, 1, L|M, 0, M) = -\left[\frac{(L+M+1)(L-M+1)}{(L+1)(2L+3)}\right]^{1/2}, \]

\[ C(L-1, 1, L|M, 0, M) = \left[\frac{L^2 - M^2}{L(2L-1)}\right]^{1/2}. \]

These data permit confirmation of the $\hat{\mathbf e}_0$ terms of Eq. (16.91). The terms in $\hat{\mathbf e}_{+1}$ and $\hat{\mathbf e}_{-1}$
can also be shown consistent; the formulas needed for that purpose are Eqs. (15.151) and
(15.152).

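Another useful formula, which can be obtained by using Eq. (16.91) to simplify the
radial component when the gradient operator is applied to the form $f(r)\,Y_L^M(\theta, \varphi)$, is

\[ \nabla\left[f(r)\,Y_L^M(\theta, \varphi)\right] = -\left[\frac{L+1}{2L+1}\right]^{1/2}\left[\frac{d}{dr} - \frac{L}{r}\right] f(r)\,\mathbf Y_{L,L+1,M}(\theta, \varphi) \]

\[ \qquad + \left[\frac{L}{2L+1}\right]^{1/2}\left[\frac{d}{dr} + \frac{L+1}{r}\right] f(r)\,\mathbf Y_{L,L-1,M}(\theta, \varphi). \tag{16.94} \]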
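Under coordinate inversion the vector spherical harmonics transform as

\[ \mathbf Y_{L,L+1,M}(\theta', \varphi') = (-1)^{L+1}\,\mathbf Y_{L,L+1,M}(\theta, \varphi), \]
\[ \mathbf Y_{L,L-1,M}(\theta', \varphi') = (-1)^{L+1}\,\mathbf Y_{L,L-1,M}(\theta, \varphi), \tag{16.95} \]
\[ \mathbf Y_{LLM}(\theta', \varphi') = (-1)^{L}\,\mathbf Y_{LLM}(\theta, \varphi), \]

where

\[ \theta' = \pi - \theta, \qquad \varphi' = \pi + \varphi. \]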


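Starting from Eqs. (16.91) and (16.94), a number of formulas can be derived for the
divergence and curl of vector spherical harmonics. These formulas include the following:

\[ \nabla\cdot\left[f(r)\,\mathbf Y_{L,L+1,M}(\theta, \varphi)\right] = -\left[\frac{L+1}{2L+1}\right]^{1/2}\left[\frac{df(r)}{dr} + \frac{L+2}{r}\,f(r)\right] Y_L^M(\theta, \varphi), \tag{16.96} \]

\[ \nabla\cdot\left[f(r)\,\mathbf Y_{L,L-1,M}(\theta, \varphi)\right] = \left[\frac{L}{2L+1}\right]^{1/2}\left[\frac{df(r)}{dr} - \frac{L-1}{r}\,f(r)\right] Y_L^M(\theta, \varphi), \tag{16.97} \]

\[ \nabla\cdot\left[f(r)\,\mathbf Y_{LLM}(\theta, \varphi)\right] = 0, \tag{16.98} \]

\[ \nabla\times\left[f(r)\,\mathbf Y_{L,L+1,M}(\theta, \varphi)\right] = i\left[\frac{L}{2L+1}\right]^{1/2}\left[\frac{df(r)}{dr} + \frac{L+2}{r}\,f(r)\right]\mathbf Y_{LLM}(\theta, \varphi), \tag{16.99} \]

\[ \nabla\times\left[f(r)\,\mathbf Y_{LLM}(\theta, \varphi)\right] = i\left[\frac{L}{2L+1}\right]^{1/2}\left[\frac{df(r)}{dr} - \frac{L}{r}\,f(r)\right]\mathbf Y_{L,L+1,M}(\theta, \varphi) \]

\[ \qquad + i\left[\frac{L+1}{2L+1}\right]^{1/2}\left[\frac{df(r)}{dr} + \frac{L+1}{r}\,f(r)\right]\mathbf Y_{L,L-1,M}(\theta, \varphi), \tag{16.100} \]

\[ \nabla\times\left[f(r)\,\mathbf Y_{L,L-1,M}(\theta, \varphi)\right] = i\left[\frac{L+1}{2L+1}\right]^{1/2}\left[\frac{df(r)}{dr} - \frac{L-1}{r}\,f(r)\right]\mathbf Y_{LLM}(\theta, \varphi). \tag{16.101} \]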


For a complete derivation of Eqs. (16.96) to (16.101) we refer to the literature.³ These
relations play an important role in the partial wave expansion of classical and quantum 
electrodynamics. 

The definitions of the vector spherical harmonics given here are dictated by convenience, 
primarily in quantum mechanical calculations, in which the angular momentum is a sig- 
nificant parameter. Further examples of the usefulness and power of the vector spherical 
harmonics will be found in Blatt and Weisskopf, in Morse and Feshbach, and in Jackson 
(all in Additional Readings). 

In closing, we note that

• Vector spherical harmonics are developed from coupling $L$ units of orbital angular
momentum and one unit of spin angular momentum.

• An extension, coupling $L$ units of orbital angular momentum and two units of spin
angular momentum to form tensor spherical harmonics, is presented by Mathews.⁴

• The major application of tensor spherical harmonics is in the investigation of
gravitational radiation.


³E. H. Hill, Theory of vector spherical harmonics, Am. J. Phys. 22: 211 (1954). Note that Hill assigns phases in accordance with
the Condon-Shortley phase convention. In Hill's notation, $\mathbf X_{LM} = \mathbf Y_{LLM}$, $\mathbf V_{LM} = \mathbf Y_{L,L+1,M}$, $\mathbf W_{LM} = \mathbf Y_{L,L-1,M}$.

⁴J. Mathews, Gravitational multipole radiation, J. Soc. Ind. Appl. Math. 10: 768 (1963).


Exercises


16.4.1 Construct the $l = 0$, $m = 0$ and $l = 1$, $m = 0$ vector spherical harmonics.

ANS. $\mathbf Y_{010} = -\hat{\mathbf r}\,(4\pi)^{-1/2}$
$\mathbf Y_{000} = 0$
$\mathbf Y_{120} = -\hat{\mathbf r}\,(2\pi)^{-1/2}\cos\theta - \hat{\boldsymbol\theta}\,(8\pi)^{-1/2}\sin\theta$
$\mathbf Y_{110} = \hat{\boldsymbol\varphi}\,i\,(3/8\pi)^{1/2}\sin\theta$
$\mathbf Y_{100} = \hat{\mathbf r}\,(4\pi)^{-1/2}\cos\theta - \hat{\boldsymbol\theta}\,(4\pi)^{-1/2}\sin\theta$.


16.4.2 Verify that the parity of $\mathbf Y_{L,L+1,M}$ is $(-1)^{L+1}$, that of $\mathbf Y_{LLM}$ is $(-1)^L$, and that of
$\mathbf Y_{L,L-1,M}$ is $(-1)^{L+1}$. What happened to the $M$-dependence of the parity?

Hint. $\hat{\mathbf r}$ and $\hat{\boldsymbol\varphi}$ have odd parity; $\hat{\boldsymbol\theta}$ has even parity (compare with Exercise 3.10.25).


16.4.3 Verify the orthonormality of the vector spherical harmonics $\mathbf Y_{JLM}$.


16.4.4 Jackson's Classical Electrodynamics (see Additional Readings) defines $\mathbf Y_{LLM}$ by the
equation

\[ \mathbf Y_{LLM}(\theta, \varphi) = \frac{1}{\sqrt{L(L+1)}}\,\mathbf L\,Y_L^M(\theta, \varphi), \]

in which the angular momentum operator $\mathbf L$ is given by

\[ \mathbf L = -i(\mathbf r \times \nabla). \]

Show that this definition agrees with Eq. (16.88).


16.4.5 Show that

\[ \sum_{M=-L}^{L} \mathbf Y_{LLM}^*(\theta, \varphi)\cdot\mathbf Y_{LLM}(\theta, \varphi) = \frac{2L+1}{4\pi}. \]

Hint. One way is to use Exercise 16.4.4 with $\mathbf L$ expanded in Cartesian coordinates and
to apply raising and lowering operators.


16.4.6 Show that

\[ \int \mathbf Y_{LLM}\cdot\left(\hat{\mathbf r}\times\mathbf Y_{LLM}\right)d\Omega = 0. \]

The integrand represents an interference term in electromagnetic radiation that contributes
to angular distributions but not to total intensity.


Additional Readings 


Biedenharn, L. C., and J. D. Louck, Angular Momentum in Quantum Physics: Theory and Application. Ency- 
clopedia of Mathematics and Its Applications, vol. 8. Reading, MA: Addison-Wesley (1981). An extremely 
detailed account, containing much material not easily found elsewhere. 


Blatt, J. M., and V. Weisskopf, Theoretical Nuclear Physics. New York: Wiley (1952). Treats vector spherical 
harmonics. 


Brink, D. M., and G. R. Satchler, Angular Momentum. New York: Oxford (1993). Contains a good presentation 
of graphical methods for the manipulation of 3j, 6j, and even 9j symbols. The 6j and 9j symbols are useful
in dealing with the coupling of more than two angular momenta. 


Condon, E. U., and G. H. Shortley, Theory of Atomic Spectra. Cambridge: Cambridge University Press (1935). 
This is the original and standard work on spin-orbit coupling in atomic states. It is extremely thorough and not 
for the beginner. 


Edmonds, A. R., Angular Momentum in Quantum Mechanics. Princeton, NJ: Princeton University Press (1957). 
A good introductory text, with detailed discussion of the symmetries of 3j, 6j, and 9j symbols.


Jackson, J. D., Classical Electrodynamics, 3rd ed. New York: Wiley (1999). Applies vector spherical harmonics 
to multipole radiation and related problems. 


Morse, P. M., and H. Feshbach, Methods of Theoretical Physics, 2 vols. New York: McGraw-Hill (1953). 
Includes material on vector spherical harmonics. 


Rose, M. E., Elementary Theory of Angular Momentum. New York: Wiley (1957), reprinted, Dover (1995). As 
part of the development of the quantum theory of angular momentum, Rose includes a detailed and readable 
account of the rotation group. 


Wigner, E. P., Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra (translated by 
J. J. Griffin). New York: Academic Press (1959). This is the classic reference on group theory for the physi- 
cist. The rotation group is treated in considerable detail. There is a wealth of applications to atomic physics. 
The translation from the original German edition included a conversion from a left-handed to a right-handed 
coordinate system. This conversion introduced a few errors that can be resolved by comparison with the 
untranslated book. 
CHAPTER 17


GROUP THEORY 


Disciplined judgment, about what is neat 
and symmetrical and elegant, has time and 
time again proved an excellent guide to 
how nature works. 


MURRAY GELL-MANN 


INTRODUCTION TO GROUP THEORY 


Symmetry has long been important in the study of physical systems. Connections between 
the geometric symmetry of crystalline systems and their x-ray diffraction spectra were 
found to be crucial to the interpretation of the diffraction patterns and the extraction 
therefrom of information locating the atoms in the crystal. The geometric symmetries of 
molecules determine which vibrational modes will be active in absorbing or emitting radi- 
ation; the symmetries of periodic systems have implications as to their energy bands, their 
ability to conduct electricity, and even their superconductivity. The invariance of physi- 
cal laws with respect to position or orientation (i.e., the symmetry of space) gives rise to 
conservation laws for linear and angular momentum. Sometimes the implications of sym- 
metry invariance are far more complicated or sophisticated than might at first be supposed; 
the invariance of the forces predicted by electromagnetic theory when measurements are 
made in observation frames moving uniformly at different speeds (inertial frames) was 
an important clue leading Einstein to the discovery of special relativity. With the advent of 
quantum mechanics, considerations of angular momentum and spin introduced new sym- 
metry concepts into physics. These ideas have since catalyzed the modern development of 
particle theory. 

Central to all these symmetry notions is the fact that complete sets of symmetry oper- 
ations form what in mathematics are known as groups. The elements of a group may be 
finite in number, in which case the group is then termed finite or discrete, as for example 
the symmetry operations shown for the object depicted in Fig. 17.2. But alternatively, the 
symmetry operations may be infinite in number and described by continuously variable
parameter(s); such groups are termed continuous. An example of a continuous group is 
the set of possible rotational displacements of a circular object about its axis (in which 
case the parameter is the rotation angle). 


Definition of a Group 


A group G is defined as a set of objects or operations (e.g., rotations or other transfor- 
mations), called the elements of G, that may be combined, by a procedure to be called 
multiplication and denoted by *, to form a well-defined product, subject to the following 
four conditions: 


1. If $a$ and $b$ are any two elements of $G$, then the product $a * b$ is also an element of
$G$; more formally, $a * b$ associates an element of $G$ with the ordered pair $(a, b)$ of
elements of $G$. In other words, $G$ is closed under multiplication of its own elements.

2. This multiplication is associative: $(a * b) * c = a * (b * c)$.

3. There is a unique identity element¹ $I$ in $G$, such that $I * a = a * I = a$ for every
element $a$ in $G$.

4. Each element $a$ of $G$ has an inverse, denoted $a^{-1}$, such that $a * a^{-1} = a^{-1} * a = I$.


The above simple rules have a number of direct consequences, including the following: 


• It can be shown that the inverse of any element $a$ is unique: If $a^{-1}$ and $\hat a^{-1}$ are both
inverses of $a$, then $\hat a^{-1} = \hat a^{-1} * (a * a^{-1}) = (\hat a^{-1} * a) * a^{-1} = a^{-1}$.

• The products $g * a$, where $a$ is fixed and $g$ ranges over all elements of the group, consist
(in some order) of all the elements of the group. If $g$ and $g'$ produce the same element,
then $g * a = g' * a$. Multiplying on the right by $a^{-1}$, we get $(g * a) * a^{-1} = (g' * a) * a^{-1}$,
which reduces to $g = g'$. Thus distinct $g$ give distinct products, and every element $h$ of
the group occurs, as $h = (h * a^{-1}) * a$.
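Both consequences can be checked by brute force on a small example. The sketch below uses the six permutations of three objects as the group, with composition as the multiplication; this choice of group, and the names `ELEMENTS`, `mul`, and `inverses`, are assumptions made for the illustration only.

```python
from itertools import permutations

ELEMENTS = list(permutations(range(3)))   # the six permutations of (0, 1, 2)
IDENTITY = (0, 1, 2)


def mul(a, b):
    """Product a * b of two permutations: apply b first, then a."""
    return tuple(a[b[i]] for i in range(3))


def inverses(a):
    """All elements x with a * x = x * a = identity."""
    return [x for x in ELEMENTS
            if mul(a, x) == IDENTITY and mul(x, a) == IDENTITY]


# Each element should have exactly one inverse
unique_inverse = all(len(inverses(a)) == 1 for a in ELEMENTS)

# For fixed a, the products g * a should run over the whole group
rearrangement = all(
    sorted(mul(g, a) for g in ELEMENTS) == sorted(ELEMENTS)
    for a in ELEMENTS
)

print(unique_inverse, rearrangement)
```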


Here are some useful conventions and further definitions: 


• The $*$ for multiplication is tedious to write; when no ambiguity will result it is
customary to drop it, and instead of $a * b$ we write $ab$.


• When $a$ and $b$ are operations, and $ab$ is to be applied to an object appearing to their
right, $b$ is deemed to act first, with $a$ then applied to the result of operation with $b$.


• If a discrete group possesses $n$ elements (including $I$), its order is $n$; a continuous
group of order $n$ has elements that are defined by $n$ parameters.


• If $ab = ba$ for all $a$, $b$ of $G$, the multiplication is commutative, and the group is called
abelian.


• If a group possesses an element $a$ such that the sequence $I$, $a$, $a^2\,(= aa)$, $a^3$, $\ldots$
includes all elements of the group, it is termed cyclic. If a group is cyclic, it must
also be abelian. However, not all abelian groups are cyclic.





¹ Following E. Wigner, the identity element of a group is often labeled E, from the German Einheit, that is, unit; some other
authors just write 1.
• Two groups {I, a, b, ...} and {I′, a′, b′, ...} are isomorphic if their elements can be
put into one-to-one correspondence such that for all a and b, ab = c ⇔ a′b′ = c′. If
the correspondence is many-to-one, the groups are homomorphic.


• If a subset G′ of G is closed under the multiplication defined for G (and, for an infinite
group, also contains the inverse of each of its elements), it is also a group and is
called a subgroup of G. The identity I of G always forms a subgroup of G.


Examples of Groups 


Example 17.1.1 D3, SYMMETRY OF AN EQUILATERAL TRIANGLE


The symmetry operations of an equilateral triangle form a finite group with six elements;
our triangle can be placed either side up, and with any vertex in the top position. The six
operations that convert the initial orientation into symmetry equivalents are I (the identity
operation that makes no orientation change), C3, an operation which rotates the triangle
counterclockwise by 1/3 of a revolution, C3^2 (two successive C3 operations), C2, rotation
by 1/2 a revolution (for this group the rotation is about an axis in the plane of the
triangle), and C2' and C2'' (180° rotations about additional axes in the plane of the triangle).
Figure 17.1 is a schematic diagram indicating these symmetry operations, and Fig. 17.2
shows their result, with the vertices of the triangle numbered to show the effect of each
operation. The multiplication table for the group is shown in Table 17.1, where the
product ab (which describes the result of first applying operation b, and then operation a) is














FIGURE 17.1 Diagram identifying symmetry operations of an equilateral triangle. I is
the identity operation (the diagram as shown here). C3 and C3^2 are counterclockwise
rotations, by, respectively, 120° and 240°; C2, C2', C2'' are operations that turn the triangle
over by rotation about the indicated axes.
FIGURE 17.2 Result of applying the symmetry operations identified in Fig. 17.1 to
an equilateral triangle. One side of the triangle is shaded to make it obvious when that
side is up.


Table 17.1 Multiplication Table for Group D3

          I       C3      C3^2    C2      C2'     C2''
I         I       C3      C3^2    C2      C2'     C2''
C3        C3      C3^2    I       C2''    C2      C2'
C3^2      C3^2    I       C3      C2'     C2''    C2
C2        C2      C2'     C2''    I       C3      C3^2
C2'       C2'     C2''    C2      C3^2    I       C3
C2''      C2''    C2      C2'     C3      C3^2    I

Operations are pictured in Fig. 17.2. The table entry for row a and column b
is the product element ab. For example, C2C3 = C2'.


the group element listed in row a and column b of the table. This group has several names,
of which one is D3 (“D” for dihedral, referring to a 180° rotation axis lying in a plane
perpendicular to the main symmetry axis). From the multiplication table or by examination
of the symmetry operations themselves, we can see that the inverse of I is I, the inverse
of C3 is C3^2 (so the inverse of C3^2 is C3), and each C2 is its own inverse. This group is not
abelian; C3C2 ≠ C2C3 (C3C2 = C2'', while C2C3 = C2'). ■


Example 17.1.2 ROTATION OF A CIRCULAR DISK


The rotations of a circular disk about its symmetry axis form a continuous group of order 1
whose elements consist of rotations through angles φ. The group elements R(φ) are infinite
in number, with φ any angle in the range 0 ≤ φ < 2π. The identity element is clearly R(0);
the inverse of R(φ) is R(2π − φ). The multiplication rule for this group is R(φ)R(θ) =
R(φ + θ) (with φ + θ reduced to a value between 0 and 2π), so R(φ)R(θ) = R(θ)R(φ), and
this group is abelian. It will be useful to figure out what happens to a point on the disk
that before the rotation was at (x, y). The rotation is by an angle φ about the z axis,
clockwise, looking down from positive z, a choice made to be consistent with the
counterclockwise rotations of the coordinate axes used elsewhere in this book. The final
location of this point, (x′, y′), is given by the matrix equation

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \qquad (17.1)$$


Example 17.1.3 AN ABSTRACT GROUP


Groups do not need to represent geometric operations. Consider a set of four quantities
(elements) I, A, B, C, with our knowledge about them only that when any two are
multiplied, the result is an element of the set. The multiplication table of this four-element
set is shown in Table 17.2. These elements form a group, because each has an inverse
(itself), there is an identity element (I), the set is closed under multiplication, and the
multiplication defined by the table can be checked to be associative.


Table 17.2 Multiplication Table for the Vierergruppe

        I     A     B     C
I       I     A     B     C
A       A     I     C     B
B       B     C     I     A
C       C     B     A     I

The table entry for row a and column b is the
product element ab. ■
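
The group postulates can be checked mechanically from a multiplication table. The following is a minimal sketch, assuming Table 17.2 is stored as a nested dictionary; the names ELEMENTS, TABLE, and the helper functions are illustrative choices, not part of the text.

```python
# Cayley table from Table 17.2: TABLE[a][b] is the product ab.
ELEMENTS = ["I", "A", "B", "C"]
TABLE = {
    "I": {"I": "I", "A": "A", "B": "B", "C": "C"},
    "A": {"I": "A", "A": "I", "B": "C", "C": "B"},
    "B": {"I": "B", "A": "C", "B": "I", "C": "A"},
    "C": {"I": "C", "A": "B", "B": "A", "C": "I"},
}


def mul(a, b):
    """Look up the product ab in the table."""
    return TABLE[a][b]


def is_closed():
    """Postulate 1: every product is again an element of the set."""
    return all(mul(a, b) in ELEMENTS for a in ELEMENTS for b in ELEMENTS)


def is_associative():
    """Postulate 2: (ab)c = a(bc) for every triple of elements."""
    return all(
        mul(mul(a, b), c) == mul(a, mul(b, c))
        for a in ELEMENTS for b in ELEMENTS for c in ELEMENTS
    )


def identities():
    """Postulate 3: list every e with ea = ae = a for all a."""
    return [
        e for e in ELEMENTS
        if all(mul(e, a) == a and mul(a, e) == a for a in ELEMENTS)
    ]


def inverses(identity):
    """Postulate 4: map each a to the list of its two-sided inverses."""
    return {
        a: [b for b in ELEMENTS if mul(a, b) == identity and mul(b, a) == identity]
        for a in ELEMENTS
    }


if __name__ == "__main__":
    print("closed:", is_closed())
    print("associative:", is_associative())
    print("identity elements:", identities())
    print("inverses:", inverses("I"))
```

The same functions can be reused for any finite table by replacing ELEMENTS and TABLE.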


Example 17.1.4 ISOMORPHISM AND HOMOMORPHISM: C4 GROUP 


The symmetry operations of a square that cannot be turned over form a four-membered
group sometimes called C4 whose elements can be named I, C4 (90° rotation), C2 (180°
rotation), C4' (270° rotation). The four complex numbers 1, i, −1, −i also form a group
when the group operation is ordinary multiplication. These groups are isomorphic, and can
be put into correspondence in two different ways:

I ↔ 1, C4 ↔ i, C2 ↔ −1, C4' ↔ −i    or    I ↔ 1, C4 ↔ −i, C2 ↔ −1, C4' ↔ i.

This group is also cyclic, as C4^2 = C2, C4^3 = C4', or equivalently i^2 = −1, i^3 = −i.

The group C4 has a two-to-one correspondence with the ordinary multiplicative group
containing only 1 and −1: I and C2 ↔ 1, while C4 and C4' ↔ −1. This is a
homomorphism. A more trivial homomorphism, possessed by all groups, is obtained when every
element is assigned to correspond to the identity. ■
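
The correspondences of this example can be tested by checking that every product is preserved. The sketch below assumes the elements of C4 are stored as powers of the 90° rotation (so that products add exponents modulo 4); the names NAMES, iso_1, iso_2, and homo are illustrative.

```python
# Elements of C4 in order of increasing rotation angle: I, C4, C2 = C4^2, C4' = C4^3.
NAMES = ["I", "C4", "C2", "C4'"]


def c4_product(a, b):
    """Compose two rotations by adding their exponents modulo 4."""
    return NAMES[(NAMES.index(a) + NAMES.index(b)) % 4]


# The two isomorphisms with {1, i, -1, -i} and the homomorphism onto {1, -1}.
iso_1 = {"I": 1, "C4": 1j, "C2": -1, "C4'": -1j}
iso_2 = {"I": 1, "C4": -1j, "C2": -1, "C4'": 1j}
homo = {"I": 1, "C4": -1, "C2": 1, "C4'": -1}


def preserves_products(mapping):
    """True if ab = c in C4 implies mapping[a] * mapping[b] == mapping[c]."""
    return all(
        mapping[c4_product(a, b)] == mapping[a] * mapping[b]
        for a in NAMES for b in NAMES
    )


def is_one_to_one(mapping):
    """True if no two group elements share the same image."""
    return len(set(mapping.values())) == len(mapping)


if __name__ == "__main__":
    for label, m in [("iso_1", iso_1), ("iso_2", iso_2), ("homo", homo)]:
        print(label, preserves_products(m), is_one_to_one(m))
```

A mapping that preserves products and is one-to-one is an isomorphism; one that preserves products but is many-to-one is a homomorphism.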
Exercises


17.1.1 The Vierergruppe (German: four-membered group) is a group different from the C4
group introduced in Example 17.1.4. The Vierergruppe has the multiplication table
shown in Table 17.2. Determine whether this group is cyclic and whether it is abelian.


17.1.2 (a) Show that the permutations of n distinct objects satisfy the group postulates.

(b) Construct the multiplication table for the permutations of three objects, giving
each permutation a name of some sort. (Suggestion: Use I for the permutation
that leaves the order unchanged.)

(c) Show that this permutation group (named S3) is isomorphic with D3 and identify
corresponding operations. Is your identification unique?


17.1.3 Rearrangement theorem: Given a group of distinct elements (I, a, b, ..., n), show
that the set of products (aI, a^2, ab, ac, ..., an) reproduces all the group elements in a
new order.


17.1.4 A group G has a subgroup H with elements $h_i$. Let x be a fixed element of the original
group G and not a member of H. The transform

$x h_i x^{-1}$, i = 1, 2, ...

generates a conjugate subgroup $xHx^{-1}$. Show that this conjugate subgroup satisfies
each of the four group postulates and therefore is a group.


17.1.5 (a) A particular group is abelian. A second group is created by replacing $g_i$ by $g_i^{-1}$
for each element in the original group. Show that the two groups are isomorphic.
Note. This means showing that if ab = c, then $a^{-1}b^{-1} = c^{-1}$.

(b) Continuing part (a), show that the second group is also abelian.


17.1.6 Consider a cubic crystal consisting of identical atoms at r = (la, ma, na), with l, m,
and n taking on all integral values.

(a) Show that each Cartesian axis is a fourfold symmetry axis.

(b) The cubic point group will consist of all operations (rotations, reflections,
inversion) that leave the simple cubic crystal invariant and that do not move the atom at
l = m = n = 0. From a consideration of the permutation of the positive and
negative coordinate axes, predict how many elements this cubic group will contain.


17.1.7 A plane is covered with regular hexagons, as shown in Fig. 17.3.

(a) Determine the rotational symmetry of an axis perpendicular to the plane through
the common vertex of three hexagons (A). That is, if the axis has n-fold symmetry,
show (with careful explanation) what n is.

(b) Repeat part (a) for an axis perpendicular to the plane through the geometric center
of one hexagon (B).

(c) Find all the different kinds of axes within the plane of hexagons about which a
180° rotation is a symmetry element (this corresponds to turning the plane over by
rotation about that axis).
FIGURE 17.3 Plane covered by hexagons.


17.2 REPRESENTATION OF GROUPS 


All discrete groups and the continuous groups we study here can be represented by square 
matrices. By this we mean that to each element of the group we can associate a matrix, 
and that if U(a) is the matrix associated with a and U(b) the matrix associated with b, 
then the matrix product U(a)U(b) will be the matrix associated with ab. In other words, 
the matrices have the same multiplication table as the group. We call these matrices U 
because they can be chosen to be unitary. It is not necessary that U have a dimension equal 
to the order of the group. 

Sometimes we need to identify representations with a label. For specific representations
we can use their generally adopted names; when we need a generic label, we will use K or
K′. Thus, we can refer to representation K, consisting of matrices $U^K(a)$.


Example 17.2.1 A UNITARY REPRESENTATION


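Here is a unitary representation of the group D3 illustrated in Fig. 17.2:

$$U(I) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad U(C_3) = \begin{pmatrix} -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ -\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} \end{pmatrix},$$

$$U(C_3^2) = \begin{pmatrix} -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} \\ \tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} \end{pmatrix}, \quad U(C_2) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

$$U(C_2') = \begin{pmatrix} -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ \tfrac{1}{2}\sqrt{3} & \tfrac{1}{2} \end{pmatrix}, \quad U(C_2'') = \begin{pmatrix} -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} \\ -\tfrac{1}{2}\sqrt{3} & \tfrac{1}{2} \end{pmatrix}. \qquad (17.2)$$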


Several features of this representation are apparent:


• The unit operation is represented by a unit matrix.


• The inverse of an operation is represented by the inverse of its matrix.
We can check that the U form a representation: From the multiplication table, we have
C2C3 = C2'. Now we evaluate

$$U(C_2)\,U(C_3) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ -\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ \tfrac{1}{2}\sqrt{3} & \tfrac{1}{2} \end{pmatrix},$$

which is indeed U(C2'). The reader can easily verify that other products of group elements
correspond to the products of the representation matrices. Matrix multiplication is in
general not commutative, and gives results that are consistent with the lack of commutativity
of the group operations.
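
The remaining products can be verified numerically. The following is a minimal sketch, assuming the six matrices of Eq. (17.2) are stored as nested tuples; the dictionary U and the helpers matmul, close, identify, and product_table are illustrative names. Each matrix product is matched against the six representation matrices to recover the name of the product element.

```python
import math

S = math.sqrt(3) / 2  # sin(120 degrees)

# Matrices of Eq. (17.2), keyed by the name of the group element.
U = {
    "I":    ((1.0, 0.0), (0.0, 1.0)),
    "C3":   ((-0.5, S), (-S, -0.5)),
    "C3^2": ((-0.5, -S), (S, -0.5)),
    "C2":   ((1.0, 0.0), (0.0, -1.0)),
    "C2'":  ((-0.5, S), (S, 0.5)),
    "C2''": ((-0.5, -S), (-S, 0.5)),
}


def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def close(a, b, tol=1e-12):
    """Element-wise comparison within a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def identify(m):
    """Name of the unique group element whose matrix equals m, else None."""
    matches = [name for name, u in U.items() if close(u, m)]
    return matches[0] if len(matches) == 1 else None


def product_table():
    """Map each ordered pair (a, b) to the element represented by U(a)U(b)."""
    return {(a, b): identify(matmul(U[a], U[b])) for a in U for b in U}


if __name__ == "__main__":
    table = product_table()
    for a in U:
        print(a.ljust(5), [table[(a, b)] for b in U])
```

Because the representation is faithful, each product matrix matches exactly one group element, and the printed rows can be compared directly with Table 17.1.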

The 2 × 2 representation shown above is faithful, meaning that each group element
corresponds to a different matrix. In other words, our 2 × 2 representation is isomorphic with
the original group. Not all representations are faithful; consider the relatively trivial
representation in which every group element is represented by the 1 × 1 matrix (1). Every group
will possess this representation. A somewhat less trivial, but still unfaithful, representation
of D3 is one in which

U(I) = U(C3) = U(C3^2) = 1,    U(C2) = U(C2') = U(C2'') = −1.    (17.3)

This representation distinguishes elements according to whether they involve turning the
triangle over. Not all groups will possess this 1 × 1 representation; if we had not
permitted the triangle to be turned over, this representation would have been excluded. These
unfaithful representations are homomorphic with the original group. ■


An important feature of a representation of a group G is that its essential features
are invariant if we make the same unitary transformation on the matrices representing
all the group elements. To see this, consider what happens when we replace
each U(g) by $V U(g) V^{-1}$. Then the product U(g)U(g′), which is some U(g″), becomes
$(V U(g) V^{-1})(V U(g') V^{-1}) = V U(g) U(g') V^{-1} = V U(g'') V^{-1}$, so the transformed matrices
still form a representation of G. Representations that can be transformed into each other
by application of a unitary transformation are termed equivalent.

The possibility of unitary transformation also enables us to consider whether a repre- 
sentation of G is reducible. An irreducible representation of G is defined as one that 
cannot be broken into a direct sum of representations of smaller dimension by application 
of the same unitary transformation to all members of the representation. What we mean by 
a direct sum of representations is that each matrix will be block diagonal (all with the same 
sequence of blocks). Since different blocks will not mix under matrix multiplication, cor- 
responding blocks of the representation members will themselves define representations 
(see Fig. 17.4). If a representation named K is a direct sum of smaller representations K1
and K2, that fact can be indicated by the notation


K = K1 ⊕ K2.


It is not always obvious whether a representation is reducible. We will shortly encounter 
theorems that provide (for discrete groups) ways of determining what irreducible repre- 
sentations are present in a representation that may be reducible. Moreover, if a group is 
abelian, then the fact that all its elements commute means that the matrices represent- 
ing them can all be diagonalized simultaneously. From that fact we can conclude that all 
irreducible representations of abelian groups are 1 x 1. 
FIGURE 17.4 A member of a reducible representation in direct-sum form. All members
will have the same block structure, so individual blocks define representations of smaller
dimension.


It is important to understand that reducibility implies the existence of a unitary trans- 
formation that brings all members of a representation to the same block-diagonal form; a 
reducible representation may not exhibit the block-diagonal form if it has not been sub- 
jected to a suitable unitary transformation. Here is an example illustrating that point. 


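Example 17.2.2 A REDUCIBLE REPRESENTATION


Here is a reducible representation for our equilateral triangle:

$$U(I) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U(C_3) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad U(C_3^2) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

$$U(C_2) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad U(C_2') = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad U(C_2'') = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (17.4)$$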


Note that some of these matrices are not in any direct-sum form. To show that the
representation of Eq. (17.4) is reducible, we transform all the U to $U' = V U V^{-1}$, using

$$V = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 1/\sqrt{6} & -\sqrt{2}/\sqrt{3} & 1/\sqrt{6} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} \end{pmatrix},$$
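which brings us to

$$U'(I) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U'(C_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ 0 & -\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} \end{pmatrix},$$

$$U'(C_3^2) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} \\ 0 & \tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} \end{pmatrix}, \quad U'(C_2) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

$$U'(C_2') = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} \\ 0 & \tfrac{1}{2}\sqrt{3} & \tfrac{1}{2} \end{pmatrix}, \quad U'(C_2'') = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} \\ 0 & -\tfrac{1}{2}\sqrt{3} & \tfrac{1}{2} \end{pmatrix}. \qquad (17.5)$$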

All the matrices of this representation are block diagonal, and are direct sums that consist
of an upper 1 × 1 block that is the trivial representation, all of whose elements are (1),
and a lower 2 × 2 block that is exactly the 2 × 2 representation illustrated in Eq. (17.2).
There exists no unitary transformation that will simultaneously reduce the 2 × 2 blocks of
all members of the representation to direct sums of 1 × 1 blocks, so we have reduced the
representation of Eq. (17.4) to its irreducible components.² ■
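
The reduction can be reproduced numerically. This is a minimal sketch, assuming the six permutation matrices of Eq. (17.4) and the real orthogonal matrix V given above (so that the inverse of V is its transpose); the names U3, transform, and is_block_diagonal are illustrative.

```python
import math

# Permutation matrices of Eq. (17.4).
U3 = {
    "I":    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "C3":   [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    "C3^2": [[0, 0, 1], [1, 0, 0], [0, 1, 0]],
    "C2":   [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
    "C2'":  [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
    "C2''": [[0, 1, 0], [1, 0, 0], [0, 0, 1]],
}

R2, R3, R6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
V = [
    [1 / R3, 1 / R3, 1 / R3],
    [1 / R6, -2 / R6, 1 / R6],   # -2/sqrt(6) equals -sqrt(2)/sqrt(3)
    [1 / R2, 0.0, -1 / R2],
]


def matmul(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def transform(u):
    """Return V u V^{-1}; V is real orthogonal, so V^{-1} is its transpose."""
    return matmul(matmul(V, u), transpose(V))


def is_block_diagonal(m, tol=1e-12):
    """True if m is a direct sum of a 1 x 1 block and a 2 x 2 block."""
    return all(abs(m[0][j]) < tol and abs(m[j][0]) < tol for j in (1, 2))


if __name__ == "__main__":
    for name, u in U3.items():
        t = transform(u)
        print(name, is_block_diagonal(t), [[round(x, 4) for x in row] for row in t])
```

The lower 2 × 2 block of each transformed matrix can be compared entry by entry with Eq. (17.2).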


Example 17.2.3 REPRESENTATIONS OF A CONTINUOUS GROUP 


Example 17.1.2 presented a continuous group of order 1 whose elements are rotations R(φ)
about the symmetry axis of a circular disk. These rotations were taken to be defined by
the matrix equation presented as Eq. (17.1). The 2 × 2 matrix in that equation can also be
viewed as a representation of R(φ):

$$U(\varphi) = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}.$$

Because this group is abelian (two successive rotations yield the same result if applied
in either order), we know that this representation is reducible. If we apply the unitary
transformation

$$U'(\varphi) = V\,U(\varphi)\,V^{-1}, \quad \text{with} \quad V = \begin{pmatrix} 1/\sqrt{2} & -i/\sqrt{2} \\ 1/\sqrt{2} & i/\sqrt{2} \end{pmatrix},$$

the result is

$$U'(\varphi) = \begin{pmatrix} \cos\varphi + i\sin\varphi & 0 \\ 0 & \cos\varphi - i\sin\varphi \end{pmatrix} = \begin{pmatrix} e^{i\varphi} & 0 \\ 0 & e^{-i\varphi} \end{pmatrix}. \qquad (17.6)$$


² We know this because some of these 2 × 2 matrices do not commute with each other and therefore cannot be diagonalized
simultaneously.
Equation (17.6) applies to every element of our rotation group after transforming with V,
and we see that every rotation now has a diagonal representation. In other words, U(φ) has
been transformed into a direct sum of two one-dimensional (1-D) representations,
$U' = U^{(1)} \oplus U^{(-1)}$, with $U^{(1)}(\varphi) = e^{i\varphi}$ and $U^{(-1)}(\varphi) = e^{-i\varphi}$. In fact, these are only two of an
infinite number of irreducible representations, all of dimension 1:

$$U^{(n)}(\varphi) = e^{in\varphi},$$

where n can have any positive or negative integer value, including zero. The reason n is
limited to integer values is to assure that U(2π) = U(0). Note that only the n values ±1
lead to faithful representations. ■
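
The diagonalization in Eq. (17.6) can be checked for any particular angle. The following is a minimal sketch using Python's built-in complex numbers; the inverse of V is taken to be its conjugate transpose (V is unitary), and the names rotation, V_INV, and transformed are illustrative.

```python
import cmath
import math


def rotation(phi):
    """The 2 x 2 matrix U(phi) of Eq. (17.1)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s], [-s, c]]


R2 = math.sqrt(2)
V = [[1 / R2, -1j / R2], [1 / R2, 1j / R2]]
V_INV = [[1 / R2, 1 / R2], [1j / R2, -1j / R2]]  # conjugate transpose of V


def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transformed(phi):
    """Return V U(phi) V^{-1}, expected to equal diag(e^{i phi}, e^{-i phi})."""
    return matmul(matmul(V, rotation(phi)), V_INV)


if __name__ == "__main__":
    phi = 0.7
    print(transformed(phi))
    print(cmath.exp(1j * phi), cmath.exp(-1j * phi))
```

The off-diagonal entries vanish (to rounding error) for every angle, which is what allows the same V to reduce all members of the representation at once.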


Exercises


17.2.1 For any representation K of a group, and for any group element a, show that

$$\left[U^K(a)\right]^{-1} = U^K(a^{-1}).$$


17.2.2 Show that these four matrices form a representation of the Vierergruppe, whose
multiplication table is in Table 17.2.

$$1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}.$$


17.2.3 Show that the matrices 1, A, B, and C of Exercise 17.2.2 are reducible. Reduce them.

Note. This means transforming B and C to diagonal form (by the same unitary
transformation).


17.2.4 (a) Once you have a matrix representation of any group, a 1-D representation can be
obtained by taking the determinants of the matrices. Show that the multiplicative
relations are preserved in this determinant representation.

(b) Use determinants to obtain a 1-D representation of D3 from the 2 × 2
representation in Eq. (17.2).


17.2.5 Show that the cyclic group of n objects, $C_n$, may be represented by $r^m$,
m = 0, 1, 2, ..., n − 1. Here r is a generator given by

$$r = \exp(2\pi i s/n).$$

The parameter s takes on the values s = 1, 2, 3, ..., n, with each value of s yielding a
different 1-D (irreducible) representation of $C_n$.


17.2.6 Develop the irreducible 2 × 2 matrix representation of the group of rotations (including
those that turn it over) that transform a square into itself. Give the group multiplication
table.

Note. This group has the name D4 (see Fig. 17.5).
FIGURE 17.5 D4 symmetry group.


17.3 SYMMETRY AND PHYSICS


Representations of groups provide a key connection between group theory and the sym- 
metry properties of physical systems. Our discussion will be directed mainly at quantum 
systems, but much of it will also apply to systems that can be described using classical 
physics. 

Consider a quantum system whose Hamiltonian H possesses certain geometric sym- 
metries. If we write H = T + V, the symmetries will be those of the potential energy 
V, since the kinetic energy operator T is invariant with respect to rotations and displace- 
ments of the coordinate axes. A concrete example that illustrates the concept would be the 
determination of the wave function of an electron in the presence of nuclei in some fixed 
configuration possessing symmetry, such as the equilibrium locations of the nuclei in a 
symmetric molecule. 

The symmetry of H corresponds to a requirement that H be invariant with respect to
the application of any element of its symmetry group. Letting R denote such a symmetry
element, the invariance of H means that if φ is a solution of the Schrödinger equation with
energy E, then Rφ must also be a solution with the same energy eigenvalue:


H(Rφ) = E(Rφ).


By successively applying the elements of our symmetry group to φ, we can generate a
set of eigenfunctions, all with the same eigenvalue. If φ happened to have the full
symmetry of H, this set would contain only one member and the situation would be easy to
understand. But if φ had less symmetry,³ our eigenfunction set would have more than one
member, with its maximum possible size being the number of elements in our symmetry
group. When the eigenfunction set has more than one member, the eigenfunctions do not
individually have the complete symmetry of the Hamiltonian, but they form a closed set
that permits the partial symmetry to be expressed in all symmetry-equivalent ways. For
example, the hydrogenic eigenfunctions known as p states form a three-membered set;





³ This is possible; an example is a hydrogen-atom p state.
none has the full spherical symmetry of the hydrogen atom Hamiltonian, but linear
combinations of the three p states can describe a p orbital at an arbitrary orientation (obvious
because a vector in an arbitrary direction can be written as a linear combination of vectors
in the coordinate directions).

So let's assume that, starting from some chosen φ, we have found a full set of
symmetry-related eigenfunctions, have eliminated from them any linear dependence, and have formed
an orthonormal eigenfunction set, denoted $\varphi_i$, i = 1, ..., N.

Because of the way in which the $\varphi_i$ were constructed, they will transform linearly among
themselves if we apply to them any operation R from our symmetry group, so we may write

$$R\varphi_i = \sum_j U_{ji}(R)\,\varphi_j. \qquad (17.7)$$

If we apply two symmetry operations (R followed by S), the transformation rule for the
result will be

$$SR\varphi_i = \sum_{jk} U_{kj}(S)\,U_{ji}(R)\,\varphi_k. \qquad (17.8)$$

Equations (17.7) and (17.8) show that the transformation for the group element SR is the
matrix product of those for S and R, so the matrices U(S) and U(R) have properties that
make them members of a representation of our symmetry group. What is new here is that
we have identified U as a representation associated with the basis $\{\varphi_i\}$.

At this point we do not know whether the representation formed from our $\{\varphi_i\}$ basis is
reducible; its reducibility depends on the quantum system under study and the particular
choice made for the initial function φ. If our U are reducible, let's assume we now apply
a transformation that will convert them into the direct-sum form. The transformation to 
obtain the direct-sum separation corresponds to a division of the basis into smaller sets of 
functions that transform only among themselves. Our overall conclusion from the above 
analysis is: 

If a Hamiltonian H is fully symmetric under the operations of a symmetry group,
all its eigenfunctions can be classified into sets, with each set forming a basis for 
an irreducible representation of the symmetry group. The members of a symmetry- 
related set of eigenfunctions will be degenerate and are referred to as a multi- 
plet. Ordinarily different multiplets will correspond to different eigenvalues; any 
degeneracy between eigenfunctions of different irreducible representations arises 
from sources other than the symmetry under study. 


Because the eigenfunctions of a Hamiltonian possessing geometric symmetry can be 
identified with irreducible representations of its symmetry group, it is natural to use 
approximate eigenfunctions with similar symmetry restrictions. 


Example 17.3.1 AN EVEN HAMILTONIAN


Consider a Hamiltonian H(x), which is even in x, meaning that H(−x) = H(x), but has
no other symmetry. Letting σ stand for the reflection operator x → −x (σ is the usual
notation for a reflection operation), our symmetry group, called $C_s$, consists only of the
two operations I and σ, and its multiplication table is


II = σσ = I,    Iσ = σI = σ.


This group is abelian, and has two irreducible representations of dimension 1: one (A1) that
is completely symmetric, U(I) = U(σ) = 1, and one (A2) with sign alternation, U(I) = 1,
U(σ) = −1. The eigenfunctions of H will therefore be even or odd, and there is no inherent
symmetry requirement that even and odd states be degenerate with each other.

If we start with a function φ(x) that is even, we will have Iφ = σφ = φ, so our basis
will consist only of φ, and U(I) = U(σ) = 1, indicating that the representation constructed
using this basis will be the fully symmetric A1.

On the other hand, if our starting function φ(x) was odd, then Iφ = φ but σφ = −φ;
again our basis will consist only of φ(x), but now the representation constructed from it
will consist of U(I) = 1, U(σ) = −1, and will be the alternating-sign representation A2.

But if we start with a function φ(x) that is neither even nor odd, then Iφ(x) = φ(x),
but σφ(x) = φ(−x). Our assumption that φ(x) is neither even nor odd means that φ(x)
and φ(−x) are linearly independent, so our basis will consist of two members (and
therefore be of dimension 2). Since the symmetry group has only A1 and A2 as irreducible
representations, the representation built from our two-membered basis will be reducible,
and will reduce to A1 ⊕ A2. The basis will separate into the two members φ(x) + φ(−x)
(a 1-D A1 basis) and φ(x) − φ(−x) (an A2 basis).

Given a problem with an even Hamiltonian, one may use the above-identified symmetry
analysis to search for solutions that are restricted to have either even or odd symmetry. This
strategy may greatly simplify the process of finding solutions. The notion can be extended
to problems with different or greater degrees of symmetry. ■
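
The separation of a general function into A1 and A2 parts can be illustrated numerically. This is a minimal sketch in which the trial function phi and the helper names are illustrative assumptions; a factor of 1/2 is included (a normalization choice not made in the text) so that the two parts add back up to the original function.

```python
import math


def reflect(f):
    """Apply the reflection sigma: (sigma f)(x) = f(-x)."""
    return lambda x: f(-x)


def even_part(f):
    """A1 component: [f(x) + f(-x)] / 2."""
    return lambda x: 0.5 * (f(x) + f(-x))


def odd_part(f):
    """A2 component: [f(x) - f(-x)] / 2."""
    return lambda x: 0.5 * (f(x) - f(-x))


def phi(x):
    """A trial function that is neither even nor odd (a shifted Gaussian)."""
    return math.exp(-(x - 1.0) ** 2)


if __name__ == "__main__":
    e, o = even_part(phi), odd_part(phi)
    for x in (-2.0, -0.5, 0.3, 1.7):
        # sigma leaves the even part unchanged and reverses the sign of the odd part.
        print(x, reflect(e)(x) - e(x), reflect(o)(x) + o(x), e(x) + o(x) - phi(x))
```

Each printed difference should vanish to rounding error, confirming that the two combinations transform according to U(σ) = 1 and U(σ) = −1, respectively.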


It is important to note that all geometric symmetry groups (other than the trivial group,
which has only the element I) will possess representations other than A1, which means
that they will have bases of less symmetry than the original group. In Example 17.3.1, our
Hamiltonian was even, but could have eigenfunctions that are either even (A1) or odd (A2).
A Hamiltonian with D3 symmetry (which we have already seen has irreducible
representations of dimensions 1 and 2) can have A1 eigenfunctions of the full three-dimensional
(3-D) symmetry or A2 eigenfunctions with alternating-sign symmetry. It can also have sets
of two degenerate eigenfunctions corresponding to the representation in Eq. (17.2), where
(as indicated by the 2 × 2 matrices) the symmetry operations can convert either of the
basis members into linear combinations of both. The irreducibility means that there exists
no single function built from this two-member basis that will remain the same (except for
a possible sign or phase factor) under all the group operations. The existence of an
irreducible basis with more than one member is a consequence of the fact that the symmetry
group is not abelian.

Although the elements of a symmetry group may not all commute with each other, they
all commute with a Hamiltonian (or other operator) having the full group symmetry. To
show this, note that for any eigenfunction ψ and any group element R,


Hψ = Eψ  →  H(Rψ) = E(Rψ) = R(Eψ) = RHψ  →  HR = RH.


The last step follows because the previous steps are valid for all members of a complete
set of eigenfunctions ψ.







Sometimes, especially for continuous groups, we will know in advance how to construct
bases for irreducible representations. For example, the spherical harmonics of a given l
value form a basis for representation of the 3-D rotation group. From Chapter 16, we know
that these spherical harmonics form a closed set under rotation, but only if the set includes
all m values. This information, together with the orthonormality of the $Y_l^m$, tells us that $Y_l^m$,
m = −l, ..., l is an orthonormal basis of dimension 2l + 1 for an irreducible representation
of the 3-D rotation group, which is named SO(3). In contrast to the situation for discrete
groups, continuous groups (even of low order) may possess an infinite number of
finite-dimensional irreducible representations.

An experienced investigator can often find bases for irreducible representations by 
inspection or educated insight. However, if simple methods for finding a basis prove insuf- 
ficient, general methods can be used to construct basis functions if the matrices defining 
the relevant irreducible representation are available. Details of the process can be found in 
the works by Falicov, Hamermesh, and Tinkham (see Additional Readings). 


Example 17.3.2 QUANTUM MECHANICS, TRIANGULAR SYMMETRY


Let's consider a Hamiltonian that has the D3 symmetry of an equilateral triangle that can
be turned over, and our problem is such that its solution can be approximated as a wave
function that is distributed over orbitals centered at the three vertices $R_i$ of the triangle, of
the form $\psi(r) = a_1\varphi(r_1) + a_2\varphi(r_2) + a_3\varphi(r_3)$, where $r_i$ is the distance $|r - R_i|$, and φ is
a spherically symmetric orbital. The function

$$\psi_0 = \varphi(r_1) + \varphi(r_2) + \varphi(r_3)$$

is a basis for the trivial (A1) representation of the D3 group. But because we have three
orbitals, there will be two other linear combinations of them that are linearly independent
of $\psi_0$, and one way to choose them is

$$\psi_1 = \frac{1}{\sqrt{2}}\left[\varphi(r_1) - \varphi(r_3)\right], \qquad \psi_2 = \frac{1}{\sqrt{6}}\left[-\varphi(r_1) + 2\varphi(r_2) - \varphi(r_3)\right].$$

Neither of these functions (nor any linear combination of them) has enough symmetry to
be either A1 or A2 basis functions, and they therefore must (together) form a basis for a
2 × 2 irreducible representation of the D3 symmetry group that is called E. Knowing that
this would be the case, we chose these functions in a way that makes them orthogonal and
at a consistent normalization, and they are in fact a basis for the irreducible representation
given in Eq. (17.2).

We can check this by applying group operations to $\psi_1$ and $\psi_2$, verifying that the result
corresponds to the appropriate column of the matrix for the operation. We make one such
check here: Applying C3 to $\psi_1$, we get $C_3\psi_1 = [\varphi(r_3) - \varphi(r_2)]/\sqrt{2}$, while the first column
of U(C3) in Eq. (17.2) yields

$$C_3\psi_1 = -\frac{1}{2}\psi_1 - \frac{\sqrt{3}}{2}\psi_2 = -\frac{1}{2}\left(\frac{\varphi(r_1) - \varphi(r_3)}{\sqrt{2}}\right) - \frac{\sqrt{3}}{2}\left(\frac{-\varphi(r_1) + 2\varphi(r_2) - \varphi(r_3)}{\sqrt{6}}\right).$$

The reader can verify that these two expressions for $C_3\psi_1$ are equal, and can make further
checks if desired.
One might think that because of the triangular symmetry there would be an irreducible
representation of dimension 3. But mathematics is not that simple; all D3 representations
of dimension 3 are reducible! ■
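
The check of $C_3\psi_1$ made in this example can be carried out with coefficient vectors. The following is a minimal sketch, assuming each function is stored as its coefficients on (φ(r1), φ(r2), φ(r3)) and that C3 moves the orbitals as φ(r1) → φ(r3), φ(r3) → φ(r2), φ(r2) → φ(r1); that action is inferred from the result quoted in the example, and the helper names are illustrative.

```python
import math

# Coefficients on (phi(r1), phi(r2), phi(r3)).
psi1 = (1 / math.sqrt(2), 0.0, -1 / math.sqrt(2))
psi2 = (-1 / math.sqrt(6), 2 / math.sqrt(6), -1 / math.sqrt(6))


def apply_c3(coeffs):
    """Assumed action of C3: phi(r1) -> phi(r3), phi(r2) -> phi(r1), phi(r3) -> phi(r2)."""
    c1, c2, c3 = coeffs
    # c1*phi(r3) + c2*phi(r1) + c3*phi(r2), listed in the order (phi(r1), phi(r2), phi(r3)).
    return (c2, c3, c1)


def combine(a, f, b, g):
    """Return the coefficient vector of a*f + b*g."""
    return tuple(a * x + b * y for x, y in zip(f, g))


# First column of U(C3) in Eq. (17.2): (-1/2, -sqrt(3)/2).
from_matrix = combine(-0.5, psi1, -math.sqrt(3) / 2, psi2)
from_action = apply_c3(psi1)

if __name__ == "__main__":
    print("direct action:", from_action)
    print("from U(C3):   ", from_matrix)
```

Both results should have the coefficients of $[\varphi(r_3) - \varphi(r_2)]/\sqrt{2}$, in agreement with the text.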


The symmetries required of solutions to Schrödinger equations have implications
beyond their role in causing or explaining degeneracy. The dominant interaction between 
an electromagnetic field and a molecule can occur only if the molecule has an electric 
dipole moment, and the presence of a dipole moment depends on the symmetry of the 
electronic wave function. Another context in which symmetry is important is in the evalu- 
ation of the expectation values of quantum operators. These expectation values will vanish 
unless the integrals that define them have integrands with a fully symmetric part. In addi- 
tion, it is worth mentioning that many quantum calculations are simplified by limiting them 
to contributions that do not vanish by reason of symmetry. All these issues can be framed 
in terms of the irreducible representations for which our wave functions are bases. 

In the next sections, we develop some key results of group representation theory, first 
for discrete groups because the analysis is simpler, and then (in less detail) for continuous 
groups that have become important in particle theory and relativity. 


Exercises


17.3.1 Consider a quantum mechanics problem with D3 symmetry, with the threefold
symmetry axis taken as the z direction, and with orbitals $\varphi(r - R_i)$ located at the vertices of
an equilateral triangle. This is the same system geometry as in Example 17.3.2, but in
the present problem φ will no longer be chosen to have spherical symmetry.

Given that φ(r) = (z/r) f(r) (so φ has the symmetry of a p orbital oriented along the
symmetry axis), construct linear combinations of the φ that are bases for irreducible
representations of D3, for each basis indicating its representation.


17.4 DISCRETE GROUPS


Classes 


It has been found useful to divide the elements of a finite group G into sets called classes.
Starting from a group element $a_1$, one can apply similarity transformations of the form
$g a_1 g^{-1}$, where g can be any member of G. If we let $a_1$ be transformed in this way, using
all the elements g of G, the result will be a set of elements that we can denote $a_1, \ldots, a_k$,
where k may or may not be larger than 1. Certainly this set will include $a_1$ itself, as that
result is obtained when g = I and also when $g = a_1$ or $g = a_1^{-1}$. The set of elements
obtained in this way is called a class of G, and can be identified by specifying one of its
members. If we choose $a_1 = I$, we find that I is in a class all by itself; often classes will
have larger numbers of members.

A class will have the same members no matter which of its elements is assigned the role
of $a_1$. This is clear, since if $a_i = g a_1 g^{-1}$ then also $a_1 = g^{-1} a_i g$, showing that we can get
$a_1$ from any other element of the class, and therefrom all the elements reachable from $a_1$.
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Example 17.4.1 Ciasses OF THE TRIANGULAR GROUP D3 


As observed already in general, one class of D3 will consist solely of /. The class including 
C3 contains also C3 (the result of C2C3C; !) Finally, Co, C}, and C4 constitute a third 
class. | 
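
The class construction described above can be tried out directly. The following is a minimal sketch, assuming that D3 is realized as the six permutations of the triangle's vertices (labeled 0, 1, 2); the helper names `compose`, `inverse`, and `conjugacy_classes` are illustrative choices, not notation from the text.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_classes(group):
    """Collect the sets {g a g^-1 : g in group} for each not-yet-seen element a."""
    classes = []
    seen = set()
    for a in group:
        if a in seen:
            continue
        cls = frozenset(compose(compose(g, a), inverse(g)) for g in group)
        classes.append(cls)
        seen |= cls
    return classes

# Assumption: D3 acting on three vertices is the full permutation group of (0, 1, 2).
d3 = list(permutations(range(3)))
class_sizes = sorted(len(c) for c in conjugacy_classes(d3))
print(class_sizes)  # [1, 2, 3]
```

The three class sizes correspond to the identity, the two threefold rotations, and the three twofold operations.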


Classes are important because: 


• For a given representation (whether or not reducible), all matrices of the same class will
have the same value of their trace, which is obvious because $\mathrm{trace}(g a g^{-1}) = \mathrm{trace}(a g^{-1} g) =
\mathrm{trace}(a)$. In the group theory world, the trace is also known as the character, customarily
identified with the symbol Γ.


• It can be shown that the number of inequivalent irreducible representations of a finite
group is equal to its number of classes. (For proof and fuller discussion, see Additional
Readings at the end of this chapter.)


It can be shown (again, see Additional Readings) that the set of characters for all elements
and irreducible representations of a finite group defines an orthogonal finite-dimensional
vector space. Writing $\Gamma^K(g)$ as the character of group element g in irreducible representation
K, we have the key relations, for a group of order n:

$$ n_g \sum_K \Gamma^K(g)\,\Gamma^K(g') = n\,\delta_{gg'}, \qquad \sum_g \Gamma^K(g)\,\Gamma^{K'}(g) = n\,\delta_{KK'}. \tag{17.9} $$

Here $n_g$ is the number of elements in the class containing g. These relations enable any
reducible representation to be decomposed into a direct sum of irreducible representations,
and can also be of aid in finding the characters of irreducible representations if they were
not already known.

Another theorem of great importance in the theory of finite groups, sometimes called
the dimensionality theorem, is that the sum of the squares of the dimensions $n_K$ of the
inequivalent irreducible representations is equal to the order, n, of the group:

$$ \sum_K n_K^2 = n. \tag{17.10} $$

This theorem, together with the theorem that the number of irreducible K equals the num- 
ber of classes, imposes stringent limits on the number and size of the irreducible represen- 
tations of a group. These two requirements are often enough to determine completely the 
inventory of irreducible representations. 

Since the finite groups of interest in physics have been well studied, the most frequent 
use of these orthogonality relations is to extract from a basis that may be reducible (i.e., 
a basis for a possibly reducible representation) the irreducible bases that may be included 
therein. This task is usually carried out with a table of irreducible representations at hand. 


Example 17.4.2 ORTHOGONALITY RELATIONS, GROUP D3


The usual scheme for tabulating discrete group characters is called a character table; that
for our triangle group D3 is shown in Table 17.3. The rows of the table are labeled with
the usual names assigned the irreducible representations: The labels A and B (the latter not
used for this group) are reserved for 1 × 1 representations. Representations of dimension
2 are normally assigned a label E, and those of dimension 3 (also not occurring here) are
called T. Each column of the character table is labeled with a typical member of the class,
preceded by a number indicating the number of group elements in the class. This number
is omitted if the class contains only one element.


Table 17.3 Character Table for Group D3

          I    2C3   3C2
    A1    1     1     1
    A2    1     1    -1
    E     2    -1     0
    ----------------------
    Ψ     3     0     1

Each row corresponds to an irreducible representation, and each
column corresponds to a class. The table entry is the character for
each element of that irreducible representation and class. The row
below the boxed table (labeled Ψ) is not part of the table but is used
in connection with Example 17.4.4.

Because the representation of group element I is a unit matrix, the characters (traces)
in column I directly indicate the dimensions of the representations. We see that $A_1$ is a
1 × 1 representation, so each $A_1$ matrix contains a single number equal to the character
shown, meaning that $A_1$ is the trivial totally symmetric representation. We see that $A_2$ is
also 1 × 1, but the three group elements for which the triangle was turned over are now
represented by −1. Finally, representation E is seen to be 2 × 2, and is the representation
we found long ago in Eq. (17.2).

Checking the first orthogonality relation for $g = g' = I$, for which $n_g = 1$, we have
$1(1^2 + 1^2 + 2^2) = 6$, as expected. For $g = I$, $g' = C_3$, we have $1[1(1) + 1(1) + 2(-1)] = 0$,
and for $g = g' = C_3$, we note that $n_g = 2$ and we have $2[1^2 + 1^2 + (-1)^2] = 6$. The reader
can check other cases of this orthogonality relation.

Moving to the second orthogonality relation, we take $K = K' = E$, finding $1(2^2) +
2(-1)^2 + 3(0^2) = 6$; the 1, 2, and 3 multiplying individual terms allow for the fact that the
sum is over all elements, not just over classes. Other cases follow similarly. ■
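
The remaining cases of both orthogonality relations can be checked at once numerically. The sketch below assumes the D3 characters of Table 17.3 (rows $A_1$, $A_2$, E; columns I, 2C3, 3C2) and uses NumPy; the array names are illustrative. Because the D3 characters are real, no complex conjugation appears; for a group with complex characters, such as C4, one factor in each product would need to be conjugated.

```python
import numpy as np

# Rows: A1, A2, E; columns: classes I, 2C3, 3C2 (values taken from Table 17.3).
chars = np.array([[1, 1, 1],
                  [1, 1, -1],
                  [2, -1, 0]])
class_sizes = np.array([1, 2, 3])
order = class_sizes.sum()  # n = 6

# First relation: n_g * sum_K chi^K(g) chi^K(g') for every pair of classes (g, g').
first = class_sizes[:, None] * (chars.T @ chars)

# Second relation: the sum over all group elements is a sum over classes
# weighted by the class sizes, for every pair of representations (K, K').
second = (chars * class_sizes) @ chars.T

print(first)
print(second)
```

Both matrices come out as 6 times the 3 × 3 unit matrix, that is, n on the diagonal and zero elsewhere.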


Example 17.4.3 COUNTING IRREDUCIBLE REPRESENTATIONS 


We consider two cases, first the group C4, which was the subject of Example 17.1.4. This
group is cyclic, with elements $I$, $a$, $a^2$, $a^3$; those are all the elements, because $a^4 = I$. As
already indicated, a faithful representation of this group consists of 1, i, −1, −i, with the
group operation being ordinary multiplication. Another realization of C4 is an object that is
symmetric under 90° rotation about a single axis. This group is abelian, as $a^p a^q = a^q a^p$.
Then $g a g^{-1} = a$ for any group elements a and g, so each element is in a class by itself. So
we have four classes, and hence four irreducible representations. We also have, from the
dimension theorem,

$$ \sum_{K=1}^{4} n_K^2 = 4. $$

The only way to satisfy this equation is to have four irreducible representations, each of
dimension 1. This result should have been expected, since C4 is abelian. Our irreducible
representations can be built from the four following choices of U(a): 1, i, −1, −i, leading
to the following character table.

          I    a    a²   a³
    A1    1    1    1    1
    A2    1    i   -1   -i
    A3    1   -1    1   -1
    A4    1   -i   -1    i

Our second case is D3, which has six elements and the three classes identified in
Example 17.4.1. This means that it has three irreducible representations with dimensions
whose squares add to six. The only set of dimensions satisfying this requirement is 1,
1, and 2. ■
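
The counting argument of this example amounts to a small search: find all sets of positive integers, one per class, whose squares add up to the group order. A minimal sketch follows; the function name `irrep_dimensions` is an illustrative choice.

```python
from itertools import combinations_with_replacement
from math import isqrt

def irrep_dimensions(order, num_classes):
    """All nondecreasing tuples of positive integers, one per class,
    whose squares sum to the group order."""
    max_dim = isqrt(order)
    return [dims
            for dims in combinations_with_replacement(range(1, max_dim + 1), num_classes)
            if sum(d * d for d in dims) == order]

print(irrep_dimensions(4, 4))  # C4: [(1, 1, 1, 1)]
print(irrep_dimensions(6, 3))  # D3: [(1, 1, 2)]
```

For these two groups the search returns a single solution, which is why the two theorems suffice to fix the inventory of irreducible representations. For larger groups the list may contain more than one candidate.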


Example 17.4.2 can be generalized to deal with reducible representations; any repre- 
sentation whose characters do not match any row of the character table must be reducible 
(unless just wrong!). If we were to transform a reducible representation to direct-sum form, 
it would then be obvious that its trace will be the sum of the traces of its blocks, and that 
property will hold even if we do not know how to make the block-diagonalizing trans- 
formation. In group-theory lingo we would say that the characters of a reducible repre- 
sentation will be the sum of the characters of the irreducible representations it contains. 
Note that if a given irreducible representation occurs more than once, its characters must 
be added a corresponding number of times. 

Now suppose that we have a reducible representation Ψ of a group of order n. Even
if we do not yet know its decomposition into irreducible components, we can write its
characters for group elements g in the form

$$ \Gamma^{\Psi}(g) = \sum_K c_K\,\Gamma^K(g), \tag{17.11} $$

where $c_K$ is the number of times irreducible representation K is contained in Ψ. If we
multiply both sides of this equation by $\Gamma^{K'}(g)$ and sum over g, the orthogonality kicks
in, and

$$ \sum_g \Gamma^{K'}(g)\,\Gamma^{\Psi}(g) = \sum_g \sum_K c_K\,\Gamma^{K'}(g)\,\Gamma^K(g) = n\,c_{K'}. \tag{17.12} $$

Evaluating the left-hand side of Eq. (17.12), we easily solve for $c_{K'}$. We can repeat this
sequence of steps with different K′ until all the irreducible representations in Ψ have been
found.
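
As an illustration of Eq. (17.12), the sum over group elements can be carried out class by class, weighting each class by its size. The sketch below assumes real characters (true for D3) and uses the D3 table; the reducible character set (5, −1, −1) is a made-up test case, constructed as the characters of $A_2 \oplus E \oplus E$, and the function name `multiplicities` is hypothetical.

```python
import numpy as np

def multiplicities(irrep_chars, class_sizes, rep_chars):
    """Eq. (17.12) class by class: c_K = (1/n) * sum_g n_g chi^K(g) chi^Psi(g).
    Assumes real characters; complex ones would need a conjugate."""
    irrep_chars = np.asarray(irrep_chars)
    class_sizes = np.asarray(class_sizes)
    rep_chars = np.asarray(rep_chars)
    order = class_sizes.sum()
    return (irrep_chars * class_sizes) @ rep_chars / order

# D3 character table, rows A1, A2, E; columns I, 2C3, 3C2.
d3_table = [[1, 1, 1],
            [1, 1, -1],
            [2, -1, 0]]
d3_class_sizes = [1, 2, 3]

# Hypothetical reducible characters: those of A2 + E + E.
c = multiplicities(d3_table, d3_class_sizes, [5, -1, -1])
print(c)  # multiplicities of A1, A2, E
```

The result is 0, 1, 2 for $A_1$, $A_2$, E, recovering the direct sum from which the test characters were built.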





Example 17.4.4 DECOMPOSING A REDUCIBLE REPRESENTATION 


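Suppose we start from the following set of three basis functions for the triangular
group D3:⁴

$$ \psi_1 = x^2, \qquad \psi_2 = y^2, \qquad \psi_3 = \sqrt{2}\,xy, \tag{17.13} $$

where x, y, z are Cartesian coordinates with origin at the center of the triangle, and the axes
are in the directions shown in Fig. 17.1. Since $C_3 x = -\tfrac{1}{2}x + \tfrac{1}{2}\sqrt{3}\,y$, $C_3 y = -\tfrac{1}{2}\sqrt{3}\,x - \tfrac{1}{2}y$,
we can (somewhat tediously) determine that

$$ C_3 x^2 = \tfrac{1}{4}x^2 + \tfrac{3}{4}y^2 - \sqrt{\tfrac{3}{8}}\,(\sqrt{2}\,xy), $$

$$ C_3 y^2 = \tfrac{3}{4}x^2 + \tfrac{1}{4}y^2 + \sqrt{\tfrac{3}{8}}\,(\sqrt{2}\,xy), $$

$$ C_3(\sqrt{2}\,xy) = \sqrt{\tfrac{3}{8}}\,x^2 - \sqrt{\tfrac{3}{8}}\,y^2 - \tfrac{1}{2}(\sqrt{2}\,xy), $$

so in the ψ basis,

$$ U^{\Psi}(C_3) = \begin{pmatrix} \tfrac{1}{4} & \tfrac{3}{4} & -\sqrt{\tfrac{3}{8}} \\[4pt] \tfrac{3}{4} & \tfrac{1}{4} & \sqrt{\tfrac{3}{8}} \\[4pt] \sqrt{\tfrac{3}{8}} & -\sqrt{\tfrac{3}{8}} & -\tfrac{1}{2} \end{pmatrix}. \tag{17.14} $$

Similar analysis can be used to obtain the matrix of C2, which is easier because the operation
involved is just x → −x, with y remaining unchanged. We get

$$ U^{\Psi}(C_2) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.15} $$

The representation of I is, of course, just the 3 × 3 unit matrix. Since the only data we need
right now are the traces of one representative of each class, we are ready to proceed, and
we see that

$$ \Gamma^{\Psi}(I) = 3, \qquad \Gamma^{\Psi}(C_3) = 0, \qquad \Gamma^{\Psi}(C_2) = 1. $$

We are labeling the characters with superscript Ψ as a reminder that the representation is
that associated with the $\psi_i$. These characters have been appended below their respective
columns in Table 17.3.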

Now we use the fact that the Ψ representation must decompose into

$$ \Psi = c_1 A_1 \oplus c_2 A_2 \oplus c_3 E, \tag{17.16} $$

and we find the $c_i$ by applying Eq. (17.12). Using the data in Table 17.3, and taking K′ to
be in turn $A_1$, $A_2$, and E,

$$ \begin{aligned}
A_1 &: \quad (1)(3) + 2(1)(0) + 3(1)(1) = 6 = 6c_1, && \text{so } c_1 = 1, \\
A_2 &: \quad (1)(3) + 2(1)(0) + 3(-1)(1) = 0 = 6c_2, && \text{so } c_2 = 0, \\
E &: \quad (2)(3) + 2(-1)(0) + 3(0)(1) = 6 = 6c_3, && \text{so } c_3 = 1.
\end{aligned} $$

Thus, $\Psi = A_1 \oplus E$. We can check our work by summing the $A_1$ and E entries from the
character table. As they must, they add to give the entries for Ψ. ■

⁴ These basis functions have been chosen in a way that makes the reducible representation unitary. The factor $\sqrt{2}$ in $\psi_3$ is needed
to make all the $\psi_i$ at the same scale.
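
The "somewhat tedious" algebra behind the representation matrices of this example can be delegated to a least-squares fit. The following sketch assumes the basis $x^2$, $y^2$, $\sqrt{2}\,xy$ and the substitutions for C3 and C2 quoted in the example; the sampling approach and the helper names `basis` and `rep_matrix` are illustrative choices, not the book's method.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 2))          # random sample points in the xy plane
x, y = pts[:, 0], pts[:, 1]

def basis(x, y):
    """Columns are psi_1 = x^2, psi_2 = y^2, psi_3 = sqrt(2) x y at the sample points."""
    return np.column_stack([x**2, y**2, np.sqrt(2) * x * y])

def rep_matrix(new_x, new_y):
    """Row i holds the expansion coefficients of the transformed psi_i in the psi basis."""
    coeffs, *_ = np.linalg.lstsq(basis(x, y), basis(new_x, new_y), rcond=None)
    return coeffs.T

s = np.sqrt(3) / 2
U_C3 = rep_matrix(-0.5 * x + s * y, -s * x - 0.5 * y)  # C3 substitution from the example
U_C2 = rep_matrix(-x, y)                               # C2: x -> -x, y unchanged

print(np.round(U_C3, 4))
print(np.round([np.trace(U_C3), np.trace(U_C2)], 10))
```

The traces reproduce the characters 0 and 1 used in the decomposition, and the fitted C3 matrix is orthogonal, consistent with the remark that this choice of basis makes the representation unitary.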


For some purposes it is insufficient just to know which irreducible representations are 
included in a reducible basis for a group G. We may also need to know how to transform 
the basis so that each basis member will be associated with a specific irreducible represen- 
tation of G. Sometimes it is easy to see how to do this. For the above example, the basis 
function for A; must have the full group symmetry, while the E basis functions must be 
orthogonal to the A; basis. These considerations lead us to 


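$$ A_1: \quad \varphi = \psi_1 + \psi_2 = x^2 + y^2, \tag{17.17} $$

$$ E: \quad \varphi_1 = \psi_1 - \psi_2 = x^2 - y^2, \qquad \varphi_2 = \sqrt{2}\,\psi_3 = 2xy. \tag{17.18} $$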


However, if finding the irreducible basis functions by inspection proves difficult, there are 
formulas that can be used to find them. See Additional Readings. 


Other Discrete Groups 


Most of the examples we have used have been for one group, D3, in which we have 
considered symmetry operations that involve rotations about axes through the center of 
the system. Groups keeping a central point fixed are called point groups, and they arise, 
among other places, when studying phenomena that depend on the geometric symmetries 
of molecules. Some point groups have additional symmetries associated with inversion or 
reflection. It is possible for a point group to have a single n-fold axis for any positive
integer n (meaning that a symmetry element is a rotation through an angle 2π/n). However,
the number of point groups having multiple symmetry axes with n ≥ 3 is very limited;
they correspond to the Platonic regular polyhedra, and therefore can only be tetrahedral, 
cubic/octahedral, and dodecahedral/icosahedral. 

Other discrete groups arise when we consider permutational symmetry; the symmetric 
group is important in many-body physics and is the subject of a separate section of this 
chapter. 


Exercises 


17.4.1 The Vierergruppe has the multiplication table shown in Table 17.2.


(a) Divide its elements into classes. 


(b) Using the class information, determine for the Vierergruppe its number of inequiv- 
alent irreducible representations and their dimensions. 


(c) Construct a character table for the Vierergruppe.


17.4.2 The group D3 may be discussed as a permutation group of three objects. Operation
C3, for instance, moves vertex 1 to the position formerly occupied by vertex 2; likewise
vertex 2 moves to the original position of vertex 3 and vertex 3 moves to the
original position of vertex 1. So this shuffling could be described as the permutation of
(1,2,3) to (2,3,1). Using now letters a, b, c to avoid notational confusion, this permutation
(abc) → (bca) corresponds to the matrix equation

$$ C_3 \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} b \\ c \\ a \end{pmatrix}, $$

thereby identifying a 3 × 3 representation of the operation C3.

(a) Develop analogous 3 × 3 representations for the other elements of D3.


(b) Reduce your 3 × 3 representation to the direct sum of a 1 × 1 and a 2 × 2 representation.
Note: This 3 × 3 representation must be reducible or Eq. (17.10) would
be violated.


17.4.3 The group named D4 has a fourfold axis of symmetry, and twofold axes in four directions
perpendicular to the fourfold axis. See Fig. 17.5. D4 has the following classes (the
numbers preceding the class descriptors indicate the number of elements in the class):
I, 2C4, C2, 2C2′, 2C2″. The twofold axes marked with primes are in the plane of fourfold
symmetry.


(a) Find the number and dimensions of the irreducible representations.


(b) Given that all the characters of the representations of dimension 1 are ±1 and that
$C_2 = C_4^2$, use the orthogonality conditions to construct a complete character table
for D4.


17.4.4 The eight functions $\pm x^3$, $\pm x^2 y$, $\pm x y^2$, $\pm y^3$ form a reducible basis for D4, with C4
a 90° counterclockwise rotation in the xy plane, $C_2 = C_4^2$, C2′ = (x → −x, y → y),
C2″ = (x → y, y → x), and the remaining members of D4 are additional members of
the classes containing the above operations. Find the characters of the reducible representation
for which these functions form a basis, and find the direct sum of irreducible
representations of which it consists.




















17.4.5 The group $C_{4v}$ has a fourfold symmetry axis in the z direction, reflection symmetries
($\sigma_v$) about the xz and yz planes, and additional reflection symmetries ($\sigma_d$, d = dihedral)
about planes that contain the z axis but are 45° from the x and y axes. See Fig. 17.6.
The character table for $C_{4v}$ follows.

          I    2C4   C2   2σv   2σd
    A1    1     1     1     1     1
    A2    1     1     1    -1    -1
    B1    1    -1     1     1    -1
    B2    1    -1     1    -1     1
    E     2     0    -2     0     0

FIGURE 17.6 $C_{4v}$ symmetry group. At left, a molecule with this symmetry.
At right, a diagram identifying the reflection planes, which are perpendicular to the
plane of the diagram.


(a) Construct the matrix representing one member of each class of $C_{4v}$ using as a
basis a $p_z$ orbital at each of the points (x, y) = (a, 0), (0, a), (−a, 0), (0, −a), and
therefrom extract the characters of the reducible representation for which these $p_z$
orbitals form a basis. A $p_z$ orbital has functional form (z/r) f(r).


(b) Determine the irreducible representations contained in our reducible $p_z$ representation.


(c) Form those linear combinations of our $p_z$ functions that are bases for each of the
irreducible representations found in part (b).


17.4.6 Using the notation and geometry of Exercise 17.4.5, repeat that exercise for the eight-member
basis consisting of a $p_x$ and a $p_y$ orbital at each of the points (x, y) = (a, 0),
(0, a), (−a, 0), (0, −a).


17.5 DIRECT PRODUCTS


Many multiparticle quantum-mechanical systems are described using wave functions that 
are products of individual-particle states. This approach is that of an independent-particle 
model, which at a higher degree of approximation can include interparticle interactions. 
The single-particle states can then be chosen to reflect the symmetry of the system, mean- 
ing that each one-particle state will be a basis member of some irreducible representation 
of the system’s symmetry group. This idea is obvious, for example, in atomic structure, 
where we encounter notations such as $1s^2 2s^2 2p^3$ (the ground-state electron configuration
of the N atom).

When a multiparticle system with symmetry group G is subjected to one of its symmetry
operations, each single-particle factor in its wave function transforms according to
its individual irreducible representation of G, so the overall wave function may contain
products of arbitrary components of each particle’s representation. Thus, the multiparticle
basis consists of all the products that can be formed by taking one member of each
single-particle basis. This is what is termed a direct product. This multiparticle basis will also
constitute a representation of G. The notation

$$ K = K_1 \otimes K_2 $$

indicates that the representation K of G is the direct product of the representations $K_1$ and
$K_2$. This means also that the representation matrix $U^K(a)$ of any element a of G can be
formed as the direct product (see Eq. 2.55) of the matrices $U^{K_1}(a)$ and $U^{K_2}(a)$.

The representation of a group G formed as a direct product of two (or more) of its irre- 
ducible representations may or may not be irreducible. For finite groups, a useful theorem 
is that the characters for a direct product of representations are, for each class, the product 
of the individual characters for that class. Once the characters for the direct product have 
been constructed, the methods of the previous section can be used to find the irreducible 
components of the product states. 
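
The character theorem just quoted rests on the fact that the trace of a direct (Kronecker) product of matrices is the product of their traces. A brief numerical sketch, assuming NumPy's `np.kron` as the direct product and using arbitrary random matrices as stand-ins for representation matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))   # stand-in for a matrix of representation K1
B = rng.normal(size=(3, 3))   # stand-in for a matrix of representation K2

# np.kron forms the direct (Kronecker) product; its trace factorizes.
lhs = np.trace(np.kron(A, B))
rhs = np.trace(A) * np.trace(B)
print(np.isclose(lhs, rhs))  # True

# Applied class by class to D3 (classes I, 2C3, 3C2): characters of A2 (x) E.
chi_A2 = np.array([1, 1, -1])
chi_E = np.array([2, -1, 0])
chi_product = chi_A2 * chi_E
print(chi_product)  # [ 2 -1  0]
```

The product characters coincide with the E row of the D3 table, so in this small case the direct product $A_2 \otimes E$ is again E.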


Example 17.5.1 EVEN-ODD SYMMETRY 


Sometimes the analysis of a direct product is simple. Consider a system of n independent
particles subject to a potential whose only symmetry element (other than I) is inversion
(denoted i) through the origin of the coordinate system, so V(−r) = V(r). In this case,
G (conventionally named $C_i$) has the two elements I and i, with the following character
table.

          I    i
    A1    1    1
    A2    1   -1

Individual particles with $A_1$ wave functions, which remain unchanged under inversion,
are conventionally labeled g (from the German word gerade). Particles with $A_2$ wave
functions, which change sign on inversion, are labeled u, for ungerade. In fact, the usual
notation for the character table of the $C_i$ group writes $A_g$ and $A_u$ in place of $A_1$ and
$A_2$, thereby conveying more information about the symmetries of the corresponding basis
functions.

Now suppose that this system is in a state with j of the particles in u states and n − j of
the particles in g states. Intuitively, we know that if j is an odd number, the overall wave
function will change sign on inversion, but will not change sign if j is even. Formally, we
examine the direct product representation K:

$$ K = u(1) \otimes u(2) \otimes \cdots \otimes u(j) \otimes g(j+1) \otimes \cdots \otimes g(n). $$

Using the theorem that the characters of representation K can be obtained by multiplying
those of its constituent factors, we find $\Gamma^K(I) = 1$, $\Gamma^K(i) = (-1)^j$. Irrespective of the
value of j, K will be irreducible: It is $A_g$ if j is even, and $A_u$ if j is odd. ■


Example 17.5.2 TWO QUANTUM PARTICLES IN D3 SYMMETRY


This case is not as simple. Suppose both particles are in states of E symmetry, a situation
spectroscopists would identify with the notation $e^2$; they use lower-case symbols to
identify individual-particle states, reserving capital letters for the overall symmetry
designation. For definiteness, let’s further suppose⁵ that each particle has a wave function of the
form found in Eq. (17.18), so particle i will have the two-member basis

$$ \varphi_a(i) = (x_i^2 - y_i^2), \qquad \varphi_b(i) = 2 x_i y_i, $$

and the product basis will therefore have the four members

$$ \Phi_{aa} = \varphi_a(1)\varphi_a(2), \quad \Phi_{ab} = \varphi_a(1)\varphi_b(2), \quad \Phi_{ba} = \varphi_b(1)\varphi_a(2), \quad \Phi_{bb} = \varphi_b(1)\varphi_b(2). \tag{17.19} $$

The matters at issue are (1) to find the overall symmetries this system can exhibit, and (2)
to identify the basis functions for each symmetry.

Consulting Table 17.3, we compute the products for e ⊗ e:

              I    2C3   3C2
    e ⊗ e:    4     1     0

Since this representation has dimension 4 while the largest irreducible representation has
dimension 2, it must be reducible. Applying the technique of Example 17.4.4, we can find
that it decomposes into $e \otimes e = A_1 \oplus A_2 \oplus E$, a result that is easily checked by adding
entries in the D3 character table.

A set of basis functions corresponding to the decomposition into irreducible representations
is

$$ \psi^{A_1} = (x_1^2 - y_1^2)(x_2^2 - y_2^2) + 4 x_1 y_1 x_2 y_2, \tag{17.20} $$

$$ \psi^{A_2} = 2\left[ (x_1^2 - y_1^2)\,x_2 y_2 - x_1 y_1 (x_2^2 - y_2^2) \right], \tag{17.21} $$

$$ \begin{aligned}
\psi_1^{E} &= (x_1^2 - y_1^2)(x_2^2 - y_2^2) - 4 x_1 y_1 x_2 y_2, \\
\psi_2^{E} &= 2\left[ (x_1^2 - y_1^2)\,x_2 y_2 + x_1 y_1 (x_2^2 - y_2^2) \right].
\end{aligned} \tag{17.22} $$

Finding these could be challenging; verifying them is less so. ■
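
One way to carry out the verification is numerical: apply a group operation to the coordinates of both particles and compare function values at a random point. The sketch below assumes the reading of Eqs. (17.20) to (17.22) as $\Phi_{aa} \pm \Phi_{bb}$ and $\Phi_{ab} \mp \Phi_{ba}$, and the coordinate substitutions for C3 and C2 used in Example 17.4.4; all function names are illustrative.

```python
import numpy as np

def psi_A1(x1, y1, x2, y2):
    return (x1**2 - y1**2) * (x2**2 - y2**2) + 4 * x1 * y1 * x2 * y2

def psi_A2(x1, y1, x2, y2):
    return 2 * ((x1**2 - y1**2) * x2 * y2 - x1 * y1 * (x2**2 - y2**2))

def psi_E(x1, y1, x2, y2):
    e1 = (x1**2 - y1**2) * (x2**2 - y2**2) - 4 * x1 * y1 * x2 * y2
    e2 = 2 * ((x1**2 - y1**2) * x2 * y2 + x1 * y1 * (x2**2 - y2**2))
    return np.array([e1, e2])

def c3(x, y):
    """C3 substitution: x -> -x/2 + (sqrt3/2) y, y -> -(sqrt3/2) x - y/2."""
    s = np.sqrt(3) / 2
    return -0.5 * x + s * y, -s * x - 0.5 * y

def c2(x, y):
    """C2 substitution: x -> -x, y unchanged."""
    return -x, y

def act(op, f, x1, y1, x2, y2):
    """Apply the same operation to both particles, then evaluate f."""
    return f(*op(x1, y1), *op(x2, y2))

rng = np.random.default_rng(2)
p = rng.normal(size=4)  # a random point (x1, y1, x2, y2)

# A1: unchanged by every operation. A2: unchanged by C3, sign change under C2.
a1_ok = bool(np.isclose(act(c3, psi_A1, *p), psi_A1(*p))
             and np.isclose(act(c2, psi_A1, *p), psi_A1(*p)))
a2_ok = bool(np.isclose(act(c3, psi_A2, *p), psi_A2(*p))
             and np.isclose(act(c2, psi_A2, *p), -psi_A2(*p)))
# E: the pair mixes under C3 but its squared length is preserved; under C2
# the first member is even and the second odd (character 0).
e_ok = bool(np.isclose(np.sum(act(c3, psi_E, *p)**2), np.sum(psi_E(*p)**2))
            and np.allclose(act(c2, psi_E, *p), psi_E(*p) * np.array([1, -1])))
print(a1_ok, a2_ok, e_ok)
```

A check at random points is evidence rather than proof, but a failure at even one point would expose a sign error in the transcribed formulas.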


For continuous groups, it is usually simpler to decompose direct-product representations
in other ways. For example, in Chapter 16 we used ladder operators to identify overall
angular-momentum states (irreducible representations) formed from products of individual
angular momenta. The resulting multiplets correspond to the irreducible representations,
and the angular momentum functions that we found are their bases.

⁵ An actual problem will have a wave function that, in addition to the functional dependence shown here, will have a completely
symmetric additional factor that is not relevant for the present group-theoretic discussion.


Table 17.4 Character Table, Group $C_{4v}$

          I    2C4   C2   2σv   2σd
    A1    1     1     1     1     1
    A2    1     1     1    -1    -1
    B1    1    -1     1     1    -1
    B2    1    -1     1    -1     1
    E     2     0    -2     0     0


Exercises 


17.5.1 The group $C_{4v}$ has eight elements, corresponding to the rotational and reflection
symmetries of a square that cannot be turned over. See Fig. 17.6. Symmetry rotations about
the z axis are denoted C4, C2, C4′. Reflections relative to the xz and yz planes are
named $\sigma_v$ and $\sigma_v'$; those at 45° relative to the xz and yz planes are called $\sigma_d$ and $\sigma_d'$ (d
indicates “dihedral”). The character table for $C_{4v}$ is in Table 17.4.


(a) Find the direct sum of irreducible representations of $C_{4v}$ corresponding to the
direct product E ⊗ E.


(b) A basis for E (in the context of Fig. 17.5) consists of the two functions $\varphi_1 = x$,
$\varphi_2 = y$. Apply a few of the group operations to this basis and verify the entries for
E in the character table.


(c) Assume now that we have two sets of variables, $x_1, y_1$ and $x_2, y_2$, and we form
the direct-product basis $x_1 x_2$, $x_1 y_2$, $y_1 x_2$, $y_1 y_2$. Determine how the direct-product
basis functions can be combined to form bases for each of the irreducible
representations in the direct sum corresponding to E ⊗ E.


17.6 SYMMETRIC GROUP


The symmetric group $S_n$ is the group of permutations of n distinguishable objects, and
is therefore of order n!. To see this, note that to make a permutation, we may choose the
first object in n different ways, then the second in n − 1 ways, etc., until we reach the nth
object, which can be chosen in only one way. The total number of possible permutations is
therefore n(n − 1)···(1) = n!. This group is important in the physics of identical-particle
systems, whose wave functions must be either symmetric with respect to particle
interchanges (particles with this symmetry are called bosons), or antisymmetric under pairwise
particle interchanges (these particles are called fermions). This means that an n-boson
wave function $\Psi_B(1, 2, \ldots, n)$ must satisfy

$$ P\,\Psi_B(1, \ldots, n) = \Psi_B(1, \ldots, n), \tag{17.23} $$

where P is any permutation of the particle numbers. From a group-theoretical viewpoint
this means that $\Psi_B$ is a sole basis function for the trivial $A_1$ representation of $S_n$: 1 × 1, with
all members of the representation equal to (1). Many-fermion wave functions $\Psi_F(1, \ldots, n)$
satisfy

$$ P\,\Psi_F(1, \ldots, n) = \varepsilon_P\,\Psi_F(1, \ldots, n), \tag{17.24} $$

where $\varepsilon_P$ is the n-particle Levi-Civita symbol with an index string corresponding to P; in
simple language this means $\varepsilon_P = 1$ if P is an even permutation of the particle numbers
(one requiring an even number of pairwise interchanges), and $\varepsilon_P = -1$ if P is odd. This
means that $\Psi_F$ is the sole basis function for the 1 × 1 totally antisymmetric representation
of $S_n$, with members ($\varepsilon_P$), which we will call $A_2$.
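
The sign $\varepsilon_P$ is easy to compute, and one can confirm that it behaves as a 1 × 1 representation, meaning that the sign of a product of permutations is the product of the signs. The sketch below counts inversions to obtain the parity; the representation of permutations as tuples and the helper names are illustrative assumptions.

```python
from itertools import permutations

def parity(perm):
    """Return +1 for an even permutation, -1 for an odd one, by counting inversions."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

s3 = list(permutations(range(3)))
signs = {p: parity(p) for p in s3}

print(len(s3))              # 6, that is 3!
print(sum(signs.values()))  # 0: three even and three odd permutations

# The signs multiply like the permutations themselves (a 1 x 1 representation).
multiplicative = all(parity(compose(p, q)) == parity(p) * parity(q)
                     for p in s3 for q in s3)
print(multiplicative)       # True
```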

Since the representations needed for either bosons or fermions are simple and of
dimension 1 × 1, it might seem that sophisticated group-theoretic considerations would be
unnecessary. But that is an oversimplification, because many-fermion systems (and some
boson systems) consist of direct products of spatial and spin functions, and the spin
functions may form a basis of $S_n$ of dimension larger than one.


Example 17.6.1 TWO AND THREE IDENTICAL FERMIONS


In elementary quantum mechanics, the ground state of a two-fermion system such as the
two electrons of the He atom can be treated using a simple wave function of the form

$$ \Psi_F = \big( f(1)g(2) + g(1)f(2) \big) \big( \alpha(1)\beta(2) - \beta(1)\alpha(2) \big). $$

Here f and g are single-particle spatial functions, and α, β describe single-particle spin
states. We continue, using a streamlined notation in which the particle numbers are
suppressed, understanding that they always occur in ascending numerical order, so $\Psi_F$
will henceforth be written $(fg + gf)(\alpha\beta - \beta\alpha)$. It is obvious that $\Psi_F$ has the fermion
(anti)symmetry; we note that it is an $A_2$ basis function, which is the product of a symmetric
$A_1$ spatial function and an antisymmetric $A_2$ spin function. The physics of this problem
demands that the overall ground-state wave function $\Psi_F$ contain spin function $\alpha\beta - \beta\alpha$
because it is a two-particle spin eigenstate. The two-particle example shows that the $A_2$
overall representation was obtained as $A_1 \otimes A_2$.

For three particles, things are different. To treat the ground state of the Li atom, we
cannot form a completely antisymmetric spin function using only the two single-particle
spin functions α and β. The actual spin functions relevant for the ground state form a 2 × 2
representation of S3, which we will call E:

$$ \theta_1 = \frac{1}{\sqrt{6}} \left( 2\alpha\alpha\beta - \alpha\beta\alpha - \beta\alpha\alpha \right), \qquad \theta_2 = \frac{1}{\sqrt{2}} \left( \beta\alpha\alpha - \alpha\beta\alpha \right). \tag{17.25} $$

Since permutations mix $\theta_1$ and $\theta_2$, the overall wave function for this three-particle system
must be of the form

$$ \Psi_F = \chi_1 \theta_1 + \chi_2 \theta_2, $$

where $\chi_1$ and $\chi_2$ are three-body spatial functions such that $\Psi_F$ has the required $A_2$
symmetry. If the $\chi_i$ are built from spatial orbitals f, g, and h, one possible set of $\chi_i$ is

$$ \begin{aligned}
\chi_1 &= \tfrac{1}{2} \left( ghf - hfg - hgf + fhg \right), \\
\chi_2 &= \tfrac{1}{\sqrt{3}} \left( fgh + gfh - \tfrac{1}{2} ghf - \tfrac{1}{2} hfg - \tfrac{1}{2} hgf - \tfrac{1}{2} fhg \right),
\end{aligned} \tag{17.26} $$

a result that is difficult to find by trial and error. Since the spin functions, and therefore
also the spatial functions, become more complicated as the system size increases, the value
of a group-theoretic description clearly becomes more urgent. ■
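
Although the three-particle functions are hard to find by trial and error, checking them is mechanical. The sketch below stores each function as a dictionary from product labels to coefficients and tests the fermion antisymmetry of $\chi_1\theta_1 + \chi_2\theta_2$ under all six permutations. The coefficients are transcribed from our reading of Eqs. (17.25) and (17.26), with a and b standing for α and β; the dictionary representation and helper names are assumptions made for illustration.

```python
from itertools import permutations
from math import sqrt, isclose

def combo(*terms):
    """Build {label: coefficient} from (coefficient, label) pairs."""
    return {label: coeff for coeff, label in terms}

# Spin functions (a = alpha, b = beta) and spatial functions; the position in
# each label is the particle number.
theta1 = combo((2 / sqrt(6), "aab"), (-1 / sqrt(6), "aba"), (-1 / sqrt(6), "baa"))
theta2 = combo((1 / sqrt(2), "baa"), (-1 / sqrt(2), "aba"))
chi1 = combo((0.5, "ghf"), (-0.5, "hfg"), (-0.5, "hgf"), (0.5, "fhg"))
c = 1 / sqrt(3)
chi2 = combo((c, "fgh"), (c, "gfh"), (-c / 2, "ghf"), (-c / 2, "hfg"),
             (-c / 2, "hgf"), (-c / 2, "fhg"))

def product(space, spin):
    """Space-spin product, keyed by (space label, spin label)."""
    return {(s, t): cs * ct for s, cs in space.items() for t, ct in spin.items()}

def add(u, v):
    out = dict(u)
    for key, val in v.items():
        out[key] = out.get(key, 0.0) + val
    return out

def permute(psi, perm):
    """Reorder the particle slots of every space and spin label in the same way."""
    def move(label):
        return "".join(label[i] for i in perm)
    return {(move(s), move(t)): val for (s, t), val in psi.items()}

def parity(perm):
    inv = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    return -1 if inv % 2 else 1

psi_F = add(product(chi1, theta1), product(chi2, theta2))

# P psi_F should equal (sign of P) * psi_F, coefficient by coefficient.
antisymmetric = all(
    isclose(permute(psi_F, perm).get(key, 0.0), parity(perm) * val, abs_tol=1e-12)
    for perm in permutations(range(3))
    for key, val in psi_F.items()
)
print(antisymmetric)
```

Neither $\chi_1\theta_1$ nor $\chi_2\theta_2$ is antisymmetric on its own; only the sum has the required symmetry, which is the point made in the example.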


We consider now, from a formal viewpoint, only the many-fermion case. As illustrated
in Example 17.6.1, we deal with space-spin functions in which the spin function has, for
reasons we will not discuss here, been chosen to be built from an irreducible representation
K of the symmetric group, whose member for permutation P is a unitary matrix designated
$U^K(P)$, and whose basis is a set of spin functions $\theta_i$, $i = 1, \ldots, n_K$, where $n_K$ is the
dimension of the spin representation. This means that

$$ P\,\theta_i = \sum_{j=1}^{n_K} U^K_{ji}(P)\,\theta_j. \tag{17.27} $$

We shall now show that an antisymmetric overall space-spin function can result if we form

$$ \Psi_F = \sum_{i=1}^{n_K} \chi_i\,\theta_i, \tag{17.28} $$

where the $\chi_i$ are basis functions for a representation K′, of the same dimension as K,
meaning that

$$ P\,\chi_i = \sum_{k=1}^{n_K} U^{K'}_{ki}(P)\,\chi_k. \tag{17.29} $$

The representation K′ is assumed to have members that satisfy

$$ U^{K'}(P) = \varepsilon_P\,U^K(P)^*. \tag{17.30} $$

The representation K′ must exist, since it is (apart from a complex conjugate) the direct
product of representations K and $A_2$. Because $A_2$ only imparts sign changes to various $U^K$,
the representation K′ will be irreducible because representation K is. The representation
K′ is termed dual to representation K.

To verify that the assumed form of $\Psi_F$ has the required $A_2$ symmetry, we take it, as
given in Eq. (17.28), and apply to it an arbitrary permutation P:

$$ \begin{aligned}
P\,\Psi_F &= \sum_{i=1}^{n_K} (P\chi_i)(P\theta_i) = \sum_i \Big( \sum_k U^{K'}_{ki}(P)\,\chi_k \Big) \Big( \sum_j U^K_{ji}(P)\,\theta_j \Big) \\
&= \sum_{jk} \Big( \sum_i U^{K'}_{ki}(P)\,U^K_{ji}(P) \Big) \chi_k\,\theta_j \\
&= \sum_{jk} \Big( \sum_i \varepsilon_P\,U^K_{ki}(P)^*\,U^K_{ji}(P) \Big) \chi_k\,\theta_j.
\end{aligned} \tag{17.31} $$

The steps taken in the processing of Eq. (17.31) are substitutions of Eqs. (17.27) and
(17.29) for $P\chi_i$ and $P\theta_i$, followed by a conversion from $U^{K'}$ to $U^K$ through the use
of Eq. (17.30). We complete our analysis by recognizing that because U is unitary,
$U_{ki}(P)^* = (U^{-1})_{ik}(P)$, so

$$ \sum_i \varepsilon_P\,U^K_{ki}(P)^*\,U^K_{ji}(P) = \varepsilon_P\,\delta_{jk}, $$

leading to the final result

$$ P\,\Psi_F = \sum_{jk} \varepsilon_P\,\delta_{jk}\,\chi_k\,\theta_j = \varepsilon_P \sum_k \chi_k\,\theta_k = \varepsilon_P\,\Psi_F. \tag{17.32} $$

Equation (17.32) shows that the overall wave function $\Psi_F$ has the required fermion
antisymmetry.
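
The only property of $U^K(P)$ used in the last step is unitarity, so the index contraction can be checked with any unitary matrix. The sketch below uses a random unitary matrix (obtained from a QR factorization) as a stand-in for $U^K(P)$ and takes $\varepsilon_P = -1$ as for a hypothetical odd permutation; it is a numerical illustration of the identity, not a construction of an actual representation of $S_n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

# QR factorization of a random complex matrix yields a unitary Q, used here
# as a stand-in for U^K(P).
z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(z)
eps = -1  # sign of a hypothetical odd permutation

U_dual = eps * U.conj()   # dual-representation matrix: eps_P times the complex conjugate

# sum_i U'_{ki} U_{ji} is the (k, j) element of U_dual @ U.T
contraction = U_dual @ U.T
print(np.allclose(contraction, eps * np.eye(n)))  # True
```

The contraction reduces to $\varepsilon_P$ times the unit matrix, which is what collapses the double sum over j and k to a single sum.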

Our only remaining problem is to construct spatial functions $\chi_i$, which are bases for
representation K′. We state without proof (see Additional Readings) that this can be
accomplished using the formula

$$ \chi_i = \sum_P U^{K'}_{ij}(P)\,P\,\chi_0, \tag{17.33} $$

where $\chi_0$ is a single spatial function whose permutations will be used to construct the $\chi_i$.
The index j identifies an entire set of $\chi_i$; if $\chi_0$ has no permutational symmetry, we can
create sets of $\chi_i$ in $n_{K'}$ different ways, each corresponding to a different value of j.


Example 17.6.2 CONSTRUCTION OF MANY-BoDy SPATIAL FUNCTIONS 


We consider a three-electron problem in which the spin states are given by Eq. (17.25). We
need the representation of S3 for which these $\theta_i$ are a basis. We are fortunate to already
have this representation, as S3 is isomorphic (in 1-1 correspondence) with D3, so we can
use the set of 2 × 2 representation matrices given in Eq. (17.2), if we make the identification
C2 ↔ P(12), C2′ ↔ P(13), C2″ ↔ P(23), where P(ij) denotes the permutation that
interchanges the ith and jth items in the ordered list to which the permutation is applied. The
permutation P(123 → 312) corresponds to C3, and P(123 → 231) corresponds to $C_3^2$.







We now apply Eq. (17.33); an easy way to do this is to start by generating the matrix T
that results from keeping all i and j. In the present case, that means forming the matrix sum

$$ \begin{aligned}
T = {} & U(I)\,\chi_0 - U(C_2)\,P(12)\,\chi_0 - U(C_2')\,P(13)\,\chi_0 - U(C_2'')\,P(23)\,\chi_0 \\
& + U(C_3)\,P(123 \to 312)\,\chi_0 + U(C_3^2)\,P(123 \to 231)\,\chi_0.
\end{aligned} $$

The minus signs for the U(C2) terms arise from the $\varepsilon_P$ which is needed to convert $U^K$
into $U^{K'}$.

Taking $\chi_0$ as the product f(1)g(2)h(3), hereafter written fgh, and inserting numerical
values for the U, we reach

$$ T = \begin{pmatrix}
fgh - gfh - \tfrac{1}{2}(ghf + hfg - hgf - fhg) & \tfrac{1}{2}\sqrt{3}\,(ghf - hfg - hgf + fhg) \\[6pt]
\tfrac{1}{2}\sqrt{3}\,(-ghf + hfg - hgf + fhg) & fgh + gfh - \tfrac{1}{2}(ghf + hfg + hgf + fhg)
\end{pmatrix}. \tag{17.34} $$

Each column of Eq. (17.34) defines a set of $\chi_i$, in a form that is not guaranteed to be 
normalized. From the second column, dividing through by $\sqrt{3}$ for normalization, we obtain 
the $\chi_i$ that were listed as a possible wave function in Example 17.6.1 at Eq. (17.26). The 
first column of Eq. (17.34) shows that there is a second possibility for an antisymmetric 
wave function built from the spatial product fgh, namely one that can be written 

$$\Psi = \chi_1\theta_1 + \chi_2\theta_2,$$

with the normalized spatial functions 

$$\chi_1 = \frac{1}{\sqrt3}\left(fgh - gfh - \tfrac12\,ghf - \tfrac12\,hfg + \tfrac12\,hgf + \tfrac12\,fhg\right),$$

$$\chi_2 = \tfrac12\left(-ghf + hfg - hgf + fhg\right).$$

■ 
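The construction of T in this example can be sketched numerically. The pairing of each 2 × 2 matrix with a permuted product below is an assumption, chosen to be consistent with Eq. (17.34) rather than copied from Eq. (17.2); the names `TERMS`, `build_T`, and `dot` are illustrative only. The sketch forms the signed sum of Eq. (17.33) and checks that the two entries of each column have equal norm and are mutually orthogonal when the six products are treated as orthonormal.

```python
import math

R3 = math.sqrt(3) / 2  # sqrt(3)/2, which appears in the 2x2 matrices of D3

# (sign eps_P, assumed matrix U(P), permuted product P applied to fgh).
# These assignments are an assumption consistent with Eq. (17.34).
TERMS = [
    (+1, [[1.0, 0.0], [0.0, 1.0]],  "fgh"),  # identity
    (-1, [[1.0, 0.0], [0.0, -1.0]], "gfh"),  # P(12)  <-> C2
    (-1, [[-0.5, R3], [R3, 0.5]],   "hgf"),  # P(13)  <-> C2'
    (-1, [[-0.5, -R3], [-R3, 0.5]], "fhg"),  # P(23)  <-> C2''
    (+1, [[-0.5, R3], [-R3, -0.5]], "ghf"),  # P(123 -> 312) <-> C3
    (+1, [[-0.5, -R3], [R3, -0.5]], "hfg"),  # P(123 -> 231) <-> C3'
]

def build_T():
    """Return T as a 2x2 grid of {product: coefficient} dictionaries."""
    T = [[{} for _ in range(2)] for _ in range(2)]
    for sign, U, product in TERMS:
        for i in range(2):
            for j in range(2):
                T[i][j][product] = T[i][j].get(product, 0.0) + sign * U[i][j]
    return T

def dot(a, b):
    """Inner product, treating the six distinct products as orthonormal."""
    return sum(a[p] * b.get(p, 0.0) for p in a)

T = build_T()
for j in range(2):
    chi1, chi2 = T[0][j], T[1][j]
    # Squared norms of the two column entries, then their overlap.
    print(j + 1, dot(chi1, chi1), dot(chi2, chi2), dot(chi1, chi2))
```

Both squared norms come out equal to 3 in each column, which is why a single division by $\sqrt{3}$ normalizes both members of a set.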
Exercises 
17.6.1 (a) The objects (abcd) are permuted to (dacb). Write out a 4 × 4 matrix representation 
of this one permutation. 
Hint. Compare with Exercise 17.4.2. 
(b) Is the permutation (abdc) → (dacb) odd or even? 
(c) Is this permutation a possible member of the D4 group, which was the subject of 
Exercise 17.4.3? Why or why not? 
17.6.2 (a) The permutation group of four objects, S4, has 4! = 24 elements. Treating the 


four elements of the cyclic group, C4, as permutations, set up a 4 x 4 matrix 
representation of C4. Note that C4 is a subgroup of P4. 


(b) How do you know that this 4 x 4 matrix representation of C4 must be reducible? 





17.6.3 The permutation group of four objects, S4, has five classes. 


(a) Determine the number of elements in each class of S4 and identify one element of 
each class as a product of cycles. 


(b) Two of the irreducible representations of S4 are of dimension 1 (and are usually 
denoted $A_1$ and $A_2$). Noting that permutations can be classified as even or odd, 
find the characters of $A_1$ and $A_2$. 


Hint. Set up a character table and fill in the $A_1$ and $A_2$ lines. 


(c) One irreducible representation of S4 (usually denoted E) is of dimension 2. Determine 
the dimensions of all the irreducible representations of S4 other than $A_1$, $A_2$, 
and E. 


(d) Complete the character table of S4. 
Hint. Only the even permutations have nonzero characters in the E representation. 


17.7 CONTINUOUS GROUPS 


Several continuous groups whose importance in physics was recognized long ago corre- 
spond to rotational symmetry in two- or three-dimensional space. Here the group elements 
are the rotations, the angles of which can vary continuously and thereby assume an infinite 
number of values. For rotations, the group multiplication rule corresponds to the applica- 
tion of successive rotations, which we have seen can be described by matrix multiplication. 
Rotations clearly form a group since they contain an identity element (no rotation), suc- 
cessive rotations are equivalent to a single rotation, and every rotation has an inverse (its 
reverse). 

Rotations in two-dimensional (2-D) space can be described by 2 x 2 orthogonal matrices 
with determinant +1; the group consisting of these rotations is named SO(2) (SO stands 
for “special orthogonal”). If we also include reflections, so that the determinant can be 
±1, the group is named O(2). Since a 2-D rotation is completely specified by a single 
angle, SO(2) is a one-parameter group. A matrix representation of SO(2) was introduced 
in Eq. (17.1); the group parameter is the rotation angle $\varphi$. 

Rotations in 3-D space are described by 3 x 3 orthogonal matrices. The resulting groups 
are designated O(3) and SO(3); for SO(3), three angles (e.g., the Euler angles) are group 
parameters. Generalizing to n x n matrices, the groups are named O(n) and SO(n); the 
number of parameters needed to specify fully an n × n real orthogonal matrix is n(n − 1)/2, 
and that is the number of independent parameters (generalizations of the Euler angles) 
needed in SO(n). If we further generalize to unitary matrices, we have the groups SU(n) 
and U(n). Proof that these sets of unitary matrices form groups is left as an exercise. 

Let’s introduce some nomenclature. The n x n matrices referred to above can be thought 
of as the defining, or fundamental, representations of the groups involved. The order of 
a continuous group is defined as the number of independent parameters needed to specify 
its fundamental representation, so the order of SO(n) is the previously stated n(n − 1)/2; 
the order of the group SU(n) is $n^2 - 1$. 
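The two orders quoted above can be recovered by counting matrix elements and subtracting the independent constraints. The following is a minimal sketch of that bookkeeping; the function names `order_so` and `order_su` are illustrative and not taken from the text.

```python
def order_so(n):
    """Independent real parameters of an n x n real orthogonal matrix."""
    entries = n * n                  # real matrix elements
    constraints = n * (n + 1) // 2   # distinct conditions in O^T O = 1 (a symmetric equation)
    return entries - constraints     # equals n(n-1)/2

def order_su(n):
    """Independent real parameters of an n x n unitary matrix with determinant 1."""
    real_numbers = 2 * n * n         # real and imaginary parts of every element
    constraints = n * n              # independent real conditions in U^dagger U = 1
    return real_numbers - constraints - 1  # one further condition fixes det U = 1

for n in (2, 3, 4):
    print(n, order_so(n), order_su(n))
```

Requiring the determinant to be +1 in the orthogonal case selects one of two disconnected pieces and removes no continuous parameter, so O(n) and SO(n) have the same order.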

In addition to their use for the treatment of rotational symmetry, continuous groups 
are also relevant to the classification of elementary (and not so elementary) particles. It has 
been experimentally observed that regularities in the masses and charges of sets of particles 
can be explained if their wave functions are identified as basis members of an irreducible 
representation of an appropriate group. Note that now the group does not describe rotations 
in ordinary space, but refers to a more abstract space relevant to an understanding of the 
physics involved. The earliest example of this idea was electron spin; spin wave functions 
are objects in an abstract SU(2) space, together with rules to unravel their observational 
properties. A further abstraction began with the notion that the proton and neutron might 
form a basis for an abstract SU(2) representation, and has since blossomed with the intro- 
duction of SU(3) and other continuous groups into particle physics. A brief survey of these 
ideas is presented in our specific discussion of SU(3). 


Lie Groups and Their Generators 


It is extremely useful to manage groups such as SO(n) or SU(n) in ways that do not 
explicitly involve an infinite number of elements; a formalism for doing so was devised 
by the Norwegian mathematician Sophus Lie. Groups for which Lie’s analysis is appli- 
cable, called Lie groups, have elements that depend continuously on parameters that vary 
over closed intervals (meaning that the parameter set includes the limit of any converging 
sequence of parameters). The groups SO(n) and SU(n) are Lie groups. 

Lie’s essential idea was to describe a group in terms of its generators, a minimal set 
of quantities that could be used in a specific way (multiplied by parameters) to produce 
any element of the group. Our starting point is, for each parameter $\varphi$ controlling a group 
operation, to introduce a generator S with the property that when $\varphi$ is infinitesimal (and 
therefore written $\delta\varphi$) the group element with parameter $\delta\varphi$ (which must be close to the 
identity element of the group) can be represented by 

$$U(\delta\varphi) = 1 + i\,\delta\varphi\,S. \tag{17.35}$$

The factor i in Eq. (17.35) could have been included in S but it is more convenient not to 
do so. Group operations corresponding to larger values of $\varphi$ can now be generated from 
repeated operation (N times) by $\varphi/N$, where $\varphi/N$ is small. We therefore identify $U(\varphi)$ as 
the limit 

$$U(\varphi) = \lim_{N\to\infty}\left(1 + \frac{i\varphi S}{N}\right)^{N}.$$

This large-N limit defines the exponential, so we have the general result 

$$U(\varphi) = \exp(i\varphi S). \tag{17.36}$$

Given any representation U of our continuous group, we can find the generator S 
corresponding to the parameter $\varphi$ for that representation by differentiation of Eq. (17.36), 
evaluated at the identity element of our group. In particular, 

$$-i\left.\frac{dU}{d\varphi}\right|_{\varphi=0} = S, \tag{17.37}$$

revealing that the entire behavior of a representation U can be deduced from its behavior 
in an infinitesimal parameter-space neighborhood of the identity operation. However, to 
obtain complete knowledge of the structure of a Lie group we need to study the behavior 
of its generators for a representation that is faithful; for that purpose it is desirable to use 
the fundamental representation. 
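Both the large-N limit and the differentiation formula, Eq. (17.37), can be checked numerically. The sketch below takes the Pauli matrix $\sigma_2$ as an illustrative Hermitian generator and assumes the closed form of $\exp(i\varphi\sigma_2)$ as a plane rotation matrix; the helper names are hypothetical. It evaluates $(1 + i\varphi S/N)^N$ for $N = 2^k$ by repeated squaring and then recovers S by a central finite difference.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def max_abs_diff(a, b):
    """Largest elementwise deviation between two square matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

def limit_product(S, phi, k):
    """Return (1 + i*phi*S/N)^N for N = 2**k, using k successive squarings."""
    n, N = len(S), 2 ** k
    M = [[(1.0 if i == j else 0.0) + 1j * phi * S[i][j] / N for j in range(n)]
         for i in range(n)]
    for _ in range(k):
        M = matmul(M, M)
    return M

def rotation(phi):
    """Assumed closed form of exp(i*phi*sigma_2): a plane rotation matrix."""
    return [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]

SIGMA_2 = [[0.0, -1j], [1j, 0.0]]  # illustrative Hermitian generator

phi = 1.0
for k in (4, 10, 20):
    err = max_abs_diff(limit_product(SIGMA_2, phi, k), rotation(phi))
    print(f"N = 2**{k}: max deviation from exp(i*phi*S) = {err:.2e}")

# Eq. (17.37) by a central finite difference: S = -i dU/dphi at phi = 0.
h = 1e-4
S_numeric = [[-1j * (rotation(h)[i][j] - rotation(-h)[i][j]) / (2 * h) for j in range(2)]
             for i in range(2)]
print(S_numeric)
```

The deviation shrinks roughly in proportion to 1/N, as expected for a first-order approximation to the exponential applied N times.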


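Example 17.7.1 SO(2) GENERATOR 


SO(2) involves rotational symmetry about a single axis, and its operations are 
counterclockwise rotations of the coordinate axes through angles $\varphi$. Working with the 2 × 2 
fundamental representation of SO(2), an infinitesimal rotation $\delta\varphi$ causes (to first order) 
$(x', y') = (x + y\,\delta\varphi,\; y - x\,\delta\varphi)$, or 

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & \delta\varphi \\ -\delta\varphi & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \left(1_2 + i\,\delta\varphi\,S\right)\begin{pmatrix} x \\ y \end{pmatrix},$$

with 

$$iS = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \quad\text{or}\quad S = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = \sigma_2, \tag{17.38}$$

where $\sigma_2$ is a Pauli matrix. A general rotation is then represented by Eq. (17.36) as 

$$U(\varphi) = e^{i\varphi S} = 1_2\cos\varphi + i\sigma_2\sin\varphi = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}, \tag{17.39}$$

where we have evaluated the exponential of the matrix in Eq. (17.39) using the Euler 
identity, Eq. (2.80). This equation can be recognized as the transformation law for a 2-D 
coordinate rotation, Eq. (3.23), verifying that the generator formalism works as expected. 
If we had started from the final expression for $U(\varphi)$ given in Eq. (17.39), we could have 
generated S from it by applying the differentiation formula, Eq. (17.37). ■ 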


The generator form, Eq. (17.36), has some nice features: 


1. For the groups SO(n) and SU(n), any U will be unitary (remember, “orthogonal” is 
a special case of “unitary”). This means that 

$$U^{-1} = \exp(-i\varphi S) = U^{\dagger} = \exp(-i\varphi S^{\dagger}), \tag{17.40}$$

so $S = S^{\dagger}$, showing that S is Hermitian. That is the proximate reason for inclusion 
of i in the defining equation for S. 

2. Because for both SO(n) and SU(n), det(U) = 1, we also have, invoking the trace 
formula, Eq. (2.84), 

$$\det(U) = \exp\left(\operatorname{trace}(\ln U)\right) = \exp\left(i\varphi\,\operatorname{trace}(S)\right) = 1. \tag{17.41}$$

This condition is satisfied for general $\varphi$ only if trace(S) = 0. So S is not only Hermitian, 
but traceless. 

3. It can be shown (but is not proved here) that the number of independent generators of 
a Lie group is equal to the order of the group. 
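The first two features in this list can be illustrated numerically. The sketch below is not part of the original development: it picks an arbitrary Hermitian, traceless 2 × 2 matrix, exponentiates it with a truncated power series (adequate for a small matrix of modest norm), and checks unitarity and unit determinant. The matrix entries, the value of $\varphi$, and all helper names are assumptions made for illustration.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(A, terms=40):
    """Matrix exponential from its power series, truncated after `terms` terms."""
    n = len(A)
    result = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def dagger(A):
    """Hermitian conjugate."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def U_from(S, phi):
    """U = exp(i*phi*S), as in Eq. (17.36)."""
    return expm([[1j * phi * x for x in row] for row in S])

# A Hermitian, traceless 2x2 matrix chosen purely for illustration.
S = [[0.3, 0.5 - 0.2j], [0.5 + 0.2j, -0.3]]
phi = 0.7
U = U_from(S, phi)
UdU = matmul(dagger(U), U)
unitarity_error = max(abs(UdU[i][j] - (1 if i == j else 0)) for i in range(2) for j in range(2))
print(unitarity_error, det2(U))

# Adding a multiple of the unit matrix keeps S Hermitian but gives it trace 0.8,
# so the determinant becomes the phase exp(i*phi*trace(S)) instead of 1.
S_with_trace = [[S[0][0] + 0.4, S[0][1]], [S[1][0], S[1][1] + 0.4]]
print(det2(U_from(S_with_trace, phi)), cmath.exp(1j * phi * 0.8))
```

The second print shows why tracelessness is needed: a Hermitian generator with nonzero trace still yields a unitary matrix, but one belonging to U(n) rather than SU(n).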
One of Lie’s key observations was that by focusing on infinitesimal group elements, 
various properties of the generators could be deduced. We have already seen that if the 
form of U in terms of its parameters is known, the generators S can be obtained by differ- 
entiation of Eq. (17.36) in the limit corresponding to the identity group element. 

Second, relations between the generators can be developed, as follows: Let us consider 
two operations $U_j(\epsilon_j)$ and $U_k(\epsilon_k)$ of a group G, that respectively correspond to the 
generators $S_j$ and $S_k$. The values of $\epsilon_j$ and $\epsilon_k$ are assumed small, so the resulting $U_j$ and 
$U_k$ differ, but only slightly, from the identity element. Expanding the exponentials and 
keeping terms through second order in $\epsilon$, 

$$U_j = \exp(i\epsilon_j S_j) = 1 + i\epsilon_j S_j - \tfrac12\,\epsilon_j^2 S_j^2 + \cdots,$$

$$U_k = \exp(i\epsilon_k S_k) = 1 + i\epsilon_k S_k - \tfrac12\,\epsilon_k^2 S_k^2 + \cdots,$$

we evaluate the leading term (in $\epsilon$) of the matrix product $U_k^{-1}U_j^{-1}U_kU_j$. The linear terms 
all cancel, as do several of the quadratic terms. The remaining quadratic terms can be 
grouped so as to reach the result 

$$U_k^{-1}U_j^{-1}U_kU_j = 1 + \epsilon_j\epsilon_k\,[S_j, S_k] + \cdots = 1 + i\epsilon_j\epsilon_k \sum_l f_{jkl}\,S_l + \cdots. \tag{17.42}$$

The last line of Eq. (17.42) reflects the fact that the left-hand side of the equation must 
correspond to some group element, and that element must, to first order in the generators, 
be of the form shown. Note that the premultipliers $i\epsilon_j\epsilon_k$ are not a form restriction, as their 
presence simply changes the value of $f_{jkl}$. 

Comparing the two lines of Eq. (17.42), we obtain the important closure relation among 
the generators of the group G: 


$$[S_j, S_k] = i\sum_l f_{jkl}\,S_l. \tag{17.43}$$


The coefficients $f_{jkl}$ are called the structure constants of G. It can be shown that $f_{jkl}$ is 
antisymmetric with respect to index permutations, so $f_{jkl} = f_{klj} = f_{ljk} = -f_{kjl} = -f_{lkj} = -f_{jlk}$. 
The structure constants provide a representation-independent characterization of a 
Lie group, but as already mentioned, to determine them we will need to work with a faithful 
representation, such as the group’s fundamental representation. We will shortly do so for 
the groups we study in detail. 
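The second-order result behind the closure relation can be checked numerically for small parameters. The sketch below borrows two of the SU(2) generators ($\sigma_1/2$ and $\sigma_2/2$) purely as an illustrative pair of noncommuting Hermitian matrices; the helper names and the value of $\epsilon$ are assumptions. It compares the product $U_k^{-1}U_j^{-1}U_kU_j$ with $1 + \epsilon_j\epsilon_k[S_j, S_k]$ and confirms that the commutator closes on a third generator.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(A, terms=30):
    """Matrix exponential from its truncated power series."""
    n = len(A)
    result = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

def U(S, eps):
    """Group element exp(i*eps*S); a negative eps gives the inverse."""
    return expm([[1j * eps * x for x in row] for row in S])

# Illustrative generators: halves of the Pauli matrices.
S1 = [[0, 0.5], [0.5, 0]]
S2 = [[0, -0.5j], [0.5j, 0]]
S3 = [[0.5, 0], [0, -0.5]]

eps = 1e-3
Sj, Sk = S1, S2
product = matmul(matmul(U(Sk, -eps), U(Sj, -eps)), matmul(U(Sk, eps), U(Sj, eps)))
comm = commutator(Sj, Sk)
predicted = [[(1 if a == b else 0) + eps * eps * comm[a][b] for b in range(2)] for a in range(2)]
deviation = max(abs(product[a][b] - predicted[a][b]) for a in range(2) for b in range(2))
print("deviation from 1 + eps^2 [Sj, Sk]:", deviation)   # third order in eps

# Closure: [S1, S2] = i S3, i.e. the only nonzero f_12l is f_123 = 1.
closure_error = max(abs(comm[a][b] - 1j * S3[a][b]) for a in range(2) for b in range(2))
print("closure error:", closure_error)
```

The deviation is of order $\epsilon^3$, far smaller than the retained $\epsilon^2$ commutator term, which is what allows the structure constants to be read off from infinitesimal elements.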

As is obvious from the foregoing analysis, Lie group generators will not in general com- 
mute. In 3-D, rotations about different axes do not commute, and therefore their generators 
cannot commute either. An additional indicator for group classification is the maximum 
number of independent generators that all mutually commute. This number is called the 
rank of the group; it is significant because the generators can be subjected to unitary trans- 
formations without changing the ultimate group structure, and the mutually commuting 
generators can therefore be brought simultaneously to diagonal form. Once this is done, 
the basis members of the generator set can be labeled using the diagonal elements (the 
eigenvalues) of the commuting generators. The values of the labels (and the physical phe- 
nomena related thereto) depend on the representation in use. 

For the orthogonal groups SO(n) and unitary groups SU(n) the commutation relations, 
Eq. (17.43), can be developed along the lines of angular momentum, leading to generalized 
ladder operators (and selection rules) in conjunction with the mutually commuting opera- 
tors. For these central aspects of (the so-called classical) Lie groups we refer to the work 
by Greiner and Mueller (see Additional Readings). 

Summarizing, the rank of a group indicates the number of indices needed to label the 
basis. In applications to quantum mechanics, these indices are often referred to as quantum 
numbers. For example, in SO(3), which is of rank 1, the index is usually taken to be $M_L$, 
usually identified physically as the z component of an angular momentum; when SU(2), 
also of rank 1, is used for the description of electron spin, the index is usually called $M_S$. 
The possible values of $M_L$ or $M_S$ depend on the representation, and we saw in Chapter 16 
that the values range, in unit steps, between +L and −L (or +S and −S), so that diagrams 
identifying these basis members can be plotted on a line. In contrast, we will see that 
SU(3) is of rank 2, so its basis members are labeled with two quantum numbers. Diagrams 
identifying the label assignments will in that case need to be 2-D. 

It is also possible to label entire representations. One way to label them is to use the 
eigenvalues of operators that commute with all the generators of the group; such operators 
are called Casimir operators; the number of independent Casimir operators is equal to 
the rank of the group. SO(3) has therefore one Casimir operator; it is the operator usually 
known in angular-momentum applications as $L^2$ or $J^2$. 


Groups SO(2) and SO(3) 


SO(2) and SO(3) are rotation groups; SO(2) corresponds to rotational symmetry about 
one axis, which we will take to be the z axis when the symmetry is for a 3-D system. SO(2) 
will therefore have only one generator, that already found in Eq. (17.38): 


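$$S_z = \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \tag{17.44}$$


To use $S_z$ as one of the generators of SO(3), we extend to a 3 × 3 basis, calling the 
generator $S_3$, obtaining 

$$S_3 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{17.45}$$

SO(3) has two other generators, $S_1$ and $S_2$. To obtain $S_1$, the generator corresponding to 

$$U_x(\psi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{pmatrix}, \tag{17.46}$$

we apply Eq. (17.37): 

$$S_1 = -i\left.\frac{dU_x}{d\psi}\right|_{\psi=0} = -i\begin{pmatrix} 0 & 0 & 0 \\ 0 & -\sin\psi & \cos\psi \\ 0 & -\cos\psi & -\sin\psi \end{pmatrix}_{\psi=0} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}. \tag{17.47}$$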


In a similar fashion, starting from 

$$U_y(\theta) = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}, \tag{17.48}$$

we find 

$$S_2 = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}. \tag{17.49}$$

Summarizing, the structure of SO(2) is trivial, as it has only a single generator, and has 
order 1 and rank 1. However, the structure of SO(3) is not entirely trivial. Because no two 
of $S_1$, $S_2$, and $S_3$ commute, SO(3) will have order 3, but rank 1. By matrix multiplication, 
we may compute its structure constants. It is easily verified that 

$$[S_j, S_k] = i\,\epsilon_{jkl}\,S_l, \tag{17.50}$$

where $\epsilon_{jkl}$ is a Levi-Civita symbol (a sum over the repeated index l is implied). Thus, the 
Levi-Civita symbols are the structure constants for SO(3). Note also that the $S_j$ obey the 
angular momentum commutation rules. 
In fact, these are the same matrices that were called $K_j$ in Eq. (16.86) in Chapter 16, and 
they were identified there as matrices describing the components of angular momentum in 
a basis consisting of x, y, and z. This observation can be generalized to reach the conclusion 
that for any representation of SO(3), the generators can be taken to be the angular 
momentum components $L_j$ (j = 1, 2, 3) as expressed in any basis for that representation. 
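The statement that Eq. (17.50) is "easily verified" can be made concrete with a few lines of matrix arithmetic. The sketch below enters the three 3 × 3 generators of Eqs. (17.45), (17.47), and (17.49), checks all nine commutators against the Levi-Civita symbol, and also evaluates the sum of squares of the generators, which should be a multiple of the unit matrix. The helper names are illustrative.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

def levi_civita(j, k, l):
    """Levi-Civita symbol for indices drawn from 1, 2, 3."""
    return (j - k) * (k - l) * (l - j) // 2

# Generators in the x, y, z basis, Eqs. (17.47), (17.49), (17.45).
S = {
    1: [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    2: [[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]],
    3: [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
}

def closure_holds(tol=1e-12):
    """True if [S_j, S_k] = i * sum_l eps_jkl S_l for every pair (j, k)."""
    for j in (1, 2, 3):
        for k in (1, 2, 3):
            lhs = commutator(S[j], S[k])
            for a in range(3):
                for b in range(3):
                    rhs = sum(1j * levi_civita(j, k, l) * S[l][a][b] for l in (1, 2, 3))
                    if abs(lhs[a][b] - rhs) > tol:
                        return False
    return True

# Sum of squares of the generators (the Casimir operator for this representation).
squares = [matmul(S[j], S[j]) for j in (1, 2, 3)]
casimir = [[sum(sq[a][b] for sq in squares) for b in range(3)] for a in range(3)]

print(closure_holds())
print(casimir)
```

The sum of squares comes out as twice the unit matrix, that is L(L + 1) with L = 1, consistent with identifying these generators with angular momentum components.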


Example 17.7.2 GENERATORS DEPEND ON BASIS 


To show that the generators indeed have a form that depends on the choice of basis, 
consider a basis for SO(3) proportional to the spherical harmonics for l = 1 with standard 
phases, 

$$\psi_1 = -\frac{1}{\sqrt2}\,(x + iy), \qquad \psi_2 = z, \qquad \psi_3 = \frac{1}{\sqrt2}\,(x - iy). \tag{17.51}$$

We now apply $L_x = -i\,[\,y\,\partial/\partial z - z\,\partial/\partial y\,]$ to the basis members, getting the result 
$L_x\psi_1 = z/\sqrt2 = \psi_2/\sqrt2$, $L_x\psi_2 = -iy = (\psi_1 + \psi_3)/\sqrt2$, $L_x\psi_3 = z/\sqrt2 = \psi_2/\sqrt2$, meaning that 
the matrix representation of $L_x$, and therefore of a generator we will call $S_x$, is 

$$S_x = \frac{1}{\sqrt2}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}. \tag{17.52}$$

Applying $L_y$ and $L_z$ to the spherical harmonic basis, we obtain generators $S_y$ and $S_z$: 

$$S_y = \frac{1}{\sqrt2}\begin{pmatrix} 0 & -i & 0 \\ i & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \qquad S_z = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.53}$$

These generators, though different from those given in Eqs. (17.45), (17.47), and (17.49), 
are equivalent to them in the sense that they define the same irreducible representation 
of SO(3). ■ 
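The equivalence claimed at the end of this example can be exhibited as an explicit unitary change of basis. In the sketch below, the columns of V hold the components of $\psi_1$, $\psi_2$, $\psi_3$ along x, y, z, and the Cartesian generators are transformed as $V^{\dagger} S V$; the convention that column j of a generator matrix gives the image of basis member j is an assumption consistent with Eq. (17.52). The names `V`, `CARTESIAN`, and `SPHERICAL` are illustrative.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def dagger(a):
    """Hermitian conjugate."""
    return [[complex(a[j][i]).conjugate() for j in range(3)] for i in range(3)]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

r = 1 / math.sqrt(2)

# Generators in the x, y, z basis, Eqs. (17.47), (17.49), (17.45).
CARTESIAN = {
    "x": [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    "y": [[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]],
    "z": [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
}

# Generators in the spherical-harmonic basis, Eqs. (17.52) and (17.53).
SPHERICAL = {
    "x": [[0, r, 0], [r, 0, r], [0, r, 0]],
    "y": [[0, -1j * r, 0], [1j * r, 0, -1j * r], [0, 1j * r, 0]],
    "z": [[1, 0, 0], [0, 0, 0], [0, 0, -1]],
}

# Columns: components of psi_1, psi_2, psi_3 of Eq. (17.51) along x, y, z.
V = [
    [-r, 0, r],
    [-1j * r, 0, -1j * r],
    [0, 1, 0],
]

def transformed(axis):
    """Cartesian generator expressed in the spherical basis: V^dagger S V."""
    return matmul(dagger(V), matmul(CARTESIAN[axis], V))

for axis in ("x", "y", "z"):
    print(axis, max_abs_diff(transformed(axis), SPHERICAL[axis]))
```

All three deviations vanish to rounding error, so the two generator sets differ only by the unitary transformation V.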


Group SU(2) and SU(2)-SO(3) Homomorphism 


A complete set of generators for the fundamental representation of SU(2) must span the 
space of traceless 2 x 2 Hermitian matrices; since there is only one off-diagonal element 
above the diagonal that can have an arbitrary complex value, it can, if nonzero, be assigned 
in two linearly independent ways (such as 1 and —i). The below-diagonal element is then 
completely determined by Hermiticity. There is only one independent way to assign the 
diagonal elements, as there are two and they must be real and sum to zero. Thus, a simple 
set of matrices satisfying the necessary conditions consists of the three Pauli matrices $\sigma_j$, 
j = 1, 2, 3. Noting also that there would be advantages to having the generators scaled 
so that they would satisfy the angular momentum commutation relations, we choose the 
definition 

$$S_j = \tfrac12\,\sigma_j, \qquad j = 1, 2, 3. \tag{17.54}$$

Then, based on our many previous encounters or by performing the matrix multiplications, 
we can confirm 

$$[S_j, S_k] = i\,\epsilon_{jkl}\,S_l. \tag{17.55}$$

In addition, for rotation parameters denoted as $\alpha_j$ in connection with generators $S_j$, we 
have, calling the corresponding SU(2) members $U_j$, 

$$U_j(\alpha_j) = \exp(i\alpha_j\sigma_j/2), \qquad j = 1, 2, 3. \tag{17.56}$$

Invoking the Euler identity, Eq. (2.80), we can rewrite Eq. (17.56) as 

$$U_j(\alpha_j) = 1_2\cos\left(\frac{\alpha_j}{2}\right) + i\,\sigma_j\sin\left(\frac{\alpha_j}{2}\right). \tag{17.57}$$


The group SU(2) was first recognized as relevant for physics when it was observed 
that spin states of the electron form a basis for its fundamental representation. We already 
know, from Chapter 16, that orbital angular momentum multiplets come in sets with odd 
numbers of members (2L + 1, with L integral). But we also observed that abstract quanti- 
ties that obey the angular momentum commutation rules with half-integer L values come 
in multiplets with even numbers of members. The multiplet with two members is the 
fundamental basis for the group SU(2). These basis functions are conventionally written $|\!\uparrow\rangle$ 
and $|\!\downarrow\rangle$ (or just $\alpha$ and $\beta$), and in matrix notation are 

$$|\!\uparrow\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |\!\downarrow\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{17.58}$$

Since the structure constants for SU(2) show that its generators satisfy the angular momentum 
commutation rules, we may conclude that all angular momentum multiplets define 
representations of SU(2); in Chapter 16 we found that the multiplets of odd dimension 
(2L + 1 with L integral) can be chosen to be the spherical harmonics of angular momen- 
tum L and are therefore also a basis for a representation of SO(3). Angular momentum 
multiplets of even dimension do not have a 3-D spatial representation and cannot corre- 
spond to a representation of SO(3). They are the more abstract quantities we call spinors, 
have half-integer angular-momentum quantum numbers, and are bases only for represen- 
tations of SU(2). 

Further understanding of the situation can be obtained by applying $U_x(\varphi)$, a synonym 
for $U_1(\varphi)$, to the spin function $|\!\uparrow\rangle$. Taking $\varphi = \pi$, this corresponds to a 180° rotation about 
the x axis, which we might expect would convert $|\!\uparrow\rangle$ into $|\!\downarrow\rangle$. Applying Eq. (17.57), which 
for the current case assumes the form $U_x = i\sigma_1$, we have 

$$U_x\,|\!\uparrow\rangle = i\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = i\begin{pmatrix} 0 \\ 1 \end{pmatrix} = i\,|\!\downarrow\rangle. \tag{17.59}$$

So far, so good. But let’s now try a similar rotation with $\varphi = 2\pi$. We then have $U_x = -1_2$, 
meaning that a complete 360° rotation does not restore $|\!\uparrow\rangle$, but gives instead $-|\!\uparrow\rangle$, namely 
the expected state, but with a change of sign. To recover $|\!\uparrow\rangle$ with its original (+) sign would 
require a rotation $\varphi = 4\pi$, i.e., two revolutions. Each rotation between $\varphi = 2\pi$ and $\varphi = 4\pi$ 
is, with opposite sign, equivalent to one in the $(0, 2\pi)$ range. 
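The sign change under a full revolution is easy to reproduce from Eq. (17.57). The sketch below builds $U_x(\varphi)$ for the x axis and applies it to the spin-up column vector for $\varphi = \pi$, $2\pi$, and $4\pi$; the function names are illustrative.

```python
import math

def U_x(phi):
    """Eq. (17.57) for sigma_1: cos(phi/2) times the unit matrix plus i sin(phi/2) sigma_1."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, 1j * s], [1j * s, c]]

def apply(U, v):
    """Apply a 2x2 matrix to a two-component spinor."""
    return [U[0][0] * v[0] + U[0][1] * v[1], U[1][0] * v[0] + U[1][1] * v[1]]

UP = [1, 0]    # spin-up basis spinor, Eq. (17.58)
DOWN = [0, 1]  # spin-down basis spinor

for label, phi in (("pi", math.pi), ("2 pi", 2 * math.pi), ("4 pi", 4 * math.pi)):
    # Entries print with rounding noise of order 1e-16 in place of exact zeros.
    print(label, apply(U_x(phi), UP))
```

Up to rounding error, the three results are i times spin-down, minus spin-up, and plus spin-up, in line with the discussion above.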

We now see the essential difference between SU(2) and SO(3): The angular range of 
the rotation parameters in SU(2) is twice that in SO(3), so each SO(3) element is gen- 
erated twice in each dimension (with different signs) in SU(2). Thus the correspondence 
between the two groups is not one-to-one (an isomorphism), but is two-to-one, a homo- 
morphism. The existence of this homomorphism is not important for irreducible represen- 
tations of odd dimension (corresponding to integer L or, in more general contexts, J), since 
then $U(2\pi) = U(0)$ and the range $(2\pi, 4\pi)$ simply duplicates $(0, 2\pi)$. But the homomorphism 
remains important for even-dimension representations of SU(2), which correspond 
to half-integer J and are not representations of SO(3). However, the fact that all rep- 
resentations of SO(3) are also representations of SU(2) means that we can form within 
SU(2) direct products that include representations of both even and odd dimension. This 
observation validates our analysis of states with both orbital and spin angular momentum. 
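The two-to-one correspondence can be made explicit with a standard construction that is not derived in the text: to each SU(2) matrix U one assigns the real 3 × 3 matrix $R_{ij} = \tfrac12\,\mathrm{trace}(\sigma_i U \sigma_j U^{\dagger})$. The sketch below, with illustrative function names and arbitrarily chosen angles, checks that U and −U give the same R, that R is orthogonal, and that a $2\pi$ rotation in SU(2) maps to the SO(3) identity.

```python
import math

PAULI = (
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
)

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    """Hermitian conjugate."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def su2(axis, phi):
    """Eq. (17.57): cos(phi/2) + i sigma_axis sin(phi/2), with axis = 0, 1, or 2."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    sig = PAULI[axis]
    return [[c * (i == j) + 1j * s * sig[i][j] for j in range(2)] for i in range(2)]

def to_so3(U):
    """R_ij = (1/2) trace(sigma_i U sigma_j U^dagger), a real 3x3 matrix."""
    Ud = dagger(U)
    R = []
    for i in range(3):
        row = []
        for j in range(3):
            M = matmul(PAULI[i], matmul(U, matmul(PAULI[j], Ud)))
            row.append(0.5 * (M[0][0] + M[1][1]).real)
        R.append(row)
    return R

# An arbitrary SU(2) element (angles chosen only for illustration) and its negative.
U = matmul(su2(0, 1.2), su2(2, 0.5))
minus_U = [[-x for x in row] for row in U]
R, R_minus = to_so3(U), to_so3(minus_U)

same_image = max(abs(R[i][j] - R_minus[i][j]) for i in range(3) for j in range(3))
orthogonality = max(
    abs(sum(R[k][i] * R[k][j] for k in range(3)) - (1 if i == j else 0))
    for i in range(3) for j in range(3)
)
print(same_image, orthogonality)
print(to_so3(su2(0, 2 * math.pi)))  # the 3x3 unit matrix, although U itself is -1
```

Because R is quadratic in U, the overall sign of U drops out, which is the algebraic origin of the two-to-one mapping.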

In summary, we observe that half-integer angular momentum basis functions, which 
in earlier discussion we have already labeled as spinors, not only are objects that cannot 
be represented as functions in ordinary 3-D space, but are also objects whose rotational 
properties are unusual in that their angular periodicity is $4\pi$, not the value $2\pi$ that would 
ordinarily be expected. They are thus somewhat abstract quantities whose relevance to 
physics rests on their ability to explain the “spin” properties of electrons and other fermions. 


Group SU(3) 


Starting in the 1930s, physicists began to give considerable attention to the symmetries of 
baryons, particles that, as the prefix “bary” implies, are heavy in comparison to electrons, 
and that interact subject to a force called the strong interaction. The earliest conjecture, 
by Heisenberg, was to the effect that the approximate charge independence of the nuclear 
forces involving protons and neutrons suggested that they could be viewed as different 
quantum states of the same particle (called the nucleon), with the nucleon having a sym- 
metry appropriate to the existence of a doublet of states. The nucleon was postulated to 
have the same symmetry as electron spin, namely that of the continuous group SU(2). 
Although the nucleon symmetry has nothing to do with spin, it is referred to as isospin, 
with the isospin symmetry described by the matrices $\tau_i$, i = 1, 2, 3 (equal to the 
corresponding Pauli spin matrices $\sigma_i$), and the isospin states can be classified by the eigenvalue 
of $\tau_3/2$ (designated $I_3$), with $I_3 = +1/2$ corresponding to the proton, $I_3 = -1/2$ 
corresponding to the neutron. 

By the early 1960s, a large number of additional baryons with strong interactions had 
been identified, of which eight (proton, neutron, and six others) were rather similar in mass. 
The masses of the baryons discussed in this section are listed in Table 17.5. 

In 1961, Gell-Mann, and independently Ne’eman, suggested that these eight baryons 
might be symmetry-related, and proposed that they be identified with an irreducible rep- 
resentation of the group SU(3), with the relatively small mass differences attributed to 
forces weaker than the strong interaction and with different symmetry. The states describing 
these eight particles would be a basis for the generators of an SU(3) representation of 
dimension 8. Subsequently, it was proposed that all eight of these particles were actually 
formed from combinations of three smaller, and presumably more fundamental, particles 
called quarks, and the three types of quarks initially postulated, given the names up (u), 
down (d), and strange (s), were ultimately identified as forming a basis for the generators 
of SU(3). This original insight then led to the identification of a set of mesons involved 
with strong interaction as species consisting of one quark and one antiquark, thereby also 
corresponding to basis members of representations of SU(3). 

Table 17.5 Baryon Octet 

              Mass       Y     I3 
  Ξ:   Ξ⁻    1321.32    −1    −1/2 
       Ξ⁰    1314.9     −1    +1/2 
  Σ:   Σ⁻    1197.43     0    −1 
       Σ⁰    1192.55     0     0 
       Σ⁺    1189.37     0    +1 
  Λ:   Λ     1115.63     0     0 
       n      939.566    1    −1/2 
       p      938.272    1    +1/2 

Masses are given as rest-mass energies, in MeV (1 MeV = 10⁶ eV). 


The situation described in the preceding paragraph can be more fully understood by 
proceeding to a somewhat detailed discussion of the group SU(3). This group is defined by its 
generators, of which there are eight. The maximum number that commute with each other 
is two, so the group is of order $3^2 - 1 = 8$ and rank 2. The simplest useful way to specify the 
generators is to write them as 3 × 3 matrices in the SU(3) fundamental representation. Like 
other continuous groups, SU(3) has an infinite number of other irreducible representations 
of various sizes, but the key properties of the generators (specifically, their commutation 
rules) will be the same as those of the fundamental representation. We accordingly write 
the eight SU(3) generators in terms of zero-trace Hermitian matrices $\lambda_1$ through $\lambda_8$, with 


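$$S_i = \tfrac12\,\lambda_i, \tag{17.60}$$

where the $\lambda_i$, known as the Gell-Mann matrices, are 

$$\begin{aligned}
\lambda_1 &= \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, &
\lambda_2 &= \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, &
\lambda_3 &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \\
\lambda_4 &= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, &
\lambda_5 &= \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix}, &
\lambda_6 &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \\
\lambda_7 &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, &
\lambda_8 &= \frac{1}{\sqrt3}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}. &&
\end{aligned} \tag{17.61}$$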


In our use of SU(3), we will associate the rows and columns of this representation (in 
order) to the quarks u, d, and s. Note that $\lambda_1$, $\lambda_2$, and $\lambda_3$ are block diagonal with the upper 
block being the SU(2) isospin matrices, signaling the presence of an SU(2) subgroup with 
generators $\lambda_1/2$, $\lambda_2/2$, and $\lambda_3/2$. If we combine $\lambda_3$ and $\lambda_8$ so as to choose the generators 
in different ways, we can replace $\lambda_3$ with one of the following: 

$$\lambda_3' = \tfrac12\left(\sqrt3\,\lambda_8 - \lambda_3\right) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{17.62}$$

$$\lambda_3'' = \tfrac12\left(\sqrt3\,\lambda_8 + \lambda_3\right) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{17.63}$$

indicating the existence of another SU(2) subgroup with generators $S_1' = \lambda_6/2$, $S_2' = \lambda_7/2$, 
$S_3' = \lambda_3'/2$, and a third SU(2) subgroup, with generators $S_1'' = \lambda_4/2$, $S_2'' = \lambda_5/2$, 
$S_3'' = \lambda_3''/2$. These observations support the notion that isospin multiplets can exist within 
an SU(3) basis. 
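The three SU(2) subgroups can be checked directly from the Gell-Mann matrices. The sketch below forms the two diagonal combinations with the overall factor 1/2 (an assumption needed to give diagonal elements 0 and ±1) and verifies that each triple of generators obeys $[S_1, S_2] = iS_3$. The helper names are illustrative.

```python
import math

def zeros():
    return [[0j] * 3 for _ in range(3)]

def gell_mann():
    """The eight Gell-Mann matrices, keyed 1..8."""
    lam = {k: zeros() for k in range(1, 9)}
    lam[1][0][1] = lam[1][1][0] = 1
    lam[2][0][1], lam[2][1][0] = -1j, 1j
    lam[3][0][0], lam[3][1][1] = 1, -1
    lam[4][0][2] = lam[4][2][0] = 1
    lam[5][0][2], lam[5][2][0] = -1j, 1j
    lam[6][1][2] = lam[6][2][1] = 1
    lam[7][1][2], lam[7][2][1] = -1j, 1j
    s = 1 / math.sqrt(3)
    lam[8][0][0], lam[8][1][1], lam[8][2][2] = s, s, -2 * s
    return lam

def combine(a, A, b, B):
    """Return the matrix a*A + b*B."""
    return [[a * A[i][j] + b * B[i][j] for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def commutator(A, B):
    return combine(1, matmul(A, B), -1, matmul(B, A))

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

def half(A):
    return combine(0.5, A, 0, A)

def i_times(A):
    return combine(1j, A, 0, A)

lam = gell_mann()
lam3_prime = combine(math.sqrt(3) / 2, lam[8], -0.5, lam[3])         # diag(0, 1, -1)
lam3_double_prime = combine(math.sqrt(3) / 2, lam[8], 0.5, lam[3])   # diag(1, 0, -1)

# Each triple should satisfy the angular momentum rule [S_1, S_2] = i S_3.
isospin_closes = close(commutator(half(lam[1]), half(lam[2])), i_times(half(lam[3])))
second_closes = close(commutator(half(lam[6]), half(lam[7])), i_times(half(lam3_prime)))
third_closes = close(commutator(half(lam[4]), half(lam[5])), i_times(half(lam3_double_prime)))
print(isospin_closes, second_closes, third_closes)
```

Only two of the three diagonal matrices are linearly independent, which is consistent with SU(3) having rank 2.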

Because SU(3) is of rank 2, the members of its representations can be labeled according 
to the eigenvalues of two commuting generators, in contrast to the single label, $S_z$ or $I_3$, 
that we employed to label SU(2) members. It is customary to use for this purpose the two 
generators ($\lambda_3$ and $\lambda_8$) already in diagonal form. Continuing with the notation introduced 
for the nucleon, the eigenvalue of the SU(3) generator $S_3$ is identified as $I_3$, while $S_8$ is 
used to construct the identifier Y (known as hypercharge), defined as the eigenvalue of 
$2S_8/\sqrt3$. An oft-used alternative to Y is the strangeness S = Y − 1. 
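Example 17.7.3 QUANTUM NUMBERS OF QUARKS 


From 

$$S_3 = \tfrac12\,\lambda_3 = \frac12\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
2S_8/\sqrt3 = \lambda_8/\sqrt3 = \frac13\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix},$$

we find the $I_3$ values $\tfrac12$ for u, $-\tfrac12$ for d, and 0 for s, and the Y values $\tfrac13$ for u and d, and $-\tfrac23$ for s. ■ 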
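As a consistency check against Table 17.5, the quark values just found can be summed over the constituents of each baryon. Two assumptions enter the sketch below and are not established at this point in the text: that $I_3$ and Y are additive over the three quarks (as the later discussion of direct products indicates), and the conventional quark content assigned to each member of the octet. The dictionary names are illustrative.

```python
from fractions import Fraction as F

# Quark quantum numbers read off the diagonals of S_3 and 2 S_8 / sqrt(3).
I3 = {"u": F(1, 2), "d": F(-1, 2), "s": F(0)}
Y = {"u": F(1, 3), "d": F(1, 3), "s": F(-2, 3)}

# Conventional quark content of the octet baryons (an assumption: these
# assignments are standard but are not stated in this section).
CONTENT = {
    "p": "uud", "n": "udd", "Lambda": "uds",
    "Sigma+": "uus", "Sigma0": "uds", "Sigma-": "dds",
    "Xi0": "uss", "Xi-": "dss",
}

# (Y, I3) values as listed in Table 17.5.
TABLE_17_5 = {
    "p": (1, F(1, 2)), "n": (1, F(-1, 2)), "Lambda": (0, 0),
    "Sigma+": (0, 1), "Sigma0": (0, 0), "Sigma-": (0, -1),
    "Xi0": (-1, F(1, 2)), "Xi-": (-1, F(-1, 2)),
}

def quantum_numbers(quarks):
    """Return (Y, I3) for a product of quarks, assuming both labels are additive."""
    return (sum(Y[q] for q in quarks), sum(I3[q] for q in quarks))

for name, quarks in CONTENT.items():
    print(name, quarks, quantum_numbers(quarks), quantum_numbers(quarks) == TABLE_17_5[name])
```

Exact rational arithmetic is used so that thirds and halves compare without rounding error.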


From the definitions of the S; in Eq. (17.60), one can readily carry out the matrix opera- 
tions needed to establish their commutation rules. Note that even though the commutation 
rules will be obtained by examining the specific representation introduced in Eq. (17.60), 
they apply to all representations of the SU(3) generators. 

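We will use the commutation rules in a ladder-operator approach to the analysis of the 
symmetry properties of the three-quark multiplets. It is helpful to systematize the work by 
temporarily renaming $S_1$, $S_2$ as $I_1$, $I_2$; $S_6$, $S_7$ as $U_1$, $U_2$; and $S_4$, $S_5$ as $V_1$, $V_2$. Then we 
introduce 

$$\begin{aligned}
I_+ &= I_1 + iI_2, & I_- &= I_1 - iI_2, \\
U_+ &= U_1 + iU_2, & U_- &= U_1 - iU_2, \\
V_+ &= V_1 + iV_2, & V_- &= V_1 - iV_2,
\end{aligned} \tag{17.64}$$

and write some relevant commutators as 

$$\begin{aligned}
[S_3, I_\pm] &= \pm I_\pm, & [S_3, U_\pm] &= \mp\tfrac12\,U_\pm, & [S_3, V_\pm] &= \pm\tfrac12\,V_\pm, \\
[S_8, I_\pm] &= 0, & [S_8, U_\pm] &= \pm\tfrac12\sqrt3\;U_\pm, & [S_8, V_\pm] &= \pm\tfrac12\sqrt3\;V_\pm.
\end{aligned} \tag{17.65}$$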
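The commutators of Eq. (17.65) can be confirmed by direct multiplication of the 3 × 3 matrices. The sketch below rebuilds the generators $S_i = \lambda_i/2$, forms the six ladder combinations, and tests each relation for both signs; the helper names and the `checks` dictionary are illustrative.

```python
import math

def zeros():
    return [[0j] * 3 for _ in range(3)]

def gell_mann():
    """The eight Gell-Mann matrices, keyed 1..8."""
    lam = {k: zeros() for k in range(1, 9)}
    lam[1][0][1] = lam[1][1][0] = 1
    lam[2][0][1], lam[2][1][0] = -1j, 1j
    lam[3][0][0], lam[3][1][1] = 1, -1
    lam[4][0][2] = lam[4][2][0] = 1
    lam[5][0][2], lam[5][2][0] = -1j, 1j
    lam[6][1][2] = lam[6][2][1] = 1
    lam[7][1][2], lam[7][2][1] = -1j, 1j
    s = 1 / math.sqrt(3)
    lam[8][0][0], lam[8][1][1], lam[8][2][2] = s, s, -2 * s
    return lam

def combine(a, A, b, B):
    """Return the matrix a*A + b*B."""
    return [[a * A[i][j] + b * B[i][j] for j in range(3)] for i in range(3)]

def scale(c, A):
    return combine(c, A, 0, A)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def commutator(A, B):
    return combine(1, matmul(A, B), -1, matmul(B, A))

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

S = {k: scale(0.5, m) for k, m in gell_mann().items()}
# sign = +1 gives the raising operator, sign = -1 the lowering operator.
ladder = {
    name: {sign: combine(1, S[a], sign * 1j, S[b]) for sign in (+1, -1)}
    for name, (a, b) in {"I": (1, 2), "U": (6, 7), "V": (4, 5)}.items()
}

h = math.sqrt(3) / 2
# Expected multipliers c in [S_3, X] = c X and [S_8, X] = c X for the raising operators;
# the lowering operators carry the opposite sign.
expected = {"I": (1.0, 0.0), "U": (-0.5, h), "V": (0.5, h)}

checks = {}
for name, (c3, c8) in expected.items():
    for sign in (+1, -1):
        X = ladder[name][sign]
        checks[(name, sign, 3)] = close(commutator(S[3], X), scale(sign * c3, X))
        checks[(name, sign, 8)] = close(commutator(S[8], X), scale(sign * c8, X))

print(all(checks.values()))
```

Since only differences of diagonal elements of $S_3$ and $S_8$ enter, each multiplier is simply the change in the corresponding eigenvalue produced by the ladder operator.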


Using the logic of ladder operators (described in detail for applications to angular momentum 
operators in Section 16.1), the above commutators can be used to show that, starting 
from a basis function $\psi(I_3, Y)$, we can apply $I_\pm$, $U_\pm$, or $V_\pm$ to obtain basis functions with 
other label sets. For example, 

$$[S_8, U_+]\,\psi(I_3, Y) = S_8U_+\,\psi(I_3, Y) - U_+S_8\,\psi(I_3, Y) = \tfrac12\sqrt3\;U_+\,\psi(I_3, Y).$$

Replacing $S_8\,\psi(I_3, Y)$ by $\tfrac12\sqrt3\;Y\,\psi(I_3, Y)$, this equation can be rearranged to 

$$S_8\left(U_+\,\psi(I_3, Y)\right) = \tfrac12\sqrt3\,(Y + 1)\left(U_+\,\psi(I_3, Y)\right),$$

which shows that if it does not vanish, $U_+\,\psi(I_3, Y)$ is an eigenvector of $S_8$ with an 
eigenvalue corresponding to an increase of one unit in Y. Similarly, from the relation 
$[S_3, U_+]\,\psi(I_3, Y) = -\tfrac12\,U_+\,\psi(I_3, Y)$, we find that $U_+\,\psi(I_3, Y)$, if nonvanishing, is an 
eigenvector of $S_3$ with an eigenvalue less by 1/2 than that of $\psi(I_3, Y)$. These observations 
correspond to the equation $U_+\,\psi(I_3, Y) = C\,\psi(I_3 - \tfrac12, Y + 1)$. This and other ladder 
identities are summarized in the following equations: 

$$\begin{aligned}
I_\pm\,\psi(I_3, Y) &= C_I\,\psi(I_3 \pm 1,\; Y), \\
U_\pm\,\psi(I_3, Y) &= C_U\,\psi(I_3 \mp \tfrac12,\; Y \pm 1), \\
V_\pm\,\psi(I_3, Y) &= C_V\,\psi(I_3 \pm \tfrac12,\; Y \pm 1).
\end{aligned} \tag{17.66}$$


The constants C will depend on the representation under study and on the values of $I_3$ and 
Y; if the result of an operation according to any of these equations leads to an ($I_3$, Y) set 
that is not part of the representation’s basis, the C associated with that equation will vanish 
and the ladder construction will terminate. 

It is important to stress that the operators in Eq. (17.66) only move within the represen- 
tation under study, so if we start with a basis member of an irreducible representation, all 
the functions we will be able to reach will also be members of the same representation. 


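Example 17.7.4 QUARK LADDERS 


As a preliminary to our study of baryon and meson symmetries, let’s see how the ladder 
operators work, with the quarks, symbolically $\psi(I_3, Y)$, represented by 

$$u = \psi\left(\tfrac12, \tfrac13\right) = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad
d = \psi\left(-\tfrac12, \tfrac13\right) = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad
s = \psi\left(0, -\tfrac23\right) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

As explained in Example 17.7.3, the values of $I_3$ and Y are obtained from the diagonal 
elements (the eigenvalues) of $S_3$ and $S_8$. The 3 × 3 matrices representing the ladder operators 
in this example are 

$$\begin{aligned}
I_+ &= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, &
U_+ &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, &
V_+ &= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \\
I_- &= \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, &
U_- &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, &
V_- &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}.
\end{aligned} \tag{17.67}$$

By straightforward matrix multiplication, we find $I_-u = d$, $I_+d = u$, $U_-d = s$, $U_+s = d$, 
$V_-u = s$, $V_+s = u$; all other operations yield vanishing results. These relationships can 
be represented in the 2-D graph shown as Fig. 17.7 with Y in the vertical direction and $I_3$ 
horizontal. The arrows in the graph are labeled to indicate the results of application of the 
ladder operators. ■ 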
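The six nonvanishing ladder operations listed in this example can be enumerated mechanically. The sketch below assumes the row and column ordering u, d, s and represents each ladder operator as a matrix with a single unit entry, in agreement with the relations $I_-u = d$, $U_-d = s$, and $V_-u = s$; the names `unit`, `LADDER`, and `act` are illustrative.

```python
def unit(row, col):
    """3x3 matrix with a single 1 at (row, col); rows and columns ordered u, d, s."""
    return [[1 if (i, j) == (row, col) else 0 for j in range(3)] for i in range(3)]

LADDER = {
    "I+": unit(0, 1), "I-": unit(1, 0),
    "U+": unit(1, 2), "U-": unit(2, 1),
    "V+": unit(0, 2), "V-": unit(2, 0),
}

QUARKS = {"u": (1, 0, 0), "d": (0, 1, 0), "s": (0, 0, 1)}

def act(op_name, quark):
    """Apply a ladder operator to a quark; return the resulting quark name, or None if zero."""
    M, v = LADDER[op_name], QUARKS[quark]
    w = tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))
    for name, vec in QUARKS.items():
        if vec == w:
            return name
    return None

# Collect every operation that does not vanish.
nonzero = {}
for op in LADDER:
    for q in QUARKS:
        result = act(op, q)
        if result is not None:
            nonzero[(op, q)] = result

for (op, q), result in nonzero.items():
    print(f"{op} {q} = {result}")
```

Of the eighteen possible operator and quark combinations, only six survive, one for each arrow in the diagram of Fig. 17.7.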


FIGURE 17.7 Conversions between u, d, and s quarks by application of ladder operators 
$I_\pm$, $U_\pm$, and $V_\pm$. The coordinates of each particle are its ($I_3$, Y). 


FIGURE 17.8 Root diagram of SU(3). Each operator is labeled by the changes 
it causes: ($\Delta I_3$, $\Delta Y$). 


Continuing now to the baryons, we consider representations appropriate to three quarks, 
which we can form as the direct product of three single-quark representations. Using the 
notation 3 as shorthand for the fundamental representation (which is of dimension 3), the 
direct product we need is 3 ⊗ 3 ⊗ 3. This direct product is a reducible representation, which 
decomposes into the direct sum 

$$3 \otimes 3 \otimes 3 = 10 \oplus 8 \oplus 8 \oplus 1, \tag{17.68}$$


where 10, 8, and 1 refer to irreducible representations of the indicated dimensions. 
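A simple bookkeeping check on Eq. (17.68) is to tally the ($I_3$, Y) labels of all 27 three-quark products. The sketch below assumes that both labels are additive over the quarks and uses the single-quark values found in Example 17.7.3; the variable names are illustrative. It does not carry out the decomposition itself, only the counting that any decomposition must respect.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import product

# Single-quark labels from Example 17.7.3.
I3 = {"u": F(1, 2), "d": F(-1, 2), "s": F(0)}
Y = {"u": F(1, 3), "d": F(1, 3), "s": F(-2, 3)}

# Count how many of the 27 product states share each (I3, Y) label pair.
weights = Counter()
for trio in product("uds", repeat=3):
    label = (sum(I3[q] for q in trio), sum(Y[q] for q in trio))
    weights[label] += 1

total_states = sum(weights.values())
print(total_states, 10 + 8 + 8 + 1)
for label, count in sorted(weights.items(), key=lambda item: (-item[0][1], item[0][0])):
    print(f"I3 = {str(label[0]):>4}, Y = {str(label[1]):>2}: {count} state(s)")
```

The label (0, 0) occurs six times. This is compatible with one such state in the 10, two in each 8 (as for the baryons of Table 17.5 that share $I_3 = 0$, Y = 0), and one in the singlet, although the assignment of multiplicities to the individual representations is a standard result quoted here rather than derived.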

A standard way to decompose product representations such as we have here uses diagrams
known as Young tableaux. Because development of the rules for construction and
use of Young tableaux would take us beyond the scope of this text, we pursue here an
alternate route that uses the ladder operators of Eq. (17.66). Use of the ladder operators also
has the advantage that it yields explicit expressions for the $I_3$, $Y$ eigenfunctions. Since the
direction in which ladder operators connect states in an $I_3$, $Y$ diagram is general, we can
draw a picture that summarizes their properties. Such a picture is called a root diagram;
that for SU(3) is shown in Fig. 17.8.


Example 17.7.5 GENERATORS FOR DIRECT PRODUCTS 


If we apply an operation $R$ depending on a parameter $\varphi$ to a product of basis functions for
different particles, each function will transform according to its representation, which we
presently assume to be the fundamental representation:

$$\begin{aligned}
R\bigl(\psi_i(1)\psi_j(2)\bigr) &= \bigl(U(R)\psi_i(1)\bigr)\bigl(U(R)\psi_j(2)\bigr) \\
&= \bigl(e^{i\varphi S(1)}\psi_i(1)\bigr)\bigl(e^{i\varphi S(2)}\psi_j(2)\bigr) \\
&= e^{i\varphi[S(1)+S(2)]}\,\psi_i(1)\psi_j(2),
\end{aligned}$$


where the notation is supposed to indicate that S(1) acts only on particle 1 and S(2) acts 
only on particle 2 (this can be arranged by an appropriate definition of the direct-product 
matrices and the operators to which they correspond). The important point here is that 
because generators appear in an exponent, a product of single-particle operations can be 
obtained using a sum of single-particle generators. This observation is a generalization 
of our earlier writing of resultant multiparticle angular momenta as sums of individual 
contributions, and enables us to write, for three-quark products, expressions such as 


$$I_\pm = I_\pm(1) + I_\pm(2) + I_\pm(3);$$

















so, for example (dropping the proportionality constant C7), 
$$I_-\,u(1)u(2)u(3) = d(1)u(2)u(3) + u(1)d(2)u(3) + u(1)u(2)d(3).$$


Suppressing the explicit particle numbers, this can be shortened to
$I_-\,uuu = duu + udu + uud$. Corresponding results apply to all the other ladder operators
and to all three-quark products, and to the application of the diagonal generators, such as

$$S_3\,u(1)u(2)u(3) = \bigl(S_3(1)u(1)\bigr)u(2)u(3) + u(1)\bigl(S_3(2)u(2)\bigr)u(3)
+ u(1)u(2)\bigl(S_3(3)u(3)\bigr) = \tfrac32\,u(1)u(2)u(3),$$

or $S_3\,uuu = \tfrac32\,uuu$, equivalent to assigning $I_3 = \tfrac32$ to $uuu$. Similar analysis
can yield results such as $I_3 = \tfrac12$ for $uud$, or $(2S_8/\sqrt{3})\,dss = -dss$, showing
that $dss$ has $Y = -1$.
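The rule that a multiparticle generator is the sum of single-particle generators is easy to mimic symbolically. The sketch below is an illustration under stated assumptions: a three-quark state is stored as a dictionary mapping product strings such as "uud" to coefficients, proportionality constants are dropped as in the text, and the names `ladder` and `weights` are hypothetical helpers.

```python
from fractions import Fraction

# Single-quark eigenvalues (I3, Y) for u, d, s.
I3 = {"u": Fraction(1, 2), "d": Fraction(-1, 2), "s": Fraction(0)}
Y = {"u": Fraction(1, 3), "d": Fraction(1, 3), "s": Fraction(-2, 3)}

# Nonvanishing single-quark ladder actions: operator -> {input quark: output quark}.
LADDER = {
    "I-": {"u": "d"}, "I+": {"d": "u"},
    "U-": {"d": "s"}, "U+": {"s": "d"},
    "V-": {"u": "s"}, "V+": {"s": "u"},
}


def ladder(op, state):
    """Apply op(1) + op(2) + ... to a state given as {product string: coefficient}."""
    result = {}
    for product, coeff in state.items():
        for pos, quark in enumerate(product):
            if quark in LADDER[op]:
                new = product[:pos] + LADDER[op][quark] + product[pos + 1:]
                result[new] = result.get(new, 0) + coeff
    return result


def weights(product):
    """Additive eigenvalues (I3, Y) of a quark product."""
    return sum(I3[q] for q in product), sum(Y[q] for q in product)


lowered = ladder("I-", {"uuu": 1})   # duu + udu + uud
print(lowered)
print(weights("uuu"), weights("uud"), weights("dss"))
```

Applying `ladder("I-", ...)` a second time to `lowered` yields each of udd, dud, ddu with coefficient 2, showing how repeated steps stay within symmetric combinations.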


We are now ready to return to the verification of Eq. (17.68). 


Example 17.7.6 DECOMPOSITION OF BARYON MULTIPLETS 


There are 27 three-quark products, which, using the analysis of Example 17.7.5, have the
$(I_3, Y)$ values shown here.


(+3/2, 1)    uuu                   (0, 0)        uds, dus, usd, dsu, sud, sdu
(+1/2, 1)    uud, udu, duu         (−1, 0)       dds, dsd, sdd
(−1/2, 1)    udd, dud, ddu         (+1/2, −1)    uss, sus, ssu
(−3/2, 1)    ddd                   (−1/2, −1)    dss, sds, ssd
(+1, 0)      uus, usu, suu         (0, −2)       sss
We can find the irreducible representations in our direct product in a relatively mechanical
way. We start by placing the 27 quark products at their coordinate positions in an $(I_3, Y)$
diagram. We note that the point $(\tfrac32, 1)$ is occupied by only one product, $uuu$, so it must, by
itself, be a member of some irreducible representation of SU(3). Starting there, we may take
steps in any of the directions indicated in the root diagram, provided there is a function
at each point to which we move. Since all we are doing is identifying possible states, we
need not make any sophisticated computations as we proceed. Since $uuu$ is completely
symmetric under permutations, the basis function at each point will be a symmetric sum
of the products at each point reached. When we have reached all the points, we will have
identified a total of 10 basis functions, all members of the same irreducible representation,
the one we called 10. This set of 10 basis functions is called a decuplet. The graph for
these basis functions, called a weight diagram, is shown in Fig. 17.9.

At the points where there was more than one quark product, there will be products left
over after accounting for 10; if we want to be quantitative, they will be linear combinations
that are orthogonal to the symmetric forms used in 10. Continuing with either of the two
leftover functions at $(\tfrac12, 1)$, we may construct another set of basis functions from the
leftovers; these sets will contain eight members, with the weight diagram shown in Fig. 17.10.
(There are only seven points still occupied in the diagram, but the one at (0,0) yields two
different functions when approached from different directions; the function obtained when
(0,0) is reached horizontally can, via a subgroup analysis, be related to the members of its
representation at (±1, 0). These points are elaborated in Exercise 17.7.4.) After accounting
for these two octets, corresponding to representations 8, there will be one completely
antisymmetric function left at (0,0); it is a basis for 1.
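The bookkeeping in this example can be reproduced by counting how many quark products sit at each point of the $(I_3, Y)$ diagram. The sketch below only counts states; it does not construct the symmetrized wave functions, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

I3 = {"u": Fraction(1, 2), "d": Fraction(-1, 2), "s": Fraction(0)}
Y = {"u": Fraction(1, 3), "d": Fraction(1, 3), "s": Fraction(-2, 3)}


def weight(p):
    """Additive eigenvalues (I3, Y) of a quark product such as 'uds'."""
    return sum(I3[q] for q in p), sum(Y[q] for q in p)


products = ["".join(p) for p in product("uds", repeat=3)]
occupancy = Counter(weight(p) for p in products)   # products per (I3, Y) point

# The decuplet takes one symmetric combination at every occupied point.
decuplet = len(occupancy)

# Functions left over at each point after removing the decuplet member.
leftover = {w: n - 1 for w, n in occupancy.items() if n > 1}

# Each octet uses one function per leftover point, except two at (0, 0).
octet = sum(2 if w == (0, 0) else 1 for w in leftover)

# What remains after two octets have been removed.
remaining = sum(leftover.values()) - 2 * octet

print(len(products), decuplet, octet, remaining)
```

The counts add up as 27 = 10 + 8 + 8 + 1, in agreement with Eq. (17.68).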


Both the representations 8 and 10 are relevant for particle physics. The rationalization 
of the similar-mass baryon octet was based on assignment of those particles to members 
of 8, with the small mass differences associated with the breaking of the strong-interaction 
symmetry by a weaker force which retained some of the SU(2) subgroup symmetries, and 
by the (weaker still) electromagnetic forces that also broke the SU(2) symmetries. The 
identification of the octet members with the basis functions of 8 is included in Fig. 17.10, 
and the energetics of the overall situation is indicated schematically in Fig. 17.11. 














FIGURE 17.9 Weight diagram, baryon decuplet. The symbols at the various points are 
the names of particles assigned to the basis. 
FIGURE 17.10 Weight diagram, baryon octet.








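FIGURE 17.11 Baryon mass splitting. (Level labels in the figure, left to right:
$H_{\text{strong}}$; $H_{\text{strong}} + H_{\text{medium}}$;
$H_{\text{strong}} + H_{\text{medium}} + H_{\text{electromagnetic}}$.)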


The representation 10 provides an explanation for the set of 10 excited-state baryons
whose weight diagram is shown in Fig. 17.9. When Gell-Mann fitted the then existent
data to the decuplet representation, the $\Omega^-$ particle had not yet been discovered, and its
prediction and subsequent detection provided a strong indication of the relevance of SU(3)
to physics. Yet another instance of the importance of SU(3) is provided by the existence
of a meson octet (displaced by one unit in $Y$ relative to the primary baryon octet).

Finally, we caution the reader that the foregoing discussion is by no means complete. 
It does not take full account of fermion antisymmetry requirements, the consideration of 
which led to the SU(3)-color gauge theory of the strong interaction called quantum chro- 
modynamics (QCD). QCD also, at a minimum, involves the group SU(3). We have also 
left much unsaid about subgroup decompositions of the overall symmetry group, qualita- 
tively alluded to in the discussion supporting Fig. 17.9. 

To keep group theory and its very real value in proper perspective, we should empha- 
size that group theory identifies and formalizes symmetries. It classifies (and sometimes 
predicts) particles. But apart from saying, e.g., that one part of the Hamiltonian has SU(2) 
symmetry and another part has SU(3) symmetry, group theory says nothing about the par- 
ticle interaction. Likewise, a spherically symmetric Hamiltonian has (in ordinary space) 
SO(3) symmetry, but this fact tells us nothing about the radial dependence of either the 
potential or the wave function. 


Exercises 

17.7.1 Determine three SU(2) subgroups of SU(3).

17.7.2 Prove that the matrices U(n) (unitary matrices of order n) form a group, and that SU(n)
(those with determinant unity) form a subgroup of U(n).

17.7.3 Using Eq. (17.56) for the matrix elements of SU(2) corresponding to rotations about
the coordinate axes, find the matrix corresponding to a rotation defined by Euler angles
$(\alpha, \beta, \gamma)$. The Euler angles are defined in Section 3.4.

17.7.4 For a product of three quarks, the member of SU(3) representation 10 with
$(I_3, Y) = (+\tfrac32, 1)$ is $uuu$.

(a) Apply operators in the root diagram for SU(3), Fig. 17.8, to obtain all the remaining
members of the decuplet comprising the representation 10.

(b) The two representations 8 can be chosen to have for $I_3 = \tfrac12$, $Y = 1$ the respective
members $\psi_1(\tfrac12, 1) = (ud - du)u$ and $\psi_2(\tfrac12, 1) = 2uud - udu - duu$. Briefly
explain why this choice is possible.

(c) Using the operators in the root diagram and the above $\psi_1(\tfrac12, 1)$, find expressions
for $\psi_1(-\tfrac12, 1)$, $\psi_1(-1, 0)$, $\psi_1(1, 0)$, $\psi_1(-\tfrac12, -1)$, and $\psi_1(\tfrac12, -1)$.

(d) Taking each of the six $\psi_1$ functions you now have, apply an operator that will
convert it into $\psi_1(0, 0)$. Show that you obtain exactly two linearly independent
$\psi_1(0, 0)$, thereby justifying the claim that the $\psi_1$ are an octet at the points shown
in Fig. 17.10.

(e) Show that the octet built starting from $\psi_2(\tfrac12, 1)$ is linearly independent from that
built from $\psi_1$.

(f) Find the wave function $\psi(0, 0)$ that is linearly independent of all the $\psi(0, 0)$ functions
found in parts (a)-(e). It is the sole member of the representation 1.
17.8 LORENTZ GROUP


It has long been accepted that the laws of physics should be covariant, meaning that they
should have forms that are (1) independent of the origin of the coordinates used to describe
them (leading, for an isolated system, to the law of conservation of linear momentum);
(2) independent of the orientation of our coordinates (leading to a conservation law for
angular momentum); and (3) independent of the zero from which time is measured. Most
of our experience suggests that velocities should add like ordinary vectors; for example, 
a person walking toward the front of a moving train would, as viewed by a stationary 
observer, have a net velocity equal to the sum of that of the train and the walker’s veloc- 
ity relative to the train. This rule for velocity addition is identified as Galilean, and is 
correct in the limit of small velocities. However, it is now known that transformations 
between coordinate systems with a constant nonzero relative velocity must lead to a non- 
intuitive velocity addition law that causes the velocity of light to be the same as measured 
by observers in all coordinate systems (reference frames). As Einstein showed in 1905, 
the necessary velocity addition law could be obtained if coordinate-system changes were 
described by Lorentz transformations. Einstein’s theory, now known as special relativ- 
ity (its extension to curved space-time to describe gravitation is called general relativity), 
also helped to complete an understanding of the way in which electric and magnetic phe- 
nomena become interconverted when charges at rest in one coordinate system are viewed 
as moving in another. 

The transformations that are consistent with the symmetry of space-time form a group 
known as the inhomogeneous Lorentz group or the Poincaré group. The Poincaré group 
consists of space and time displacements and all Lorentz transformations; here we shall 
only discuss the Lorentz transformations, which by themselves form the Lorentz group, 
sometimes for clarity referred to as the homogeneous Lorentz group. 


Homogeneous Lorentz Group 


Lorentz transformations can be likened to rotations that affect both the spatial and the time
coordinates. An ordinary spatial rotation about the origin, in which
$(x_1, x_2) \to (x_1', x_2')$, has the property that the length of the associated vector is
unchanged by the rotation, so that $x_1^2 + x_2^2 = x_1'^2 + x_2'^2$. But we now consider
transformations involving a spatial coordinate (let’s choose $z$) and a time coordinate $t$, but
with $z^2 - c^2t^2 = z'^2 - c^2t'^2$, so that the velocity of light, $c$, computed for travel from
the origin $(0, 0)$ to $(z, t)$ will be the same as that for travel from the origin to $(z', t')$.
We are therefore abandoning the notion that the time variable is universal, assuming instead
that it changes together with changes in the spatial variable(s) in a way that keeps the
velocity of light constant. We also see that it is natural to rescale the $t$ coordinate to
$x_0 = ct$, so that the invariant of the transformation becomes $z^2 - x_0^2$.

Let’s now examine a situation in which the coordinate system is moving in the $+z$
direction at an infinitesimal velocity $c\,\delta\rho$ (so that a Galilean transformation applies to $z$):

$$z' = z - c(\delta\rho)t = z - (\delta\rho)x_0.$$

But we assume that $t$ also changes, to

$$t' = t - a(\delta\rho)z, \quad\text{or}\quad x_0' = x_0 - ac(\delta\rho)z,$$

with $a$ chosen to keep $z^2 - x_0^2$ constant to first order in $\delta\rho$. The value of $a$ that
satisfies this requirement is $a = +1/c$, so our infinitesimal Lorentz transformation is

$$\begin{pmatrix} x_0' \\ z' \end{pmatrix} = \begin{pmatrix} x_0 \\ z \end{pmatrix}
- \delta\rho \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_0 \\ z \end{pmatrix}.$$


To identify this equation in terms of a generator, we note that

$$1_2 - (\delta\rho)\,\sigma_1 = 1_2 + i(\delta\rho)\,S, \qquad S = i\sigma_1, \tag{17.69}$$


where $\sigma_1$ is a Pauli matrix. Extending now to a finite velocity just as we did for ordinary
rotations in the passage from Eq. (17.35) to Eq. (17.36), we have an expression that is
similar to Eq. (17.39), except that we now have $\sigma_1$ instead of $\sigma_2$, while in place of
$\varphi$ we now have $i\rho$. The result is


$$U(\rho) = \exp(i\rho[i\sigma_1]) = \cos(i\rho) + i\sigma_1 \sin(i\rho) = \cosh\rho - \sigma_1 \sinh\rho
= \begin{pmatrix} \cosh\rho & -\sinh\rho \\ -\sinh\rho & \cosh\rho \end{pmatrix}. \tag{17.70}$$


While $\delta\rho$ was an infinitesimal velocity (in units of $c$), it does not follow that $\rho$, the
result of repeated $\delta\rho$ transformations, is proportional to the resultant velocity in the final
transformed coordinates. However, from the equation $z' = z\cosh\rho - x_0\sinh\rho$ (in which the
point $z' = 0$ satisfies $z/x_0 = \tanh\rho$), we identify the resultant velocity as
$v = c\sinh\rho/\cosh\rho = c\tanh\rho$.

Summarizing, and introducing the symbols usually used in relativistic mechanics, we
identify

$$\frac{v}{c} = \tanh\rho \equiv \beta, \qquad \cosh\rho = \frac{1}{\sqrt{1-\beta^2}} \equiv \gamma, \qquad
\sinh\rho = \beta\gamma. \tag{17.71}$$

The range of $\rho$ (sometimes called the rapidity) is unlimited, but $\tanh\rho < 1$, thereby
showing that $c$ is an upper limit to $v$ (which cannot be reached for finite $\rho$).
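A quick numerical check of Eqs. (17.70) and (17.71) may be helpful. The sketch below uses arbitrary illustrative values for the rapidity and for the event coordinates; the function names `boost` and `transform` are not from the text.

```python
import math


def boost(rho):
    """2x2 boost matrix of Eq. (17.70), acting on the column (x0, z)."""
    ch, sh = math.cosh(rho), math.sinh(rho)
    return [[ch, -sh], [-sh, ch]]


def transform(matrix, x0, z):
    """Apply a 2x2 matrix to the column (x0, z)."""
    return (matrix[0][0] * x0 + matrix[0][1] * z,
            matrix[1][0] * x0 + matrix[1][1] * z)


rho = 1.2                               # arbitrary rapidity (illustrative value)
beta = math.tanh(rho)                   # v/c
gamma = 1.0 / math.sqrt(1.0 - beta**2)

x0, z = 3.0, 2.0                        # arbitrary event, with x0 = ct
x0p, zp = transform(boost(rho), x0, z)

interval_before = z**2 - x0**2
interval_after = zp**2 - x0p**2

print(interval_before, interval_after)  # equal up to rounding
print(gamma, math.cosh(rho))            # gamma = cosh(rho)
print(beta * gamma, math.sinh(rho))     # beta * gamma = sinh(rho)
```

However large `rho` is made, `beta` stays below 1, which is the numerical counterpart of the statement that $c$ cannot be reached for finite rapidity.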

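A Lorentz transformation that does not also involve a spatial rotation is known as a
boost or a pure Lorentz transformation. Successive boosts can be analyzed using the
group property of the Lorentz transformations: A boost of rapidity $\rho$ followed by another,
of rapidity $\rho'$, both in the $z$ direction, must have transformation matrix

$$\begin{aligned}
U(\rho')U(\rho) &= \begin{pmatrix} \cosh\rho' & -\sinh\rho' \\ -\sinh\rho' & \cosh\rho' \end{pmatrix}
\begin{pmatrix} \cosh\rho & -\sinh\rho \\ -\sinh\rho & \cosh\rho \end{pmatrix} \\
&= \begin{pmatrix} \cosh\rho'\cosh\rho + \sinh\rho'\sinh\rho & -\cosh\rho'\sinh\rho - \sinh\rho'\cosh\rho \\
-\sinh\rho'\cosh\rho - \cosh\rho'\sinh\rho & \sinh\rho'\sinh\rho + \cosh\rho'\cosh\rho \end{pmatrix} \\
&= \begin{pmatrix} \cosh(\rho+\rho') & -\sinh(\rho+\rho') \\ -\sinh(\rho+\rho') & \cosh(\rho+\rho') \end{pmatrix}
= U(\rho + \rho'),
\end{aligned}$$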







showing that the rapidity (not the velocity) is the additive parameter for successive boosts
in the same direction. The result we have just obtained is obvious if we write it in the
generator notation; it is

$$U(\rho')U(\rho) = \exp(-\rho'\sigma_1)\exp(-\rho\sigma_1) = \exp(-(\rho'+\rho)\sigma_1) = U(\rho'+\rho). \tag{17.72}$$


Because of the group property, successive boosts in different spatial directions must 
yield a resultant Lorentz transformation, but the result is not equivalent to any single boost, 
and corresponds to a boost plus a spatial rotation. This rotation is the origin of the Thomas 
precession that arises in the treatment of spin-orbit coupling terms in atomic and nuclear 
physics. A good discussion of the Thomas precession frequency is in the work by Goldstein 
(Additional Readings). 


Example 17.8.1 ADDITION OF COLLINEAR VELOCITIES 


Let’s now apply Eq. (17.72) to two successive boosts in the $z$ direction, identifying each by
its individual velocity ($v'$ for the first boost, $v''$ for the second), or equivalently
$\beta' = v'/c$, $\beta'' = v''/c$. The corresponding rapidities will be denoted $\rho'$ and $\rho''$, so

$$\tanh\rho' = \beta' = \frac{v'}{c}, \qquad \tanh\rho'' = \beta'' = \frac{v''}{c}.$$

The resultant of the two successive boosts will have rapidity $\rho = \rho' + \rho''$, and will
therefore be associated with a resultant velocity $v$ satisfying $\tanh(\rho' + \rho'') = v/c = \beta$.
From the summation formula for the hyperbolic tangent, we have

$$\frac{v}{c} = \beta = \tanh(\rho' + \rho'') = \frac{\tanh\rho' + \tanh\rho''}{1 + \tanh\rho'\tanh\rho''}
= \frac{\dfrac{v'}{c} + \dfrac{v''}{c}}{1 + \dfrac{v'v''}{c^2}} = \frac{\beta' + \beta''}{1 + \beta'\beta''}. \tag{17.73}$$

Equation (17.73) shows that when $v'$ and $v''$ are both small compared to $c$, the velocity
addition is approximately Galilean, becoming exactly Galilean in the small-velocity limit.
But as the individual velocities increase, their resultant decreases relative to their arithmetic
sum, and never exceeds $c$. This behavior is to be expected, since (for real arguments) the
hyperbolic tangent cannot exceed unity.
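The behavior described by Eq. (17.73) can be illustrated numerically. The sketch below works in units of $c$ and computes the sum both from the closed formula and by adding rapidities; the sample velocities are arbitrary and the function names are illustrative.

```python
import math


def add_velocities(beta1, beta2):
    """Resultant of two collinear velocities in units of c, Eq. (17.73)."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)


def add_via_rapidity(beta1, beta2):
    """Same resultant, obtained by adding the rapidities rho = atanh(beta)."""
    return math.tanh(math.atanh(beta1) + math.atanh(beta2))


slow = add_velocities(1e-4, 2e-4)   # nearly Galilean: close to 3e-4
fast = add_velocities(0.8, 0.9)     # well below the arithmetic sum 1.7

print(slow, fast, add_via_rapidity(0.8, 0.9))
```

For the fast pair the resultant stays below 1 (that is, below $c$) even though the arithmetic sum is 1.7, while for the slow pair the correction to the Galilean sum is only a few parts in $10^8$.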


Minkowski Space 


If we make the definition $x_4 = ict$, the formulas we have just obtained, and many others as
well, can be written in a systematic form that does not have minus signs explicitly present
for the time coordinate. Then Lorentz transformations act like rotations in a space with
basis $(x_1, x_2, x_3, x_4)$, and the conserved quantity is $x_1^2 + x_2^2 + x_3^2 + x_4^2$. This
approach is appealing and is widely used.

An alternative way to proceed, which has the disadvantage of being a bit more cumbersome,
but with the advantage of providing a framework suitable for the extension to
general relativity, is to use real coordinates (as was done in the preceding subsection), but
to handle the difference in behavior of the spatial and time coordinates by introducing a
suitably defined metric tensor. One possibility (for basis $x_0 = ct$, $x_1$, $x_2$, $x_3$), where $x_i$
($i = 1, 2, 3$) are Cartesian spatial coordinates, is to use the Minkowski metric tensor, first
introduced in Example 4.5.2,

$$(g_{\mu\nu}) = (g^{\mu\nu}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \tag{17.74}$$


where it is understood that Greek indices run over the four-index set 0 to 3, and that
displacements are rendered as scalar products of the form $x^\mu g_{\mu\nu} x'^\nu$ or
$x_\mu g^{\mu\nu} x'_\nu$, where the repeated indices are understood to be summed (the Einstein
summation convention). Note that because all the analysis in this section is in Cartesian
coordinates, the distinction between contravariant and covariant indices is limited to the
insertion of minus signs in some elements of products that involve the metric tensor.
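As an illustration of how the metric tensor of Eq. (17.74) is used, the sketch below forms the scalar product $x^\mu g_{\mu\nu} y^\nu$ with explicit sums and checks that it is unchanged by a boost in the $z$ direction. The 4 × 4 boost is the matrix of Eq. (17.70) embedded in the basis $(x_0, x_1, x_2, x_3)$; the sample vectors, rapidity, and function names are arbitrary choices made for the example.

```python
import math

# Minkowski metric of Eq. (17.74), basis (x0, x1, x2, x3).
G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def scalar_product(x, y):
    """x^mu g_{mu nu} y^nu, with both repeated indices summed explicitly."""
    return sum(x[m] * G[m][n] * y[n] for m in range(4) for n in range(4))


def boost_z(rho):
    """Boost of rapidity rho along z, acting on contravariant components."""
    ch, sh = math.cosh(rho), math.sinh(rho)
    return [[ch, 0, 0, -sh], [0, 1, 0, 0], [0, 0, 1, 0], [-sh, 0, 0, ch]]


def apply_matrix(matrix, x):
    """Matrix times column vector."""
    return [sum(matrix[m][n] * x[n] for n in range(4)) for m in range(4)]


x = [5.0, 1.0, 2.0, 3.0]     # arbitrary four-vectors
y = [2.0, -1.0, 0.5, 4.0]
U = boost_z(0.7)             # arbitrary rapidity

before = scalar_product(x, y)
after = scalar_product(apply_matrix(U, x), apply_matrix(U, y))
print(before, after)         # equal up to rounding
```

With the opposite sign convention for the metric, both numbers would simply change sign together, which is why either convention works when used consistently.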

As was pointed out in Example 4.6.2, this metric tensor sometimes appears with the 
signs of all its diagonal elements reversed. Either choice of signs is valid and yields proper 
results for problems of physics if used consistently, but trouble can arise if material from 
inconsistent sources is combined. The cited example also indicates how Maxwell’s equa- 
tions can be written in a manifestly covariant form. 

Note that the transformation matrices S and U must be mixed tensors, since they convert 
a vector (whether covariant or contravariant) into another vector of the same variance 
status. Since for a pure boost these matrices are symmetric, either index can be deemed to 
be covariant (the other then being contravariant). 


Exercises 


17.8.1 Show that in 3 + 1 dimensions (this means three spatial dimensions plus time), a boost
in the $xy$ plane at an angle $\theta$ from the $x$ direction has, in coordinates
$(x_0, x_1, x_2, x_3)$, the generator

$$\begin{pmatrix} 0 & \cos\theta & \sin\theta & 0 \\ \cos\theta & 0 & 0 & 0 \\ \sin\theta & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

17.8.2 (a) Show that the generator in Exercise 17.8.1 produces a Lorentz transformation
matrix for rapidity $\rho$ given by

$$U(\rho, \theta) = \begin{pmatrix}
\cosh\rho & -\cos\theta\sinh\rho & -\sin\theta\sinh\rho & 0 \\
-\cos\theta\sinh\rho & \sin^2\theta + \cos^2\theta\cosh\rho & \cos\theta\sin\theta(\cosh\rho - 1) & 0 \\
-\sin\theta\sinh\rho & \cos\theta\sin\theta(\cosh\rho - 1) & \cos^2\theta + \sin^2\theta\cosh\rho & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$

Note. This transformation matrix is symmetric. All single boosts (in any spatial
direction) have symmetric transformation matrices.

(b) Verify that the transformation matrix of part (a) is consistent with (1) rotating
the spatial coordinates to align the boost direction with a coordinate axis, (2)
performing a boost in the direction of that axis using Eq. (17.70), and (3) rotating
back to the original coordinate system.

17.8.3 Obtain the Lorentz transformation matrix for a boost of finite amount $\rho'$ in the $x$
direction followed by a finite boost $\rho''$ in the $y$ direction. Show that there are no values of
$\rho$ and $\theta$ that can bring this transformation to the form given in Exercise 17.8.2.


17.9 LORENTZ COVARIANCE OF MAXWELL’S EQUATIONS


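We start our discussion of Lorentz covariance by recalling how the magnetic and electric
fields $\mathbf{B}$ and $\mathbf{E}$ depend on the vector and scalar potentials $\mathbf{A}$ and $\varphi$:

$$\mathbf{B} = \nabla \times \mathbf{A}, \qquad
\mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t} - \nabla\varphi. \tag{17.75}$$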


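Restricting consideration to situations where $\varepsilon$ and $\mu$ have their free-space values
$\varepsilon_0$ and $\mu_0$ (with $\varepsilon_0\mu_0 = 1/c^2$), it can be shown that $\mathbf{A}$ and
$\varphi$ form a four-vector whose components $A^\mu$ (in contravariant form) are

$$A^i = c\varepsilon_0 A_i, \quad i = 1, 2, 3, \qquad A^0 = \varepsilon_0\varphi. \tag{17.76}$$

We now form the tensor $F^{\mu\nu}$ with elements

$$F^{\mu\nu} = \frac{\partial A^\nu}{\partial x_\mu} - \frac{\partial A^\mu}{\partial x_\nu}, \tag{17.77}$$

which we evaluate (consistently with our choice of Minkowski metric) using

$$\frac{\partial}{\partial x_0} = \frac{\partial}{c\,\partial t}, \quad
\frac{\partial}{\partial x_1} = -\frac{\partial}{\partial x}, \quad
\frac{\partial}{\partial x_2} = -\frac{\partial}{\partial y}, \quad
\frac{\partial}{\partial x_3} = -\frac{\partial}{\partial z}. \tag{17.78}$$

The resulting form for $F^{\mu\nu}$, known as the electromagnetic field tensor, is

$$F^{\mu\nu} = \varepsilon_0 \begin{pmatrix}
0 & -E_x & -E_y & -E_z \\
E_x & 0 & -cB_z & cB_y \\
E_y & cB_z & 0 & -cB_x \\
E_z & -cB_y & cB_x & 0
\end{pmatrix}. \tag{17.79}$$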

The quantity $F^{\mu\nu}$ is, as its name implies, a second-order tensor that must have the
transformation properties associated with the Lorentz group. We know this to be the case
because we constructed $F^{\mu\nu}$ as a linear combination of terms, each of which was the
derivative of a four-vector; differentiation of a vector (in a Cartesian system) generates a
second-order tensor.

An interesting aside to the above analysis is provided by the discussion of Maxwell’s
equations in the language of differential forms. In Example 4.6.2 we showed that the
differential form

$$F = -E_x\,dt \wedge dx - E_y\,dt \wedge dy - E_z\,dt \wedge dz + B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy$$

was a starting point from which Maxwell’s equations could be derived; we now observe
that the individual terms of this differential form correspond to the elements of the tensor
under discussion here.


Lorentz Transformation of E and B 


Returning to the main matter of present concern, we now apply a Lorentz transformation
to $F^{\mu\nu}$. For simplicity we take a pure boost in the $z$ direction, which will have matrix
elements similar to those of Eq. (17.70); using the notations introduced in Eq. (17.71), our
transformation matrix can be written

$$U = \begin{pmatrix} \gamma & 0 & 0 & -\beta\gamma \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\beta\gamma & 0 & 0 & \gamma \end{pmatrix}. \tag{17.80}$$

Noting that we must apply our Lorentz transformation to both indices of $F^{\mu\nu}$, and keeping
in mind that $U$ is symmetric and, as pointed out in Section 17.8, a mixed tensor, we can
write

$$F' = UFU, \tag{17.81}$$


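where $F$ and $F'$ are both contravariant matrices. If we now compare the individual elements
of $F'$ with those of $F$, we obtain formulas for the components of $\mathbf{E}'$ and $\mathbf{B}'$ in
terms of the components of $\mathbf{E}$ and $\mathbf{B}$. For the transformation at issue here, the results
are (where $v$ is the velocity of the transformed coordinate system, in the $z$ direction, relative
to the original coordinates):

$$\begin{aligned}
E_x' &= \gamma(E_x - \beta c B_y) = \gamma(E_x - vB_y), \\
E_y' &= \gamma(E_y + \beta c B_x) = \gamma(E_y + vB_x), \\
E_z' &= E_z,
\end{aligned} \tag{17.82}$$

$$\begin{aligned}
B_x' &= \gamma\left(B_x + \frac{\beta}{c}E_y\right) = \gamma\left(B_x + \frac{v}{c^2}E_y\right), \\
B_y' &= \gamma\left(B_y - \frac{\beta}{c}E_x\right) = \gamma\left(B_y - \frac{v}{c^2}E_x\right), \\
B_z' &= B_z.
\end{aligned} \tag{17.83}$$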
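The comparison of matrix elements can be spot-checked numerically. The sketch below builds the matrix of Eq. (17.79) with the common factor $\varepsilon_0$ omitted (it cancels from the comparison), applies $F' = UFU$ with the boost of Eq. (17.80), and reads the transformed fields back from the same matrix positions. The field values and $\beta$ are arbitrary illustrative numbers in SI units, and the function names are not from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def field_tensor(E, B):
    """Matrix of Eq. (17.79) divided by eps0, for E in V/m and B in T."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -C * Bz, C * By],
        [Ey, C * Bz, 0.0, -C * Bx],
        [Ez, -C * By, C * Bx, 0.0],
    ]


def boost_z(beta):
    """Boost matrix of Eq. (17.80) for velocity beta*c along z."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, 0.0, 0.0, -beta * gamma],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [-beta * gamma, 0.0, 0.0, gamma],
    ]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_fields(E, B, beta):
    """Return (E', B') obtained from F' = U F U, Eq. (17.81)."""
    U = boost_z(beta)
    Fp = matmul(matmul(U, field_tensor(E, B)), U)
    E_new = (Fp[1][0], Fp[2][0], Fp[3][0])
    B_new = (Fp[3][2] / C, Fp[1][3] / C, Fp[2][1] / C)
    return E_new, B_new


E = (1.0, 2.0, 3.0)            # arbitrary field values
B = (4.0e-9, 5.0e-9, 6.0e-9)
beta = 0.6
E_new, B_new = transform_fields(E, B, beta)
print(E_new, B_new)
```

The printed components can be compared directly with the right-hand sides of Eqs. (17.82) and (17.83) evaluated for the same inputs.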


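We can generalize the above to a boost $\mathbf{v}$ in an arbitrary direction:

$$\begin{aligned}
\mathbf{E}' &= \gamma(\mathbf{E} + \mathbf{v} \times \mathbf{B}) + (1 - \gamma)\mathbf{E}_v, \\
\mathbf{B}' &= \gamma\left(\mathbf{B} - \frac{\mathbf{v} \times \mathbf{E}}{c^2}\right) + (1 - \gamma)\mathbf{B}_v,
\end{aligned} \tag{17.84}$$

where $\mathbf{E}_v = (\mathbf{E} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}$ and
$\mathbf{B}_v = (\mathbf{B} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}$ are the projections of $\mathbf{E}$
and $\mathbf{B}$ in the direction of $\mathbf{v}$. In the limit $v \ll c$, these equations reduce to

$$\mathbf{E}' = \mathbf{E} + \mathbf{v} \times \mathbf{B}, \qquad
\mathbf{B}' = \mathbf{B} - \frac{\mathbf{v} \times \mathbf{E}}{c^2}. \tag{17.85}$$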
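The vector formulas can be exercised in the same spirit. The sketch below implements Eq. (17.84) for a nonzero velocity in any direction and evaluates it for a fast boost along $z$ and for a slow boost, where it should agree closely with Eq. (17.85). All numerical values and helper names are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boost_fields(E, B, v):
    """Fields seen in a frame moving with velocity v (nonzero), Eq. (17.84)."""
    speed = math.sqrt(dot(v, v))
    gamma = 1.0 / math.sqrt(1.0 - (speed / C) ** 2)
    unit = tuple(c / speed for c in v)
    E_par = tuple(dot(E, unit) * u for u in unit)   # projection of E along v
    B_par = tuple(dot(B, unit) * u for u in unit)   # projection of B along v
    vxB = cross(v, B)
    vxE = cross(v, E)
    E_new = tuple(gamma * (E[i] + vxB[i]) + (1.0 - gamma) * E_par[i] for i in range(3))
    B_new = tuple(gamma * (B[i] - vxE[i] / C**2) + (1.0 - gamma) * B_par[i] for i in range(3))
    return E_new, B_new


E = (1.0, 2.0, 3.0)
B = (4.0e-9, 5.0e-9, 6.0e-9)

fast_E, fast_B = boost_fields(E, B, (0.0, 0.0, 0.6 * C))   # boost along z
slow_E, slow_B = boost_fields(E, B, (30.0, 0.0, 0.0))      # v << c
print(fast_E, fast_B)
print(slow_E, slow_B)
```

For the boost along $z$ the components parallel to the velocity come out unchanged, because the $\gamma$ and $(1 - \gamma)$ terms combine to unity for the projections.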


Note that the coordinate transformation changes the velocity with which charges move 
and therefore changes the magnetic force. It is now clear that the Lorentz transformation 
explains how the total force (electric plus magnetic) can be independent of the reference 
frame (i.e., the relative velocities of the coordinate systems). In fact, the need to make the
total electromagnetic force independent of the reference frame was first noted by Lorentz 
and Poincaré. This was where Lorentz transformations were first recognized as relevant for 
physics, and that may have provided Einstein with a clue as he developed his formulation 
of special relativity. 


Example 17.9.1 TRANSFORMATION TO BRING CHARGE TO REST 


Consider a charge $q$ moving at a velocity $\mathbf{v}$, with $v \ll c$. By giving the coordinate system
a boost $\mathbf{v}$, we transform to a frame in which the charge is at rest and experiences only an
electric force $q\mathbf{E}'$. But since the total force is independent of the reference frame, it is also
given, according to Eq. (17.85), as

$$\mathbf{F} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}), \tag{17.86}$$

which is just the classical Lorentz force.


The ability to write Maxwell’s equations in a tensor form that gives the experimentally 
observed results under Lorentz transformation is an important achievement because it guar- 
antees that the formulation is consistent with special relativity. This is one of the reasons 
that modern theories of quantum electrodynamics and elementary particles are often writ- 
ten in this manifestly covariant form. Conversely, the insistence on such a tensor form 
has been a useful guide in the construction of these theories. 

We close with the following general observations: 


The Lorentz group is the symmetry group of electrodynamics, of the electroweak 
gauge theory, and of the strong interactions described by quantum chromo- 
dynamics. It appears necessary that mechanics in general have the symmetry of 
the Lorentz group, and that requirement corresponds to the general applicability of 
special relativity. With respect to electrodynamics, the Lorentz symmetry explains 
the fact that the velocity of light is the same in all inertial frames, and it explains 
how electric and magnetic forces are interrelated and yield physical results that 
are frame-independent. While a detailed study of relativistic mechanics is beyond 
the scope of this book, the extension to special relativity of Newton’s equations of 
motion is straightforward and leads to a variety of results, some of which challenge 
human intuition. 
Exercises

17.9.1 Apply the Lorentz transformation of Eq. (17.80) to $F^{\mu\nu}$ as given in Eq. (17.79). Verify
that the result is a matrix $F'$ whose elements confirm the results given in Eqs. (17.82)
and (17.83).

17.9.2 Confirm that the generalization of Eqs. (17.82) and (17.83) to a boost corresponding to
an arbitrary velocity $\mathbf{v}$ is properly given by Eq. (17.84).


17.10 SPACE GROUPS


Perfect crystals exhibit translational symmetry, meaning that they can be considered as a
space-filling array of parallelepipeds stacked end-to-end and side-to-side, with each
containing an identical set of identically placed atoms. A single parallelepiped is referred to as
the unit cell of the crystal; a unit cell can be specified by giving the vectors that define its
edges. Calling these vectors $\mathbf{h}_1$, $\mathbf{h}_2$, $\mathbf{h}_3$, equivalent points in any two unit
cells are separated from each other by vectors

$$\mathbf{b} = n_1\mathbf{h}_1 + n_2\mathbf{h}_2 + n_3\mathbf{h}_3,$$

where $n_1$, $n_2$, $n_3$ can be any integers (positive, negative, or zero). The set of these
equivalent points is called the Bravais lattice of the crystal.

A Bravais lattice will have a symmetry that depends on the angles and relative lengths 
of the lattice vectors; in three dimensions there are 14 different symmetries possible for 
Bravais lattices. There are 32 3-D point groups that are symmetry-compatible with at least 
one Bravais lattice; these are called crystallographic point groups to distinguish them 
from the infinite number of point groups that can exist in the absence of any compatibility 
requirement. 


Example 17.10.1 TILING A FLOOR


To understand the notion of crystallographic point group, consider what would happen 
(in two dimensions) if we try to tile a floor with identical tiles in the shape of a regular 
polygon. We will have success with squares and triangles, and even with hexagons. These 
work because an integer number of tiles can be placed so that they have vertices at the 
same point. A triangle has an internal angle of 60°, so six of them can meet at a point; 
similarly, four squares can meet at a point, as can three hexagons (internal angle 120°). 
But we cannot tile with regular pentagons (internal angle 108°) or any regular polygon 
with more than six sides. 
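The counting argument in this example amounts to asking whether 360° is an integer multiple of the polygon's interior angle. A minimal sketch, using exact rational arithmetic and illustrative function names, is:

```python
from fractions import Fraction


def interior_angle(n):
    """Interior angle, in degrees, of a regular polygon with n sides."""
    return Fraction(180 * (n - 2), n)


def tiles_meeting_at_vertex(n):
    """Number of regular n-gons that fit around a point, or None if they do not fit."""
    count = Fraction(360) / interior_angle(n)
    return int(count) if count.denominator == 1 else None


results = {n: tiles_meeting_at_vertex(n) for n in range(3, 13)}
for n, count in results.items():
    print(n, interior_angle(n), count)
```

Only the triangle, square, and hexagon return an integer (6, 4, and 3 tiles per vertex, respectively). For more than six sides the ratio 2n/(n − 2) lies strictly between 2 and 3, so it can never be an integer.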


Combining Bravais lattices and compatible point groups, there is a total of 230 different 
groups in 3-D that exhibit translational symmetry and some sort of point-group symmetry. 
These 230 groups are called space groups. Their study and use in crystallography (e.g., to 
determine the detailed structure of a crystal from its x-ray scattering) is the topic of several 
of the larger books in the Additional Readings. 

Systems with periodicity in only one or two dimensions also exist in nature; some lin- 
ear polymers are 1-D periodic systems; surface systems and single-layer arrays such as 
graphene (a macroscopic hexagonal array of carbon atoms) exhibit periodicity in two 
dimensions. There is even a kind of translational symmetry that involves elements that
form helical structures. The recognition of this type of symmetry in crystallographic stud- 
ies of DNA was the key contribution leading to the discovery that DNA existed as a dou- 
ble helix. 


Additional Readings 


Buerger, M. J., Elementary Crystallography. New York: Wiley (1956). A comprehensive discussion of crystal 
symmetries. Buerger develops all 32 point groups and all 230 space groups. Related books by this author 
include Contemporary Crystallography. New York: McGraw-Hill (1970); Crystal Structure Analysis. New 
York: Krieger (1979) (reprint, 1960); and Introduction to Crystal Geometry. New York: Krieger (1977) 
(reprint, 1971). 

Burns, G., and A. M. Glazer, Space Groups for Solid-State Scientists. New York: Academic Press (1978). A 
well-organized, readable treatment of groups and their application to the solid state. 

de-Shalit, A., and I. Talmi, Nuclear Shell Model. New York: Academic Press (1963). We adopt the Condon- 
Shortley phase conventions of this text. 

Falicov, L. M., Group Theory and Its Physical Applications. Notes compiled by A. Luehrmann. Chicago: Uni- 
versity of Chicago Press (1966). Group theory, with an emphasis on applications to crystal symmetries and 
solid-state physics. 

Gell-Mann, M., and Y. Ne’eman, The Eightfold Way. New York: Benjamin (1965). A collection of reprints of 
significant papers on SU(3) and the particles of high-energy physics. Several introductory sections by Gell- 
Mann and Ne’eman are especially helpful. 

Goldstein, H., Classical Mechanics, 2nd ed. Reading, MA: Addison-Wesley (1980). Chapter 7 contains a short 
but readable introduction to relativity from a viewpoint consonant with that presented here. 

Greiner, W., and B. Müller, Quantum Mechanics Symmetries. Berlin: Springer (1989). We refer to this textbook
for more details and numerous exercises that are worked out in detail. 

Hamermesh, M., Group Theory and Its Application to Physical Problems. Reading, MA: Addison-Wesley (1962). 
A detailed, rigorous account of both finite and continuous groups. The 32 point groups are developed. The 
continuous groups are treated, with Lie algebra included. A wealth of applications to atomic and nuclear 
physics. 

Hassani, S., Foundations of Mathematical Physics. Boston: Allyn and Bacon (1991). 


Heitler, W., The Quantum Theory of Radiation, 2nd ed. Oxford: Oxford University Press (1947), reprinting, 
Dover (1983). 


Higman, B., Applied Group-Theoretic and Matrix Methods. Oxford: Clarendon Press (1955). A rather complete 
and unusually intelligible development of matrix analysis and group theory. 


Jackson, J. D., Classical Electrodynamics, 3rd ed. New York: Wiley (1998). 
Messiah, A., Quantum Mechanics, vol. II. Amsterdam: North-Holland (1961).


Panofsky, W. K. H., and M. Phillips, Classical Electricity and Magnetism, 2nd ed. Reading, MA: Addison- 
Wesley (1962). The Lorentz covariance of Maxwell’s equations is developed for both vacuum and material 
media. Panofsky and Phillips use contravariant and covariant tensors. 


Park, D., Resource letter SP-1 on symmetry in physics. Am. J. Phys. 36: 577-584 (1968). Includes a large selec- 
tion of basic references on group theory and its applications to physics: atoms, molecules, nuclei, solids, and 
elementary particles. 

Ram, B., Physics of the SU(3) symmetry model. Am. J. Phys. 35: 16 (1967). An excellent discussion of the 
applications of SU(3) to the strongly interacting particles (baryons). For a sequel to this see R. D. Young, 
Physics of the quark model. Am. J. Phys. 41: 472 (1973). 

Tinkham, M., Group Theory and Quantum Mechanics. New York: McGraw-Hill (1964), reprinting, Dover 
(2003). Clear and readable. 

Wigner, E. P., Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra (translated by J. 
J. Griffin). New York: Academic Press (1959). This is the classic reference on group theory for the physicist. 
The rotation group is treated in considerable detail. There is a wealth of applications to atomic physics. 
CHAPTER 18


MORE SPECIAL FUNCTIONS


In this chapter we shall study four sets of orthogonal polynomials: Hermite, Laguerre, and 
Chebyshev¹ of the first and second kinds. Although these four sets are of less importance
in mathematical physics than are the Bessel and Legendre functions of Chapters 14 and 15, 
they are used and therefore deserve attention. For example, Hermite polynomials occur in 
solutions of the simple harmonic oscillator of quantum mechanics and Laguerre polynomi- 
als in wave functions of the hydrogen atom. Because the general mathematical techniques 
duplicate those used for Bessel and Legendre functions, the development of these functions 
is only outlined. Detailed proofs are for the most part left to the reader. 

The sets of polynomials treated in this chapter can be related to the more general
quantities known as hypergeometric and confluent hypergeometric functions (solutions of
the hypergeometric ODE). For practical reasons we defer most discussion of these
relationships until we have had an opportunity to define the hypergeometric functions and
the associated nomenclature. The benefit accruing from the connection to hypergeometric
functions is that the hypergeometric recurrence formulas and other general properties
translate into useful relationships for the polynomial sets that we are presently studying.

We conclude the chapter with a short section on elliptic integrals. Although the impor- 
tance of this subject has declined as the power of computers has increased, there are some 
physical problems for which they are useful and it is not yet time to eliminate them from 
this text. 


18.1 HERMITE FUNCTIONS


We start by identifying Hermite functions as solutions of the Hermite ODE, 


$$H_n''(x) - 2x H_n'(x) + 2n H_n(x) = 0. \tag{18.1}$$


Here $n$ is a parameter. When $n \ge 0$ is integral, this ODE will have a solution $H_n(x)$ which
is a polynomial of degree $n$; these solutions are known as Hermite polynomials.

¹This is the spelling choice of AMS-55 (for the complete reference, see Abramowitz in Additional Readings). However, various
names, such as Tschebyscheff, are encountered in the literature.

In the presence of appropriate boundary conditions, the Hermite ODE is a Sturm-Liouville
system; polynomial solutions to such ODEs were the topic of Section 12.1. We
showed there, in Example 12.1.1, that the Hermite polynomials could be generated from
their Rodrigues formula, Eq. (12.17), and that, in turn, a Rodrigues formula can be
obtained from the underlying ODE. We also showed in that same section how we can
go from the Rodrigues formula to a generating function for a given polynomial set,
presenting in Table 12.1 a list of generating functions that could be found in this way. That
list included the following generating function for the Hermite polynomials:


$$g(x,t) = e^{-t^2+2tx} = \sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!}. \tag{18.2}$$


Here we elect not to depend on the analysis of Section 12.1 but rather to take the view- 
point that Eq. (18.2) can be regarded as a definition of the Hermite polynomials, thereby 
making the present analysis completely self-contained. Accordingly, we proceed by verify- 
ing that these polynomials satisfy the Hermite ODE, have the expected Rodrigues formula, 
and exhibit the other properties that can be developed starting from the generating function. 


Recurrence Relations 


Note the absence of a superscript, which distinguishes Hermite polynomials from the unre- 
lated Hankel functions. From the generating function we find that the Hermite polynomials 
satisfy the recurrence relations 


$$H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x) \tag{18.3}$$

and

$$H_n'(x) = 2n H_{n-1}(x). \tag{18.4}$$


The Hermite polynomials were used in Example 12.1.2 as a detailed illustration of the 
method for obtaining recurrence formulas from generating functions; we summarize the 
process here. By differentiating the generating function formula with respect to t we obtain 





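$$\frac{\partial g}{\partial t} = (-2t + 2x)\,e^{-t^2+2tx} = \sum_{n=0}^{\infty} H_{n+1}(x)\,\frac{t^n}{n!}, \quad\text{or}$$

$$-2\sum_{n=0}^{\infty} H_n(x)\,\frac{t^{n+1}}{n!} + 2x\sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!} = \sum_{n=0}^{\infty} H_{n+1}(x)\,\frac{t^n}{n!}.$$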


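Because this equation must be satisfied separately for each power of $t$, we arrive at
Eq. (18.3). Similarly, differentiation with respect to $x$ leads to

$$\frac{\partial g}{\partial x} = 2t\,e^{-t^2+2tx} = \sum_{n=0}^{\infty} H_n'(x)\,\frac{t^n}{n!} = 2\sum_{n=0}^{\infty} H_n(x)\,\frac{t^{n+1}}{n!},$$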


from which we can obtain Eq. (18.4).

The Maclaurin expansion of the generating function

$$e^{-t^2+2tx} = \sum_{n=0}^{\infty} \frac{(2tx - t^2)^n}{n!} = 1 + (2tx - t^2) + \cdots \tag{18.5}$$

gives $H_0(x) = 1$ and $H_1(x) = 2x$, and then the recursion formula, Eq. (18.3), permits
the construction of any $H_n(x)$ desired. For convenient reference the first several Hermite
polynomials are listed in Table 18.1 and presented graphically in Fig. 18.1.
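The construction just described can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names `hermite_coeffs` and `format_poly` are hypothetical, and a polynomial is stored (by assumption) as a list of integer coefficients in ascending powers of $x$. Starting from $H_0 = 1$ and $H_1 = 2x$, the loop applies Eq. (18.3) repeatedly.

```python
def hermite_coeffs(n):
    """Coefficients [c_0, ..., c_n] of H_n(x) = sum_k c_k x**k, built from Eq. (18.3)."""
    h_prev, h_curr = [1], [0, 2]              # H_0(x) = 1, H_1(x) = 2x
    if n == 0:
        return h_prev
    for k in range(1, n):
        # H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)
        h_next = [0] + [2 * c for c in h_curr]    # multiply H_k by 2x
        for i, c in enumerate(h_prev):
            h_next[i] -= 2 * k * c
        h_prev, h_curr = h_curr, h_next
    return h_curr


def format_poly(coeffs):
    """Render a nonzero coefficient list as text, highest power first."""
    terms = []
    for power in range(len(coeffs) - 1, -1, -1):
        c = coeffs[power]
        if c == 0:
            continue
        if power == 0:
            body = str(abs(c))
        elif power == 1:
            body = f"{abs(c)}x"
        else:
            body = f"{abs(c)}x^{power}"
        terms.append(("-" if c < 0 else "+", body))
    first_sign, first_body = terms[0]
    text = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        text += f" {sign} {body}"
    return text


# Print the polynomials for n = 0, ..., 6 for comparison with Table 18.1.
for n in range(7):
    print(f"H_{n}(x) = {format_poly(hermite_coeffs(n))}")
```

Integer arithmetic is used so that the coefficients are exact and can be compared directly with the tabulated entries.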


Special Values 


Special values of the Hermite polynomials follow from the generating function for x = 0: 


$$e^{-t^2} = \sum_{n=0}^{\infty} \frac{(-t^2)^n}{n!} = \sum_{n=0}^{\infty} H_n(0)\,\frac{t^n}{n!},$$
Table 18.1 Hermite Polynomials

$H_0(x) = 1$
$H_1(x) = 2x$
$H_2(x) = 4x^2 - 2$
$H_3(x) = 8x^3 - 12x$
$H_4(x) = 16x^4 - 48x^2 + 12$
$H_5(x) = 32x^5 - 160x^3 + 120x$
$H_6(x) = 64x^6 - 480x^4 + 720x^2 - 120$




















FIGURE 18.1 Hermite polynomials.

that is,

$$H_{2n}(0) = (-1)^n\,\frac{(2n)!}{n!}, \qquad H_{2n+1}(0) = 0, \qquad n = 0, 1, \ldots. \tag{18.6}$$


We also obtain from the generating function the important parity relation

$$H_n(x) = (-1)^n H_n(-x) \tag{18.7}$$

by noting that Eq. (18.2) yields

$$g(-x,-t) = \sum_{n=0}^{\infty} H_n(-x)\,\frac{(-t)^n}{n!} = g(x,t) = \sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!}.$$


Hermite ODE 


If we substitute the recursion formula Eq. (18.4) into Eq. (18.3), we can eliminate the index
$n - 1$, obtaining

$$H_{n+1}(x) = 2x H_n(x) - H_n'(x).$$

If we differentiate this recurrence relation and substitute Eq. (18.4) for the index $n + 1$, we
find

$$H_{n+1}'(x) = 2(n+1) H_n(x) = 2 H_n(x) + 2x H_n'(x) - H_n''(x),$$


which can be rearranged to the second-order Hermite ODE, Eq. (18.1). This completes 
the process of establishing the identification of the Hermite polynomials obtained from the 
generating function as solutions of the Hermite ODE. 
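The identification can also be checked mechanically. The following sketch (hypothetical helper names; polynomials are again assumed to be stored as ascending coefficient lists) forms the left-hand side of Eq. (18.1) for polynomials generated from Eq. (18.3) and confirms that every coefficient of the residual vanishes:

```python
def hermite_coeffs(n):
    """Coefficients of H_n in ascending powers of x, from the recurrence Eq. (18.3)."""
    h_prev, h_curr = [1], [0, 2]
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_next = [0] + [2 * c for c in h_curr]
        for i, c in enumerate(h_prev):
            h_next[i] -= 2 * k * c
        h_prev, h_curr = h_curr, h_next
    return h_curr


def derivative(coeffs):
    """Coefficient list of the derivative of sum_k coeffs[k] x**k."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]


def ode_residual(n):
    """Coefficients of H_n'' - 2x H_n' + 2n H_n, the left-hand side of Eq. (18.1)."""
    h = hermite_coeffs(n)
    d1 = derivative(h)
    d2 = derivative(d1)
    residual = [0] * (n + 1)
    for k, c in enumerate(d2):
        residual[k] += c                # H_n''
    for k, c in enumerate(d1):
        residual[k + 1] -= 2 * c        # -2x H_n'
    for k, c in enumerate(h):
        residual[k] += 2 * n * c        # +2n H_n
    return residual


for n in range(8):
    print(n, ode_residual(n))
```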


Rodrigues Formula 


A simple way to generate the Rodrigues formula for the Hermite polynomials starts from 
the observations that 


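$$g(x,t) = e^{-t^2+2tx} = e^{x^2} e^{-(t-x)^2} \quad\text{and}\quad \frac{\partial}{\partial t}\,e^{-(t-x)^2} = -\frac{\partial}{\partial x}\,e^{-(t-x)^2}.$$

We note that $n$-fold differentiation of the generating function formula, Eq. (18.2), followed
by setting $t = 0$, yields

$$\left.\frac{\partial^n}{\partial t^n}\,g(x,t)\right|_{t=0} = H_n(x),$$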





and we can therefore obtain the Rodrigues formula as 


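$$H_n(x) = \left.\frac{\partial^n}{\partial t^n}\,g(x,t)\right|_{t=0} = e^{x^2}\left.\frac{\partial^n}{\partial t^n}\,e^{-(t-x)^2}\right|_{t=0} = (-1)^n e^{x^2}\left.\frac{\partial^n}{\partial x^n}\,e^{-(t-x)^2}\right|_{t=0}$$

$$= (-1)^n e^{x^2}\,\frac{d^n}{dx^n}\,e^{-x^2}. \tag{18.8}$$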
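Equation (18.8) can be evaluated without symbolic software by tracking only the polynomial prefactor: if $f(x) = P(x)\,e^{-x^2}$, then $f'(x) = [P'(x) - 2xP(x)]\,e^{-x^2}$. The sketch below (the name `rodrigues` and the coefficient-list representation are assumptions made for illustration) applies this rule $n$ times to $P = 1$ and multiplies by $(-1)^n$:

```python
def rodrigues(n):
    """Coefficients of (-1)^n e^{x^2} d^n/dx^n e^{-x^2}, ascending powers of x."""
    p = [1]                                    # e^{-x^2} = P(x) e^{-x^2} with P = 1
    for _ in range(n):
        dp = [k * p[k] for k in range(1, len(p))]
        new = [0] * (len(p) + 1)
        for k, c in enumerate(dp):
            new[k] += c                        # P'
        for k, c in enumerate(p):
            new[k + 1] -= 2 * c                # -2x P
        p = new
    return [(-1) ** n * c for c in p]


for n in range(7):
    print(n, rodrigues(n))
```

The printed coefficient lists can be compared entry by entry with Table 18.1.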
Series Expansion


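Starting from the Maclaurin expansion, Eq. (18.5), we can derive our Hermite polynomial
$H_n(x)$ in series form: Using the binomial expansion of $(2x - t)^\nu$, we initially get

$$e^{-t^2+2tx} = \sum_{\nu=0}^{\infty} \frac{t^\nu}{\nu!}\,(2x - t)^\nu = \sum_{\nu=0}^{\infty} \frac{t^\nu}{\nu!} \sum_{s=0}^{\nu} \binom{\nu}{s} (2x)^{\nu-s} (-t)^s$$

$$= \sum_{\nu=0}^{\infty} \sum_{s=0}^{\nu} \frac{t^{\nu+s}}{(\nu+s)!}\,\frac{(-1)^s (\nu+s)!\,(2x)^{\nu-s}}{(\nu-s)!\,s!}.$$

Changing the first summation index from $\nu$ to $n = \nu + s$, and noting that this change causes
the $s$ summation to range from zero to $[n/2]$, the largest integer less than or equal to $n/2$,
our expansion takes the form

$$e^{-t^2+2tx} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \sum_{s=0}^{[n/2]} \frac{(-1)^s\,n!}{(n-2s)!\,s!}\,(2x)^{n-2s},$$

from which we can read out the formula for $H_n$:

$$H_n(x) = \sum_{s=0}^{[n/2]} \frac{(-1)^s\,n!}{(n-2s)!\,s!}\,(2x)^{n-2s}. \tag{18.9}$$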
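A direct transcription of Eq. (18.9) is short enough to serve as an independent check on the recurrence. In this sketch the function name `hermite_series` is hypothetical, and integer arguments are used so that the arithmetic stays exact:

```python
from math import factorial


def hermite_series(n, x):
    """Evaluate H_n(x) from the finite sum of Eq. (18.9); exact for integer x."""
    total = 0
    for s in range(n // 2 + 1):                 # s = 0, ..., [n/2]
        # n!/((n-2s)! s!) is an integer, so floor division is exact here.
        coefficient = factorial(n) // (factorial(n - 2 * s) * factorial(s))
        total += (-1) ** s * coefficient * (2 * x) ** (n - 2 * s)
    return total


# Compare with Table 18.1, e.g. H_4(1) = 16 - 48 + 12.
for n in range(7):
    print(n, hermite_series(n, 1))
```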


Finally, we note that $H_n(x)$ can be written as a Schlaefli integral. Comparing with
Eq. (12.18),

$$H_n(x) = \frac{n!}{2\pi i} \oint t^{-n-1}\,e^{-t^2+2tx}\,dt. \tag{18.10}$$


Orthogonality and Normalization 


The orthogonality of the Hermite polynomials is demonstrated by identifying them as arising
in a Sturm-Liouville system. The Hermite ODE, however, is clearly not self-adjoint,
but can be made so by multiplying it by $\exp(-x^2)$ (see Exercise 8.2.2). With $\exp(-x^2)$ as
a weighting factor, we obtain the orthogonality integral

$$\int_{-\infty}^{\infty} H_m(x)\,H_n(x)\,e^{-x^2}\,dx = 0, \qquad m \ne n. \tag{18.11}$$

The interval $(-\infty, \infty)$ is chosen to obtain the Hermitian operator boundary conditions (see
Section 8.2).

It is sometimes convenient to absorb the weighting function into the Hermite polynomials.
We may define

$$\varphi_n(x) = e^{-x^2/2}\,H_n(x), \tag{18.12}$$

with $\varphi_n(x)$ no longer a polynomial. Substitution into Eq. (18.1) yields the differential
equation for $\varphi_n(x)$,

$$\varphi_n''(x) + (2n + 1 - x^2)\,\varphi_n(x) = 0. \tag{18.13}$$

Equation (18.13) is self-adjoint, and the solutions $\varphi_n(x)$ are orthogonal on the interval
$-\infty < x < \infty$ with a unit weighting function.

We still need to normalize these functions. One approach is to combine two instances of
the generating function formula (using variables $s$ and $t$), after which we multiply by $e^{-x^2}$
and integrate over $x$ from $-\infty$ to $\infty$. These steps yield

$$\int_{-\infty}^{\infty} e^{-x^2}\,e^{-s^2+2sx}\,e^{-t^2+2tx}\,dx = \sum_{m,n=0}^{\infty} \frac{s^m t^n}{m!\,n!} \int_{-\infty}^{\infty} e^{-x^2} H_m(x)\,H_n(x)\,dx. \tag{18.14}$$

We next note that the exponentials on the left-hand side of Eq. (18.14) can be combined
into $e^{2st} e^{-(x-s-t)^2}$, after which the integral can be evaluated:

$$\int_{-\infty}^{\infty} e^{-x^2-s^2+2sx-t^2+2tx}\,dx = e^{2st} \int_{-\infty}^{\infty} e^{-(x-s-t)^2}\,dx = \pi^{1/2} e^{2st}.$$

Inserting this result into Eq. (18.14) after expanding it in a power series, we get

$$\pi^{1/2} e^{2st} = \pi^{1/2} \sum_{n=0}^{\infty} \frac{2^n s^n t^n}{n!} = \sum_{m,n=0}^{\infty} \frac{s^m t^n}{m!\,n!} \int_{-\infty}^{\infty} e^{-x^2} H_m(x)\,H_n(x)\,dx.$$

By equating coefficients of equal powers of $s$ and $t$, we both confirm the orthogonality and
obtain the normalization integral

$$\int_{-\infty}^{\infty} e^{-x^2}\,[H_n(x)]^2\,dx = 2^n \pi^{1/2}\,n!. \tag{18.15}$$
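Equations (18.11) and (18.15) can be confirmed exactly for low indices. The sketch below is an illustration under stated assumptions: it relies on the standard Gaussian moments $\int_{-\infty}^{\infty} x^{2j} e^{-x^2}\,dx = \pi^{1/2}\,(2j)!/(4^j j!)$ (odd moments vanish), which are not derived in this section, and it works in units of $\pi^{1/2}$ with rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def hermite_coeffs(n):
    """Coefficients of H_n in ascending powers of x, from the recurrence Eq. (18.3)."""
    h_prev, h_curr = [1], [0, 2]
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_next = [0] + [2 * c for c in h_curr]
        for i, c in enumerate(h_prev):
            h_next[i] -= 2 * k * c
        h_prev, h_curr = h_curr, h_next
    return h_curr


def gauss_moment(k):
    """Integral of x**k exp(-x**2) over the real line, divided by sqrt(pi)."""
    if k % 2:
        return Fraction(0)
    j = k // 2
    return Fraction(factorial(2 * j), 4 ** j * factorial(j))


def overlap(m, n):
    """Integral of exp(-x^2) H_m(x) H_n(x) over the real line, divided by sqrt(pi)."""
    a, b = hermite_coeffs(m), hermite_coeffs(n)
    return sum(ai * bj * gauss_moment(i + j)
               for i, ai in enumerate(a) for j, bj in enumerate(b))


for m in range(5):
    print([str(overlap(m, n)) for n in range(5)])
```

The printed matrix is diagonal, with diagonal entries $2^n n!$ as required by Eq. (18.15).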
Exercises

18.1.1 Assume that the Hermite polynomials are known to be solutions of the Hermite ODE,
Eq. (18.1). Assume further that the recurrence relation, Eq. (18.3), and the values of
$H_n(0)$ are also known. Given the existence of a generating function

$$g(x,t) = \sum_{n=0}^{\infty} H_n(x)\,\frac{t^n}{n!},$$

(a) Differentiate $g(x,t)$ with respect to $x$ and using the recurrence relation develop a
first-order PDE for $g(x,t)$.
(b) Integrate with respect to $x$, holding $t$ fixed.
(c) Evaluate $g(0,t)$ using the known values of $H_n(0)$.
(d) Finally, show that $g(x,t) = \exp(-t^2 + 2tx)$.

18.1.2 In developing the properties of the Hermite polynomials, start at a number of different
points, such as:

1. Hermite's ODE, Eq. (18.1),
2. Rodrigues's formula, Eq. (18.8),
3. Integral representation, Eq. (18.10),
4. Generating function, Eq. (18.2),
5. Gram-Schmidt construction of a complete set of orthogonal polynomials over
$(-\infty, \infty)$ with a weighting factor of $\exp(-x^2)$ (Section 5.2).

Outline how you can go from any one of these starting points to all the other points.

18.1.3 Prove that $|H_n(x)| \le |H_n(ix)|$.

18.1.4 Rewrite the series form of $H_n(x)$, Eq. (18.9), as an ascending power series.

ANS. $$H_{2n}(x) = (-1)^n \sum_{s=0}^{n} (-1)^s (2x)^{2s}\,\frac{(2n)!}{(2s)!\,(n-s)!},$$

$$H_{2n+1}(x) = (-1)^n \sum_{s=0}^{n} (-1)^s (2x)^{2s+1}\,\frac{(2n+1)!}{(2s+1)!\,(n-s)!}.$$

18.1.5 (a) Expand $x^{2r}$ in a series of even-order Hermite polynomials.
(b) Expand $x^{2r+1}$ in a series of odd-order Hermite polynomials.

ANS. (a) $$x^{2r} = \frac{(2r)!}{2^{2r}} \sum_{n=0}^{r} \frac{H_{2n}(x)}{(2n)!\,(r-n)!},$$

(b) $$x^{2r+1} = \frac{(2r+1)!}{2^{2r+1}} \sum_{n=0}^{r} \frac{H_{2n+1}(x)}{(2n+1)!\,(r-n)!}, \qquad r = 0, 1, 2, \ldots.$$

Hint. Use a Rodrigues representation and integrate by parts.

18.1.6 Show that

(a) $$\int_{-\infty}^{\infty} H_n(x) \exp\left[-\frac{x^2}{2}\right] dx = \begin{cases} \sqrt{2\pi}\,n!/(n/2)!, & n \text{ even} \\ 0, & n \text{ odd.} \end{cases}$$

(b) $$\int_{-\infty}^{\infty} x\,H_n(x) \exp\left[-\frac{x^2}{2}\right] dx = \begin{cases} 0, & n \text{ even} \\ \sqrt{2\pi}\,\dfrac{(n+1)!}{((n+1)/2)!}, & n \text{ odd.} \end{cases}$$

18.1.7 (a) Using the Cauchy integral formula, develop an integral representation of $H_n(x)$
based on Eq. (18.2) with the contour enclosing the point $z = -x$.

ANS. $$H_n(x) = \frac{n!}{2\pi i}\,e^{x^2} \oint \frac{e^{-z^2}}{(z+x)^{n+1}}\,dz.$$

(b) Show by direct substitution that this result satisfies the Hermite equation.


18.2 APPLICATIONS OF HERMITE FUNCTIONS


One of the most important applications of Hermite functions in physics arises from the
fact that the functions $\varphi_n(x)$ of Eq. (18.12) are the eigenstates of the quantum-mechanical
simple harmonic oscillator, which describes motion subject to a quadratic (also known as
a harmonic or a Hooke's-law) potential. This fact causes Hermite polynomials to appear
not only in elementary quantum-mechanics problems, but also in analyses of the vibrational
states of molecules, where the lowest-order description of the interatomic potential
is harmonic. In view of the importance of these topics, we now proceed to examine them
in some detail.


Simple Harmonic Oscillator 


The quantum mechanical simple harmonic oscillator is governed by a Schrödinger equation
of the form

$$-\frac{\hbar^2}{2m}\,\frac{d^2\psi(z)}{dz^2} + \frac{k}{2}\,z^2\,\psi(z) = E\,\psi(z), \tag{18.16}$$

where $m$ is the mass of the oscillator, $k$ is the force constant for its Hooke's law force
directed toward $z = 0$, $\hbar$ is Planck's constant divided by $2\pi$, and $E$ is an eigenvalue giving
the energy of the oscillator. Equation (18.16) is to be solved subject to the boundary
condition that $\psi(z)$ vanish at $z = \pm\infty$. It is convenient to make a change of variable that
eliminates the various constants from the equation, and we therefore make the substitutions

$$z = \frac{\hbar^{1/2}\,x}{(km)^{1/4}}, \qquad \frac{k}{2}\,z^2 = \frac{\hbar}{2}\sqrt{\frac{k}{m}}\;x^2, \qquad \frac{\hbar^2}{2m}\,\frac{d^2}{dz^2} = \frac{\hbar}{2}\sqrt{\frac{k}{m}}\;\frac{d^2}{dx^2},$$

which, with $\psi(z)$ written as $\varphi(x)$, converts Eq. (18.16) into

$$-\frac{1}{2}\,\frac{d^2\varphi(x)}{dx^2} + \frac{x^2}{2}\,\varphi(x) = \lambda\,\varphi(x), \tag{18.17}$$

with boundary conditions at $x = \pm\infty$. The eigenvalue $\lambda$ in this equation is related to $E$ by

$$E = \lambda\,\hbar\sqrt{\frac{k}{m}}. \tag{18.18}$$

The solutions of Eq. (18.17) that satisfy the boundary conditions can now be identified as
given by Eq. (18.13), and we can identify $\lambda_n$, the eigenvalue of Eq. (18.17) corresponding
to $\varphi_n(x)$, as having the value $n + \frac{1}{2}$. Turning to Eq. (18.12), and expressing $x$ in terms
of the original variable $z$, the eigenstates of Eq. (18.16) can be characterized (including a
normalization constant $N_n$) as

$$\psi_n(z) = N_n\,e^{-(\alpha z)^2/2}\,H_n(\alpha z), \qquad E_n = \left(n + \tfrac{1}{2}\right)\hbar\sqrt{\frac{k}{m}}, \qquad \alpha = \left(\frac{mk}{\hbar^2}\right)^{1/4}, \tag{18.19}$$

with $n$ restricted to the integer values $0, 1, 2, \ldots$. The normalization constant can be
deduced from Eq. (18.15). Noting that the normalization integral is to be over the variable
$z$, we find it to be

$$N_n = \left(\frac{\alpha}{2^n\,n!\,\pi^{1/2}}\right)^{1/2}. \tag{18.20}$$
It is of interest to examine a few of the eigenstates of this oscillator problem. For reference,
a classical oscillator of mass $m$ and force constant $k$ will have the angular oscillation
frequency

$$\omega_{\text{class}} = \sqrt{\frac{k}{m}},$$

and can have an arbitrary energy of oscillation, while our quantum oscillator is restricted to
oscillation energies $(n + \frac{1}{2})\hbar\omega_{\text{class}}$, with $n$ a nonnegative integer. We note that the quantum
oscillator must have at least the total energy $\frac{1}{2}\hbar\omega_{\text{class}}$; this is usually referred to as its
zero-point energy and is a consequence of the fact that its spatial distribution must be
described by a wave function of finite extent.

The three lowest-energy eigenfunctions of the quantum oscillator are shown in Fig. 18.2.
We note that these wave functions predict a position distribution that extends to $\pm\infty$,
albeit with exponentially decaying amplitude for larger $|z|$. The corresponding classical
oscillator will have excursions in $z$ that are strictly bounded by $k z_{\max}^2/2 = E$, where $E$
can be assigned any value greater than or equal to zero. We have marked in Fig. 18.2
the excursion range of a classical oscillator with an energy equal to the eigenvalue of the
quantum oscillator; note that the exponential decay of the quantum wave function begins
at the ends of the classical range.
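The eigenfunctions of Eq. (18.19) are easy to tabulate numerically. The following sketch is illustrative only: the function names, the choice $\alpha = 1.7$ in the demonstration loop, and the use of a simple trapezoid rule on a truncated interval are all assumptions made here, not part of the text. It evaluates $\psi_n(z)$ with the normalization of Eq. (18.20), checks that $\int |\psi_n|^2\,dz \approx 1$, and computes the classical turning point, which follows from $k z_{\max}^2/2 = E_n$ as $\alpha z_{\max} = \sqrt{2n+1}$.

```python
import math


def hermite_value(n, x):
    """Numerical value of H_n(x) from the recurrence Eq. (18.3)."""
    h_prev, h_curr = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h_curr = h_curr, 2.0 * x * h_curr - 2.0 * k * h_prev
    return h_curr


def psi(n, z, alpha=1.0):
    """Oscillator eigenfunction of Eq. (18.19) with N_n from Eq. (18.20)."""
    norm = math.sqrt(alpha / (2 ** n * math.factorial(n) * math.sqrt(math.pi)))
    return norm * math.exp(-0.5 * (alpha * z) ** 2) * hermite_value(n, alpha * z)


def norm_integral(n, alpha=1.0, half_width=12.0, steps=4000):
    """Trapezoid estimate of the integral of psi_n(z)^2 over |alpha z| <= half_width."""
    z_max = half_width / alpha
    h = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * psi(n, -z_max + i * h, alpha) ** 2
    return total * h


def turning_point(n, alpha=1.0):
    """Classical excursion limit: k z_max^2 / 2 = E_n gives alpha z_max = sqrt(2n + 1)."""
    return math.sqrt(2 * n + 1) / alpha


for n in range(3):
    print(n, norm_integral(n, alpha=1.7), turning_point(n, alpha=1.7))
```

Evaluating `psi` just inside and just outside `turning_point(n)` shows the change from oscillatory to decaying behavior described above.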


Operator Approach 


While the analysis of the preceding subsection is straightforward and provides a complete
set of eigenstates for the simple quantum oscillator, additional insight can be obtained
by an alternative approach that uses the commutation and other algebraic properties of
the quantum-mechanical operators. Our starting point for this development is the recognition
that the differential operator $-d^2/dx^2$ of Eq. (18.17) arose as a representation of the
dynamical quantity $p^2$, where (in units with $\hbar = 1$) $p \leftrightarrow -i\,d/dx$. Then our Schrödinger
equation of Eq. (18.17) can be written

$$\mathcal{H}\varphi = \frac{p^2 + x^2}{2}\,\varphi = \lambda\,\varphi, \tag{18.21}$$

where $\mathcal{H}$ is the Hamiltonian operator, with eigenvalues $\lambda$.

FIGURE 18.2 Quantum mechanical oscillator wave functions. The heavy bar on the
x-axis indicates the allowed range of the classical oscillator with the same total energy.


The key to an approach starting from Eq. (18.21) is that x and p satisfy the basic com- 
mutation relation 


$$[x, p] = xp - px = i, \tag{18.22}$$


a result discussed in detail in the analysis leading to Eq. (5.43). In fact, if we proceed 
under the assumption that Eq. (18.22) is all that we know about x and p, there is the 
additional advantage that any results we obtain will be more general than those from our 
original oscillator problem in ordinary space. This observation underlies much recent work 
in which physical theory has evolved in more abstract directions. 

With a knowledge of the way in which angular momentum theory was developed in
terms of raising and lowering operators, one can easily motivate a somewhat similar
procedure here, by defining the two operators

$$a = \frac{1}{\sqrt{2}}\,(x + ip), \qquad a^\dagger = \frac{1}{\sqrt{2}}\,(x - ip). \tag{18.23}$$

Since we typically use $a$ to denote a constant, we remind the reader that in the present
development it is an operator (involving $x$ and $d/dx$). With suitable Sturm-Liouville
boundary conditions, $x$ and $p$ are both Hermitian. But the presence of the imaginary unit
$i$ causes $a$ not to be Hermitian, and (as indicated by the notation) changing the sign of the
term $ip$ converts $a$ into its adjoint, $a^\dagger$.

Our first use of Eq. (18.23) is to form $a^\dagger a$ and $a a^\dagger$:

$$a^\dagger a = \frac{1}{2}(x - ip)(x + ip) = \frac{1}{2}(x^2 + p^2) + \frac{i}{2}(xp - px) = \mathcal{H} + \frac{i}{2}[x, p] = \mathcal{H} - \frac{1}{2},$$

$$a a^\dagger = \frac{1}{2}(x + ip)(x - ip) = \frac{1}{2}(x^2 + p^2) - \frac{i}{2}(xp - px) = \mathcal{H} - \frac{i}{2}[x, p] = \mathcal{H} + \frac{1}{2}.$$

From these equations we obtain the useful formulas

$$\mathcal{H} = a^\dagger a + \frac{1}{2}, \tag{18.24}$$

$$[a, a^\dagger] = a a^\dagger - a^\dagger a = 1, \tag{18.25}$$

and therefrom

$$[\mathcal{H}, a] = [a^\dagger a + \tfrac{1}{2}, a] = [a^\dagger a, a] = a^\dagger a a - a a^\dagger a = (a^\dagger a - a a^\dagger)\,a = -a. \tag{18.26}$$


Applying $[\mathcal{H}, a]$ to an eigenfunction $\varphi_n$ with eigenvalue $\lambda_n$ (assumed not yet known), we
write

$$[\mathcal{H}, a]\varphi_n = \mathcal{H}(a\varphi_n) - a\mathcal{H}\varphi_n = \mathcal{H}(a\varphi_n) - \lambda_n (a\varphi_n) = -(a\varphi_n),$$

which we easily rearrange to the form

$$\mathcal{H}(a\varphi_n) = (\lambda_n - 1)(a\varphi_n). \tag{18.27}$$

Equation (18.27) shows that we can interpret $a$ as a lowering operator that converts an
eigenfunction with eigenvalue $\lambda_n$ into another eigenfunction that has eigenvalue $\lambda_n - 1$.
A similar analysis, left to the reader, shows that from the commutator $[\mathcal{H}, a^\dagger]$ we find $a^\dagger$
to be a raising operator, according to

$$\mathcal{H}(a^\dagger\varphi_n) = (\lambda_n + 1)(a^\dagger\varphi_n). \tag{18.28}$$

These formulas show that, given any eigenfunction $\varphi_n$, we can construct a ladder of
eigenstates whose eigenvalues differ by unit steps. The only limitation that would terminate the
construction of an infinite ladder would be the possibility that for some $\varphi_n$, either $a\varphi_n$ or
$a^\dagger\varphi_n$ might be zero.

To investigate the circumstances under which $a\varphi_n$ might vanish, let's form the scalar
product $\langle a\varphi_n | a\varphi_n \rangle$. We find

$$\langle a\varphi_n | a\varphi_n \rangle = \langle \varphi_n | a^\dagger a | \varphi_n \rangle = \langle \varphi_n | \mathcal{H} - \tfrac{1}{2} | \varphi_n \rangle = \langle \varphi_n | \lambda_n - \tfrac{1}{2} | \varphi_n \rangle. \tag{18.29}$$

Equation (18.29) shows that only if $\lambda_n = \frac{1}{2}$ will we have $a\varphi_n = 0$. That equation also shows
that if $\lambda_n < \frac{1}{2}$ we have the mathematical inconsistency that the norm of $a\varphi_n$ is predicted
to be negative. These observations together imply that the only possible values of $\lambda_n$ are
positive half-integers, as otherwise by repeated application of the lowering operator $a$ we
can move to a $\lambda$ value prohibited by Eq. (18.29). We leave to the reader the verification
that the application of the raising operator $a^\dagger$ to any valid $\varphi_n$ produces a new eigenfunction
$a^\dagger\varphi_n$ with a positive norm.

Our overall conclusion is that any system with a Hamiltonian of the form given by
Eq. (18.21), whether or not represented by an ODE in ordinary space, will have eigenstates
whose eigenvalues form a ladder of unit spacing, with the smallest eigenvalue equal to $\frac{1}{2}$.
This makes it natural to label the states $\varphi_n$ by integers $n \ge 0$, and therefore to write

$$\mathcal{H}\varphi_n = \lambda_n \varphi_n, \qquad \lambda_n = n + \tfrac{1}{2}, \qquad n = 0, 1, 2, \ldots, \tag{18.30}$$

in agreement with what we found from our original approach; compare Eq. (18.19).

Before leaving this exercise in operator algebra, it may be worth noting that the notion 
of raising and lowering operators also arises in contexts where the states thereby reached 
can be interpreted as those containing different numbers of particles (or quasiparticles, a 
physics jargon that refers to objects, such as photons, whose population is easily changed 
by interaction with their surroundings). In such contexts, a raising operator is then often 
referred to as a creation operator, with a lowering operator then called an annihilation 
(or sometimes a destruction) operator. Obviously these terms have to be interpreted with 
an understanding of the underlying physics. 

Returning to the description of $p$ as a differential operator, the equation $a\varphi_0 = 0$ can
be identified as a differential equation satisfied by the ground (lowest-energy) state of our
oscillator. More specifically,

$$\sqrt{2}\,a\varphi_0 = (x + ip)\,\varphi_0 = \left[x + i\left(-i\,\frac{d}{dx}\right)\right]\varphi_0 = \left[x + \frac{d}{dx}\right]\varphi_0 = 0, \tag{18.31}$$

which has the advantage of being a first-order ODE. This ODE is separable, and can be
integrated:

$$\frac{d\varphi_0}{\varphi_0} = -x\,dx, \qquad \ln\varphi_0 = -\frac{x^2}{2} + \ln c_0, \qquad \varphi_0 = c_0\,e^{-x^2/2},$$

in agreement with our previous analysis.

Eigenstates for arbitrary $n$ can now be generated by repeated application of $a^\dagger$ to $\varphi_0$.
Doing so is left as an exercise.
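One way to organize that calculation is to note how the ladder operators act on a function of the form $P(x)\,e^{-x^2/2}$ with $P$ a polynomial: using $p \leftrightarrow -i\,d/dx$, one finds $\sqrt{2}\,a^\dagger (P e^{-x^2/2}) = (2xP - P')\,e^{-x^2/2}$ and $\sqrt{2}\,a\,(P e^{-x^2/2}) = P'\,e^{-x^2/2}$. The sketch below is an illustration with hypothetical function names; it tracks only the polynomial factor, ignores normalization, and starts from $P = 1$ (the ground state with $c_0 = 1$).

```python
def deriv(p):
    """Derivative of sum_k p[k] x**k; at least one entry is always returned."""
    return [k * p[k] for k in range(1, len(p))] or [0]


def raise_poly(p):
    """Polynomial part of sqrt(2) a-dagger acting on p(x) exp(-x^2/2): 2x p - p'."""
    out = [0] + [2 * c for c in p]
    for k, c in enumerate(deriv(p)):
        out[k] -= c
    return out


def lower_poly(p):
    """Polynomial part of sqrt(2) a acting on p(x) exp(-x^2/2): p'."""
    return deriv(p)


def trim(p):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def commutator_on(p):
    """Polynomial part of 2 [a, a-dagger] acting on p(x) exp(-x^2/2)."""
    first = lower_poly(raise_poly(p))
    second = raise_poly(lower_poly(p))
    size = max(len(first), len(second))
    first = first + [0] * (size - len(first))
    second = second + [0] * (size - len(second))
    return trim([f - s for f, s in zip(first, second)])


# Repeated application of sqrt(2) a-dagger to the ground state polynomial P = 1.
ladder = [[1]]
for n in range(5):
    ladder.append(raise_poly(ladder[-1]))

for n, p in enumerate(ladder):
    print(n, p)
```

The polynomials produced are the Hermite polynomials of Table 18.1 (compare Exercise 18.2.1), and `commutator_on(p)` returns $2p$, consistent with Eq. (18.25).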


Molecular Vibrations 


In the dynamics and spectroscopy of molecules in the Born-Oppenheimer approximation,
the motion of a molecule is separated into electronic, vibrational, and rotational motion. In
treating the vibrational motion, the departure of nuclei from their equilibrium positions is to
lowest order described by a quadratic potential, and the resulting oscillations are identified
as harmonic. These harmonic motions can be treated as coupled simple harmonic oscillators,
and we can decouple the individual nuclear motions by making a transformation to
normal coordinates, as was illustrated in Example 6.5.2. In this harmonic oscillation limit,
the vibrational wave functions have the form given in the preceding subsection, and the
computation of properties associated with these wave functions then involves integrals in
which products of Hermite functions appear.

The simplest integrals occurring in vibrational problems are of the form

$$\int_{-\infty}^{\infty} x^r\,e^{-x^2} H_n(x)\,H_m(x)\,dx.$$

Examples for $r = 1$ and $r = 2$ (with $n = m$) are included in the exercises at the end of this
section. A large number of other examples can be found in the work by Wilson, Decius, and
Cross.² Some of the vibrational properties of molecules require the evaluation of integrals
containing as many as four Hermite functions. In the remainder of this subsection we
illustrate some of the possibilities and the associated mathematical procedures.


Example 18.2.1 THREEFOLD HERMITE FORMULA

Consider the following integral involving three Hermite polynomials

$$I_3 = \int_{-\infty}^{\infty} e^{-x^2} H_{m_1}(x)\,H_{m_2}(x)\,H_{m_3}(x)\,dx, \tag{18.32}$$

where $m_i \ge 0$ are integers. The formula (due to E. C. Titchmarsh, J. Lond. Math. Soc. 23:
15 (1948); see Gradshteyn and Ryzhik, p. 804, in Additional Readings) generalizes the $I_2$
case needed for the orthogonality and normalization of Hermite polynomials. To start, we
note that the integrand of $I_3$ will be even if the index sum $m_1 + m_2 + m_3$ is even, and odd
if that index sum is odd, so $I_3$ will vanish unless $m_1 + m_2 + m_3$ is even. In addition, we
see that if the product $H_{m_1} H_{m_2}$ is expanded and written as a sum of Hermite polynomials,
the resulting polynomial of largest index will be $H_{m_1+m_2}$, so $I_3$ will vanish due to
orthogonality unless $m_1 + m_2$ is at least as large as $m_3$. This condition must continue to hold
if the roles of the $m_i$ are permuted; a convenient way of summarizing these observations
is to state that the $m_i$ must satisfy a triangle condition. Both the even index sum and the
triangle condition parallel similar conditions on integrals of Legendre polynomials which
we encountered in Section 16.3 and discussed in detail at Eq. (16.85).

²E. B. Wilson, Jr., J. C. Decius, and P. C. Cross, Molecular Vibrations, New York: McGraw-Hill (1955), reprinted, Dover
(1980).

To derive $I_3$, we start with the product of three generating functions of Hermite
polynomials, multiply by $e^{-x^2}$, and integrate over $x$:

$$Z_3 = \int_{-\infty}^{\infty} e^{-x^2} \prod_{j=1}^{3} e^{2xt_j - t_j^2}\,dx = \int_{-\infty}^{\infty} e^{-(t_1+t_2+t_3-x)^2 + 2(t_1t_2+t_1t_3+t_2t_3)}\,dx$$

$$= \sqrt{\pi}\,e^{2(t_1t_2+t_1t_3+t_2t_3)} = \sqrt{\pi} \sum_{N=0}^{\infty} \frac{2^N}{N!} \sum_{\substack{n_1,n_2,n_3 \ge 0 \\ n_1+n_2+n_3=N}} \frac{N!}{n_1!\,n_2!\,n_3!}\,t_1^{n_2+n_3}\,t_2^{n_1+n_3}\,t_3^{n_1+n_2}. \tag{18.33}$$

In reaching Eq. (18.33), we recognized the $x$ integration as an error integral, Eq. (1.148),
and then expanded the resulting exponential, first as a power series in $w = 2(t_1t_2 + t_1t_3 +
t_2t_3)$, and then expanding the powers of $w$ by the generalization of the binomial theorem
given as Eq. (1.80). Note that the index for the power of $t_j t_k$ in the polynomial expansion
was designated $n_i$, where $i$, $j$, $k$ are (in some order) 1, 2, 3.

We next expand the generating functions in terms of Hermite polynomials and set the
result equal to a slightly simplified version of the expression just obtained for $Z_3$:

$$Z_3 = \sum_{m_1,m_2,m_3=0}^{\infty} \frac{t_1^{m_1}\,t_2^{m_2}\,t_3^{m_3}}{m_1!\,m_2!\,m_3!} \int_{-\infty}^{\infty} e^{-x^2} H_{m_1}(x)\,H_{m_2}(x)\,H_{m_3}(x)\,dx$$

$$= \sqrt{\pi} \sum_{n_1,n_2,n_3=0}^{\infty} \frac{2^N\,t_1^{n_2+n_3}\,t_2^{n_1+n_3}\,t_3^{n_1+n_2}}{n_1!\,n_2!\,n_3!}, \tag{18.34}$$

with $N = n_1 + n_2 + n_3$. In Eq. (18.34) we now equate the coefficients of equal powers of
the $t_j$, finding that $m_1 = n_2 + n_3$, $m_2 = n_1 + n_3$, $m_3 = n_1 + n_2$, that

$$N = \frac{m_1 + m_2 + m_3}{2},$$

and that $n_1 = N - m_1$, $n_2 = N - m_2$, $n_3 = N - m_3$. From the coefficients of $t_1^{m_1} t_2^{m_2} t_3^{m_3}$,
we obtain the final result

$$I_3 = \frac{\sqrt{\pi}\,2^N\,m_1!\,m_2!\,m_3!}{(N - m_1)!\,(N - m_2)!\,(N - m_3)!}. \tag{18.35}$$

Equation (18.35) explicitly reflects the necessity of the triangle condition. If it is not satisfied
but the sum of the $m_i$ is even, at least one of the factorials in the denominator of
Eq. (18.35) will have a negative integer argument, thereby causing $I_3$ to be zero. The
requirement that the sum of the $m_i$ be even is not explicit in the form of Eq. (18.35), but
the formula for $I_3$ is restricted to that case because the right-hand side of Eq. (18.34) only
contains terms in which the sum of the powers of the $t_j$ is even. ■
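Equation (18.35) can be compared against a brute-force evaluation for small indices. In the sketch below (hypothetical function names), the direct value is obtained by multiplying the three polynomials and integrating term by term with the standard Gaussian moments $\int x^{2j} e^{-x^2}\,dx = \pi^{1/2} (2j)!/(4^j j!)$, which are assumed here rather than derived; all results are in units of $\pi^{1/2}$. The closed form returns zero explicitly when the index sum is odd or the triangle condition fails, which is how the negative-argument factorials are interpreted above.

```python
from fractions import Fraction
from math import factorial


def hermite_coeffs(n):
    """Coefficients of H_n in ascending powers of x, from the recurrence Eq. (18.3)."""
    h_prev, h_curr = [1], [0, 2]
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_next = [0] + [2 * c for c in h_curr]
        for i, c in enumerate(h_prev):
            h_next[i] -= 2 * k * c
        h_prev, h_curr = h_curr, h_next
    return h_curr


def poly_mul(a, b):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def gauss_moment(k):
    """Integral of x**k exp(-x**2) over the real line, divided by sqrt(pi)."""
    if k % 2:
        return Fraction(0)
    j = k // 2
    return Fraction(factorial(2 * j), 4 ** j * factorial(j))


def i3_direct(m1, m2, m3):
    """I_3 / sqrt(pi) by explicit polynomial multiplication and termwise integration."""
    p = poly_mul(poly_mul(hermite_coeffs(m1), hermite_coeffs(m2)), hermite_coeffs(m3))
    return sum(c * gauss_moment(k) for k, c in enumerate(p))


def i3_formula(m1, m2, m3):
    """I_3 / sqrt(pi) from Eq. (18.35), with the parity and triangle conditions."""
    total = m1 + m2 + m3
    if total % 2:
        return Fraction(0)                     # odd integrand
    N = total // 2
    if max(m1, m2, m3) > N:
        return Fraction(0)                     # triangle condition violated
    return Fraction(2 ** N * factorial(m1) * factorial(m2) * factorial(m3),
                    factorial(N - m1) * factorial(N - m2) * factorial(N - m3))


print(i3_direct(1, 1, 2), i3_formula(1, 1, 2))
print(i3_direct(2, 2, 2), i3_formula(2, 2, 2))
print(i3_direct(0, 1, 3), i3_formula(0, 1, 3))
```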


Hermite Product Formula 


The integrals $I_m$ with $m > 3$ can be obtained in closed form, but as finite sums. The starting
point for that analysis is a formula for the product of two Hermite polynomials due to
E. Feldheim, J. Lond. Math. Soc. 13: 22 (1938). To derive Feldheim's formula, we can
start from a product of two generating functions, written as

$$e^{2x(t_1+t_2) - t_1^2 - t_2^2} = \sum_{m_1,m_2=0}^{\infty} H_{m_1}(x)\,H_{m_2}(x)\,\frac{t_1^{m_1}\,t_2^{m_2}}{m_1!\,m_2!}$$

$$= e^{2x(t_1+t_2) - (t_1+t_2)^2}\,e^{2t_1t_2} = \sum_{n=0}^{\infty} H_n(x)\,\frac{(t_1+t_2)^n}{n!} \sum_{\nu=0}^{\infty} \frac{(2t_1t_2)^\nu}{\nu!}.$$

Applying the binomial expansion to $(t_1 + t_2)^n$ and then comparing like powers of $t_1$ and
$t_2$ in the two lines of the above equation, we find

$$H_{m_1}(x)\,H_{m_2}(x) = \sum_{\nu=0}^{\min(m_1,m_2)} H_{m_1+m_2-2\nu}(x)\,\frac{m_1!\,m_2!\,2^\nu}{\nu!\,(m_1+m_2-2\nu)!} \binom{m_1+m_2-2\nu}{m_1-\nu}$$

$$= \sum_{\nu=0}^{\min(m_1,m_2)} H_{m_1+m_2-2\nu}(x)\,2^\nu\,\nu! \binom{m_1}{\nu} \binom{m_2}{\nu}. \tag{18.36}$$

For $\nu = 0$ the coefficient of $H_{m_1+m_2}$ is obviously unity. Special cases, such as

$$H_1^2 = H_2 + 2, \qquad H_1 H_2 = H_3 + 4H_1, \qquad H_2^2 = H_4 + 8H_2 + 8, \qquad H_1 H_3 = H_4 + 6H_2,$$

can be derived from Table 18.1 and agree with the general twofold product formula.
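The product formula lends itself to a mechanical check. The sketch below (hypothetical function names; polynomials stored as ascending coefficient lists) builds the expansion coefficients of Eq. (18.36), re-expands the right-hand side in powers of $x$, and compares it with the directly multiplied polynomials:

```python
from math import comb, factorial


def hermite_coeffs(n):
    """Coefficients of H_n in ascending powers of x, from the recurrence Eq. (18.3)."""
    h_prev, h_curr = [1], [0, 2]
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_next = [0] + [2 * c for c in h_curr]
        for i, c in enumerate(h_prev):
            h_next[i] -= 2 * k * c
        h_prev, h_curr = h_curr, h_next
    return h_curr


def poly_mul(a, b):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def product_expansion(m1, m2):
    """Map {index: coefficient} with H_m1 H_m2 = sum coefficient * H_index, Eq. (18.36)."""
    return {m1 + m2 - 2 * nu: 2 ** nu * factorial(nu) * comb(m1, nu) * comb(m2, nu)
            for nu in range(min(m1, m2) + 1)}


def expand_to_powers(expansion):
    """Rewrite sum coefficient * H_index(x) as an ascending coefficient list in x."""
    out = [0] * (max(expansion) + 1)
    for index, coefficient in expansion.items():
        for k, h in enumerate(hermite_coeffs(index)):
            out[k] += coefficient * h
    return out


for m1, m2 in [(1, 1), (1, 2), (2, 2), (1, 3)]:
    print((m1, m2), product_expansion(m1, m2))
```

The four printed dictionaries reproduce the special cases quoted above.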

The product formula has been generalized to products of $m > 2$ Hermite polynomials,
thereby providing a new way of evaluating the integrals $I_m$. For details we refer the reader
to work by Liang, Weber, Hayashi, and Lin.³


Example 18.2.2 FOURFOLD HERMITE FORMULA

An important application of the Hermite product formula is a newly reported evaluation of
the integral $I_4$ containing a product of four Hermite polynomials. The analysis is that of
one of the present authors and his colleagues.³

³K. K. Liang, H. J. Weber, M. Hayashi, and S. H. Lin, Computational aspects of Franck-Condon overlap intervals. In Pandalai,
S. G., ed., Recent Research Developments in Physical Chemistry, Vol. 8, Transworld Research Network (2005).

The integral we are about to study is of the form

$$I_4 = \int_{-\infty}^{\infty} e^{-x^2} H_{m_1}(x)\,H_{m_2}(x)\,H_{m_3}(x)\,H_{m_4}(x)\,dx. \tag{18.37}$$

It is convenient to order the indices of the Hermite polynomials so that $m_1 \ge m_2 \ge m_3 \ge
m_4$. Our approach will be to apply the product formula to $H_{m_1} H_{m_2}$ and to $H_{m_3} H_{m_4}$, thereby
initially obtaining

$$I_4 = \sum_{\mu=0}^{\min(m_1,m_2)} 2^\mu\,\mu! \binom{m_1}{\mu} \binom{m_2}{\mu} \sum_{\nu=0}^{\min(m_3,m_4)} 2^\nu\,\nu! \binom{m_3}{\nu} \binom{m_4}{\nu}$$

$$\times \int_{-\infty}^{\infty} e^{-x^2} H_{m_1+m_2-2\mu}(x)\,H_{m_3+m_4-2\nu}(x)\,dx. \tag{18.38}$$

Invoking the orthogonality of the $H_n$, with the weighting factor shown, the integral in
Eq. (18.38) can be evaluated, yielding

$$\int_{-\infty}^{\infty} e^{-x^2} H_{m_1+m_2-2\mu}(x)\,H_{m_3+m_4-2\nu}(x)\,dx = \sqrt{\pi}\,2^{m_3+m_4-2\nu}\,(m_3 + m_4 - 2\nu)!\;\delta_{m_1+m_2-2\mu,\,m_3+m_4-2\nu}. \tag{18.39}$$

The Kronecker delta in Eq. (18.39) limits the value of $\mu$ to the single value, if any, that
satisfies

$$\mu = \frac{m_1 + m_2 - m_3 - m_4}{2} + \nu, \tag{18.40}$$

so the double summation collapses to a single sum over $\nu$. Moreover, when the powers of
2 in Eqs. (18.38) and (18.39) are combined, their resultant is $2^M$, where

$$M = \frac{m_1 + m_2 + m_3 + m_4}{2}. \tag{18.41}$$

We now rewrite Eq. (18.38), removing the $\mu$ summation and assigning $\mu$ the value from
Eq. (18.40), writing the binomial coefficients in terms of their constituent factorials, and
introducing $M$ wherever it will result in simplification. We reach

$$I_4 = \sqrt{\pi}\,2^M \sum_{\nu} \frac{(m_3 + m_4 - 2\nu)!\;m_1!\,m_2!\,m_3!\,m_4!}{(M - m_3 - m_4 + \nu)!\,(M - m_1 - \nu)!\,(M - m_2 - \nu)!\,(m_3 - \nu)!\,(m_4 - \nu)!\,\nu!}. \tag{18.42}$$

This formula for Z4 will only be valid when the sum of the m; is even, equivalent to the 
requirement that M (and therefore also jz) be integral. If the sum of the m; is odd, then J4 
will have an odd integrand and will vanish by symmetry. The summation in Eq. (18.42) 
will be over the nonnegative integral values of v for which none of the factorials in the 
denominator of that summation has a negative argument. Note that there will be no value of 
v that satisfies this condition ifm, > m2 -+m3+m4, because M —m, will then be negative, 
and then /4 = 0. Thus we have a generalization of the triangle condition that applied to the 
threefold Hermite formula: If the largest of the m; is greater than the sum of the others, the 
H,, of smaller m cannot combine to yield a Hermite polynomial of sufficiently large index 
to avoid an orthogonality zero. 

Further examination of the factorials in the denominator of Eq. (18.42) reveals that the
lower limit of the summation will (if $m_1 \le m_2 + m_3 + m_4$) always be $\nu = 0$; note that
$M - m_3 - m_4$ will always be nonnegative. The upper limit of the summation will be the
smaller of $m_4$ and $M - m_1$. ■
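As a cross-check of Eq. (18.42), the following Python sketch compares the closed-form sum with a direct integration of the product of four Hermite polynomials. It assumes the standard recurrence $H_{n+1} = 2xH_n - 2nH_{n-1}$ and the Gaussian moments $\int_{-\infty}^{\infty} x^{2k} e^{-x^2}dx = \pi^{1/2}(2k)!/(4^k k!)$; the function names are illustrative only, and both routines return $I_4/\pi^{1/2}$ as an exact rational number.

```python
from fractions import Fraction
from math import factorial

def hermite(n):
    """Coefficients (lowest power first) of H_n, from H_{k+1} = 2x H_k - 2k H_{k-1}."""
    prev, cur = [0], [1]                       # placeholder for H_{-1}, then H_0
    for k in range(n):
        nxt = [0] + [2 * c for c in cur]       # 2x * H_k
        for i, c in enumerate(prev):
            nxt[i] -= 2 * k * c                # - 2k * H_{k-1}
        prev, cur = cur, nxt
    return cur

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def gauss_moment(p):
    """Integral of x^p exp(-x^2) over the real line, divided by sqrt(pi)."""
    if p % 2:
        return Fraction(0)
    k = p // 2
    return Fraction(factorial(2 * k), factorial(k) * 4 ** k)

def i4_direct(ms):
    """I_4 / sqrt(pi) by multiplying out the four polynomials and integrating termwise."""
    poly = [1]
    for m in ms:
        poly = poly_mul(poly, hermite(m))
    return sum(c * gauss_moment(p) for p, c in enumerate(poly))

def i4_formula(ms):
    """I_4 / sqrt(pi) from Eq. (18.42); the indices are sorted so m1 >= m2 >= m3 >= m4."""
    m1, m2, m3, m4 = sorted(ms, reverse=True)
    total = m1 + m2 + m3 + m4
    if total % 2 or m1 > m2 + m3 + m4:         # odd integrand, or triangle condition fails
        return Fraction(0)
    M = total // 2
    result = Fraction(0)
    for v in range(min(m4, M - m1) + 1):
        num = 2 ** M * factorial(m3 + m4 - 2 * v)
        for m in (m1, m2, m3, m4):
            num *= factorial(m)
        den = (factorial(M - m3 - m4 + v) * factorial(M - m1 - v)
               * factorial(M - m2 - v) * factorial(m3 - v)
               * factorial(m4 - v) * factorial(v))
        result += Fraction(num, den)
    return result
```

For example, `i4_formula((1, 1, 1, 1))` and `i4_direct((1, 1, 1, 1))` both give 12, corresponding to $I_4 = 12\pi^{1/2}$.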


The Hermite polynomial product formula can also be applied to products of Hermite 
polynomials with a different exponential weighting function than in the examples we have 
presented. To evaluate such integrals we use the generalized product formula in conjunc- 
tion with the integral (see Gradshteyn and Ryzhik, p. 803, in the Additional Readings), 








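$$\int_{-\infty}^{\infty} e^{-a^2x^2} H_m(x) H_n(x)\,dx
= \frac{2^{m+n}}{a^{m+n+1}}\,(1-a^2)^{(m+n)/2}\;\Gamma\!\left(\frac{m+n+1}{2}\right)
\sum_{\nu=0}^{\min(m,n)} \frac{(-m)_\nu\,(-n)_\nu}{\nu!\,\left(\frac{1-m-n}{2}\right)_\nu}
\left(\frac{a^2}{2(a^2-1)}\right)^{\nu}, \tag{18.43}$$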


instead of the standard orthogonality integral for the product of two Hermite polynomials. 
The quantity $(-m)_\nu$ is a Pochhammer symbol, and causes the $\nu$ summation in Eq. (18.43)
to be a finite sum. The summation can also be identified as a hypergeometric function; see
Exercise 18.5.11. The process we have sketched yields a result that is similar to $I_4$, but
somewhat more complicated. We omit details.

The oscillator potential has also been employed extensively in calculations of nuclear 
structure (nuclear shell model), as well as in quark models of hadrons and the nuclear 
force. 


Exercises 
18.2.1 Prove that $\left(2x - \dfrac{d}{dx}\right)^n 1 = H_n(x)$.
Hint. Check out the cases $n = 0$ and $n = 1$ and then use mathematical induction
(Section 1.4).

18.2.2 Show that $\int_{-\infty}^{\infty} x^m e^{-x^2} H_n(x)\,dx = 0$ for $m$ an integer, $0 \le m \le n-1$.

18.2.3 The transition probability between two oscillator states $m$ and $n$ depends on
$$\int_{-\infty}^{\infty} x\, e^{-x^2} H_n(x) H_m(x)\,dx.$$
Show that this integral equals $\pi^{1/2} 2^{n-1} n!\,\delta_{m,n-1} + \pi^{1/2} 2^{n} (n+1)!\,\delta_{m,n+1}$. This result
shows that such transitions can occur only between states of adjacent energy levels,
$m = n \pm 1$.





Hint. Multiply the generating function, Eq. (18.2), by itself using two different sets 
of variables (x,s) and (x,t). Alternatively, the factor x may be eliminated by the 
recurrence relation, Eq. (18.3).

18.2.4 Show that
$$\int_{-\infty}^{\infty} x^2 e^{-x^2} H_n(x) H_n(x)\,dx = \pi^{1/2}\, 2^n n! \left(n + \frac{1}{2}\right).$$
This integral occurs in the calculation of the mean-square displacement of our quantum
oscillator.
Hint. Use the recurrence relation, Eq. (18.3), and the orthogonality integral.

18.2.5 Evaluate
$$\int_{-\infty}^{\infty} x^2 e^{-x^2} H_n(x) H_m(x)\,dx$$
in terms of $n$ and $m$ and appropriate Kronecker delta functions.
ANS. $2^{n-1}\pi^{1/2}(2n+1)\,n!\,\delta_{nm} + 2^{n}\pi^{1/2}(n+2)!\,\delta_{n+2,m} + 2^{n-2}\pi^{1/2}\,n!\,\delta_{n-2,m}$.

18.2.6 Show that
$$\int_{-\infty}^{\infty} x^r e^{-x^2} H_n(x) H_{n+p}(x)\,dx =
\begin{cases} 0, & p > r, \\ 2^n \pi^{1/2} (n+r)!, & p = r, \end{cases}$$
with $n$, $p$, and $r$ nonnegative integers.
Hint. Use the recurrence relation, Eq. (18.3), $p$ times.

18.2.7 With $\psi_n(x) = e^{-x^2/2}\, \dfrac{H_n(x)}{(2^n n!\, \pi^{1/2})^{1/2}}$, verify that
$$a\,\psi_n(x) = \frac{1}{\sqrt{2}}\left(x + \frac{d}{dx}\right)\psi_n(x) = n^{1/2}\,\psi_{n-1}(x),$$
$$a^\dagger \psi_n(x) = \frac{1}{\sqrt{2}}\left(x - \frac{d}{dx}\right)\psi_n(x) = (n+1)^{1/2}\,\psi_{n+1}(x).$$
Note. The usual quantum mechanical operator approach establishes these raising and
lowering properties before the form of $\psi_n(x)$ is known.

18.2.8 (a) Verify the operator identity
$$x - \frac{d}{dx} = -\exp\left[\frac{x^2}{2}\right] \frac{d}{dx} \exp\left[-\frac{x^2}{2}\right].$$
(b) The normalized simple harmonic oscillator wave function is
$$\psi_n(x) = (\pi^{1/2} 2^n n!)^{-1/2} \exp\left[-\frac{x^2}{2}\right] H_n(x).$$
Show that this may be written as
$$\psi_n(x) = (\pi^{1/2} 2^n n!)^{-1/2} \left(x - \frac{d}{dx}\right)^{n} \exp\left[-\frac{x^2}{2}\right].$$
Note. This corresponds to an $n$-fold application of the raising operator of Exercise 18.2.7.


18.3 LAGUERRE FUNCTIONS


Rodrigues Formula and Generating Function 


Let's start from the Laguerre ODE,
$$x\,y''(x) + (1 - x)\,y'(x) + n\,y(x) = 0. \tag{18.44}$$


This ODE is not self-adjoint, but the weighting factor needed to make it self-adjoint can
be computed from the usual formula,
$$w(x) = \frac{1}{x} \exp\left[\int \frac{1-x}{x}\,dx\right] = \frac{1}{x}\exp(\ln x - x) = e^{-x}. \tag{18.45}$$

Given $w(x)$, we may now use the method developed in Section 12.1 to obtain a Rodrigues
formula and generating function for the Laguerre polynomials. Letting $L_n(x)$ denote the
$n$th Laguerre polynomial, the Rodrigues formula is (apart from a scale factor) given by
Eq. (12.9):
$$L_n(x) = \frac{1}{w(x)} \left(\frac{d}{dx}\right)^{n} \left[w(x)\,p(x)^n\right],$$
where $p(x)$ is the coefficient of $y''$ in the ODE. Inserting the expressions for $w(x)$ and
$p(x)$, and inserting a factor $1/n!$ to bring the Laguerre polynomials to their conventional
scaling, the Rodrigues formula takes the more complete and explicit form,
$$L_n(x) = \frac{e^x}{n!} \left(\frac{d}{dx}\right)^{n} \left(x^n e^{-x}\right). \tag{18.46}$$


A generating function can now be written as a sum of contour integrals of the Schlaefli
type, as in Eq. (12.25):
$$g(x,t) = \sum_{n=0}^{\infty} L_n(x)\,t^n
= \frac{1}{w(x)} \sum_{n=0}^{\infty} \frac{c_n\, n!\, t^n}{2\pi i} \oint_C \frac{w(z)\,[p(z)]^n}{(z-x)^{n+1}}\,dz,$$
where the contour surrounds the point $x$ and no other singularities. Specializing to our
current problem, and noting that the coefficient $c_n$ has the value $1/n!$, this formula becomes
$$g(x,t) = \frac{e^x}{2\pi i} \sum_{n=0}^{\infty} \oint_C \frac{(tz)^n e^{-z}}{(z-x)^{n+1}}\,dz
= \frac{e^x}{2\pi i} \oint_C \frac{e^{-z}}{z-x} \sum_{n=0}^{\infty} \left(\frac{tz}{z-x}\right)^{n} dz. \tag{18.47}$$
We now recognize the $n$ summation as a geometric series, so our generating function
becomes
$$g(x,t) = \frac{e^x}{2\pi i} \oint_C \frac{e^{-z}\,dz}{z - x - tz}. \tag{18.48}$$
Our integrand has a simple pole at $z = x/(1-t)$, with residue $e^{-x/(1-t)}/(1-t)$, and $g(x,t)$
reduces to
$$g(x,t) = \frac{e^x e^{-x/(1-t)}}{1-t} = \frac{e^{-xt/(1-t)}}{1-t} = \sum_{n=0}^{\infty} L_n(x)\,t^n, \tag{18.49}$$
the form given in Table 12.1.

Not all workers define Laguerre polynomials at the scale chosen here and represented by 
the specific formulas in Eq. (18.46) and (18.49). However, our choice is probably the most 
common, and is consistent with that in AMS-55 (see Abramowitz in Additional Readings). 


Properties of Laguerre Polynomials 


By differentiating the generating function in Eq. (18.49) with respect to $x$ and $t$, we obtain
recurrence relations for the Laguerre polynomials as follows. Using the product rule for
differentiation we verify the identities
$$(1-t)^2\, \frac{\partial g}{\partial t} = (1 - x - t)\, g(x,t), \qquad (t-1)\, \frac{\partial g}{\partial x} = t\, g(x,t). \tag{18.50}$$


Writing the left-hand and right-hand sides of the first identity in terms of Laguerre
polynomials using the expansion given in Eq. (18.49), we obtain
$$\sum_n \left[(n+1)L_{n+1}(x) - 2nL_n(x) + (n-1)L_{n-1}(x)\right] t^n
= \sum_n \left[(1-x)L_n(x) - L_{n-1}(x)\right] t^n.$$
Equating coefficients of $t^n$ yields
$$(n+1)\,L_{n+1}(x) = (2n+1-x)\,L_n(x) - n\,L_{n-1}(x). \tag{18.51}$$


To get the second recursion relation we use both identities of Eqs. (18.50) to verify a third
identity,
$$x\,\frac{\partial g}{\partial x} = t\,\frac{\partial g}{\partial t} - t\,\frac{\partial (tg)}{\partial t},$$
which, when written similarly in terms of Laguerre polynomials, is seen to be equivalent to
$$x\,L_n'(x) = n\,L_n(x) - n\,L_{n-1}(x). \tag{18.52}$$


To use these recurrence formulas we need starting values. From the Rodrigues formula,
we easily find $L_0(x) = 1$ and $L_1(x) = 1 - x$. Applying Eq. (18.51) we continue to $L_n(x)$
with $n > 1$, obtaining the results given in Table 18.2. The first three Laguerre polynomials
are plotted in Fig. 18.3.

Table 18.2 Laguerre Polynomials

$L_0(x) = 1$
$L_1(x) = -x + 1$
$2!\,L_2(x) = x^2 - 4x + 2$
$3!\,L_3(x) = -x^3 + 9x^2 - 18x + 6$
$4!\,L_4(x) = x^4 - 16x^3 + 72x^2 - 96x + 24$
$5!\,L_5(x) = -x^5 + 25x^4 - 200x^3 + 600x^2 - 600x + 120$
$6!\,L_6(x) = x^6 - 36x^5 + 450x^4 - 2400x^3 + 5400x^2 - 4320x + 720$

FIGURE 18.3 Laguerre polynomials.
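The construction of Table 18.2 from the recurrence can be sketched in a few lines of Python. The block below is only an illustration: it starts from $L_0 = 1$ and $L_1 = 1 - x$, applies Eq. (18.51) with exact rational arithmetic, and the helper names `laguerre` and `scaled` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def laguerre(n):
    """Coefficients (lowest power first) of L_n(x), built from Eq. (18.51)."""
    prev, cur = [Fraction(1)], [Fraction(1), Fraction(-1)]    # L_0 and L_1
    if n == 0:
        return prev
    for k in range(1, n):
        # (k+1) L_{k+1} = (2k + 1 - x) L_k - k L_{k-1}
        nxt = [Fraction(0)] * (k + 2)
        for i, c in enumerate(cur):
            nxt[i] += (2 * k + 1) * c
            nxt[i + 1] -= c
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, [c / (k + 1) for c in nxt]
    return cur

def scaled(n):
    """Integer coefficients of n! L_n(x), highest power first, as listed in Table 18.2."""
    return [int(c * factorial(n)) for c in reversed(laguerre(n))]
```

Calling `scaled(3)` returns `[-1, 9, -18, 6]`, matching the entry $3!\,L_3(x) = -x^3 + 9x^2 - 18x + 6$.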


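From the recurrence relations or the Rodrigues formula, we find the power series
expansion of $L_n(x)$:
$$L_n(x) = \frac{(-1)^n}{n!} \left[x^n - \frac{n^2}{1!}\,x^{n-1} + \frac{n^2(n-1)^2}{2!}\,x^{n-2} - \cdots + (-1)^n n!\right]$$
$$\qquad = \sum_{m=0}^{n} \frac{(-1)^m\, n!\, x^m}{(n-m)!\,m!\,m!}
= \sum_{s=0}^{n} \frac{(-1)^{n-s}\, n!\, x^{n-s}}{(n-s)!\,(n-s)!\,s!}. \tag{18.53}$$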


Also, from Eq. (18.49) we find
$$g(0,t) = \frac{1}{1-t} = \sum_{n=0}^{\infty} t^n = \sum_{n=0}^{\infty} L_n(0)\,t^n,$$
which shows that at $x = 0$ the Laguerre polynomials have the special value
$$L_n(0) = 1. \tag{18.54}$$
The form of the generating function, that of Laguerre's ODE, and Table 18.2 all show that
the Laguerre polynomials have neither odd nor even symmetry under the parity
transformation $x \to -x$.

As we already observed at the beginning of this section, the Laguerre ODE is not
self-adjoint but can be made so by appending the weighting factor $e^{-x}$. Noting also that with
this weighting factor, the Laguerre polynomials satisfy Sturm-Liouville boundary
conditions at $x = 0$ and $x = \infty$, we see that the $L_n(x)$ must satisfy an orthogonality condition
of the form
$$\int_0^{\infty} e^{-x} L_m(x) L_n(x)\,dx = \delta_{mn}. \tag{18.55}$$
Equation (18.55) indicates that for this interval and weighting factor the Laguerre
polynomials are normalized. Proof is the topic of Exercise 18.3.3.
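Equation (18.55) can be spot-checked exactly, without anticipating the proof requested in Exercise 18.3.3. The sketch below is a hedged illustration: it takes the series coefficients from Eq. (18.53), multiplies two polynomials term by term, and uses the elementary integral $\int_0^\infty x^p e^{-x}dx = p!$; the function names are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial

def laguerre_series(n):
    """Coefficients of L_n(x), lowest power first, from the series in Eq. (18.53)."""
    return [Fraction((-1) ** m * factorial(n), factorial(n - m) * factorial(m) ** 2)
            for m in range(n + 1)]

def overlap(m, n):
    """Integral of exp(-x) L_m(x) L_n(x) over [0, inf), using int x^p e^{-x} dx = p!."""
    a, b = laguerre_series(m), laguerre_series(n)
    return sum(ca * cb * factorial(i + j)
               for i, ca in enumerate(a) for j, cb in enumerate(b))
```

In this sketch `overlap(m, n)` evaluates to 1 when `m == n` and to 0 otherwise, for every pair of small indices tried.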
It is sometimes convenient to define orthogonalized Laguerre functions (with unit
weighting factor) by
$$\varphi_n(x) = e^{-x/2} L_n(x). \tag{18.56}$$
Our new orthonormal functions, $\varphi_n(x)$, satisfy the self-adjoint ODE
$$x\,\varphi_n''(x) + \varphi_n'(x) + \left(n + \frac{1}{2} - \frac{x}{4}\right) \varphi_n(x) = 0, \tag{18.57}$$
and are eigenfunctions of a Sturm-Liouville system on the range $(0 \le x < \infty)$.


Associated Laguerre Polynomials 


In many applications, particularly in quantum mechanics, we need the associated Laguerre
polynomials defined by⁴
$$L_n^k(x) = (-1)^k \frac{d^k}{dx^k} L_{n+k}(x). \tag{18.58}$$
By differentiating the power series for $L_n(x)$ given in Eq. (18.53) (compare Table 18.2),
we can get the explicit forms shown in Table 18.3. In general,
$$L_n^k(x) = \sum_{m=0}^{n} (-1)^m \frac{(n+k)!}{(n-m)!\,(k+m)!\,m!}\, x^m, \qquad k \ge 0. \tag{18.59}$$


One of the present authors⁵ has recently found a new generating function for the
associated Laguerre polynomials with the remarkably simple form
$$g_l(x,t) = e^{-tx}(1+t)^l = \sum_{n=0}^{\infty} L_n^{l-n}(x)\,t^n. \tag{18.60}$$


⁴Some authors use $\mathcal{L}_{n+k}^k(x) = (d^k/dx^k)[L_{n+k}(x)]$. Hence our $L_n^k(x) = (-1)^k \mathcal{L}_{n+k}^k(x)$.
⁵H. J. Weber, Connections between real polynomial solutions of hypergeometric-type differential equations with Rodrigues
formula, Cent. Eur. J. Math. 5: 415-427 (2007).

Table 18.3 Associated Laguerre Polynomials

$L_0^k = 1$
$1!\,L_1^k = -x + (k+1)$
$2!\,L_2^k = x^2 - 2(k+2)x + (k+1)_2$
$3!\,L_3^k = -x^3 + 3(k+3)x^2 - 3(k+2)_2\,x + (k+1)_3$
$4!\,L_4^k = x^4 - 4(k+4)x^3 + 6(k+3)_2\,x^2 - 4(k+2)_3\,x + (k+1)_4$
$5!\,L_5^k = -x^5 + 5(k+5)x^4 - 10(k+4)_2\,x^3 + 10(k+3)_3\,x^2 - 5(k+2)_4\,x + (k+1)_5$
$6!\,L_6^k = x^6 - 6(k+6)x^5 + 15(k+5)_2\,x^4 - 20(k+4)_3\,x^3 + 15(k+3)_4\,x^2 - 6(k+2)_5\,x + (k+1)_6$
$7!\,L_7^k = -x^7 + 7(k+7)x^6 - 21(k+6)_2\,x^5 + 35(k+5)_3\,x^4 - 35(k+4)_4\,x^3 + 21(k+3)_5\,x^2 - 7(k+2)_6\,x + (k+1)_7$

The notations $(k+n)_m$ are Pochhammer symbols, defined in Eq. (1.72).


Rather than derive this formula, we verify it by showing that it produces the defining
relation for the $L_n^k$, Eq. (18.58), and is consistent with the previously presented formulas
for the ordinary Laguerre polynomials (i.e., the $L_n^k$ with $k = 0$).

If we multiply both members of Eq. (18.60) by $1 + t$, so that the left-hand side becomes
$g_{l+1}(x,t)$, the coefficients of $t^n$ yield the recurrence formula
$$L_n^{l-n}(x) + L_{n-1}^{l-n+1}(x) = L_n^{l-n+1}(x). \tag{18.61}$$


On the other hand, differentiation of Eq. (18.60) with respect to $x$ and writing
$$\frac{\partial g_l(x,t)}{\partial x} = -t\,e^{-tx}(1+t)^l = -e^{-tx}(1+t)^{l+1} + e^{-tx}(1+t)^l,$$
the coefficients of $t^n$ yield a formula for $dL_n^{l-n}(x)/dx$, namely (with $k = l - n$)
$$\frac{dL_n^k(x)}{dx} = -L_n^{k+1}(x) + L_n^k(x), \tag{18.62}$$
and substituting the result from Eq. (18.61), we reach
$$\frac{dL_n^k(x)}{dx} = -L_{n-1}^{k+1}(x), \tag{18.63}$$
thereby confirming that our generating function yields Eq. (18.58).

The verification that our generating function is correct is now completed by using it to
find $L_n^0(x)$, which is the coefficient of $t^n$ in $e^{-tx}(1+t)^n$. Using the binomial expansion of
$(1+t)^n$ and the Maclaurin series for the exponential, we get
$$L_n^0(x) = \sum_{m=0}^{n} \binom{n}{n-m} \frac{(-x)^m}{m!} = \sum_{m=0}^{n} \frac{(-1)^m\, n!\, x^m}{(n-m)!\,m!\,m!},$$
in agreement with Eq. (18.53).

We can also confirm the series expansion given as Eq. (18.59) for $L_n^k$. It is the coefficient
of $t^n$ in $e^{-tx}(1+t)^{k+n}$, obtained in a manner similar to the procedure we just carried out
for $L_n^0$.
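The coefficient extraction just described is easy to mechanize. The following Python sketch is illustrative only, with hypothetical function names: it builds the coefficient of $t^n$ in $e^{-tx}(1+t)^{k+n}$ from the exponential and binomial series and compares it with the explicit series of Eq. (18.59).

```python
from fractions import Fraction
from math import comb, factorial

def assoc_from_series(n, k):
    """Coefficients of L_n^k(x), lowest power of x first, from Eq. (18.59)."""
    return [Fraction((-1) ** m * factorial(n + k),
                     factorial(n - m) * factorial(k + m) * factorial(m))
            for m in range(n + 1)]

def assoc_from_generating(n, k):
    """Coefficient of t^n in exp(-t x) (1 + t)^(k + n), as a polynomial in x."""
    # exp(-t x) supplies (-x)^m t^m / m!, and (1 + t)^(k+n) supplies C(k+n, n-m) t^(n-m).
    return [Fraction((-1) ** m * comb(k + n, n - m), factorial(m))
            for m in range(n + 1)]
```

For $n = 2$, $k = 3$ both functions give the coefficients $10, -5, \tfrac12$, that is, $2!\,L_2^3 = x^2 - 10x + 20$, in line with the Table 18.3 entry $x^2 - 2(k+2)x + (k+1)_2$.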

The generating function provides convenient routes to other properties of the associated
Laguerre polynomials. Special values for $x = 0$ can be obtained from
$$g_l(0,t) = (1+t)^l = \sum_{n=0}^{\infty} \binom{l}{n} t^n = \sum_{n=0}^{\infty} L_n^{l-n}(0)\,t^n.$$
We therefore have
$$L_n^k(0) = \binom{n+k}{n}. \tag{18.64}$$


A formula for recurrence in the index $n$ of $L_n^k(x)$ can be obtained by differentiating the
generating function formula with respect to $t$. Doing so, from the coefficient of $t^n$ and
setting $l = k + n$,
$$(n+1)\,L_{n+1}^{k-1}(x) = (k+n)\,L_n^{k-1}(x) - x\,L_n^k(x). \tag{18.65}$$


Using Eq. (18.61) to raise the upper index in the two terms for which it is $k - 1$, we find
after collecting similar terms,
$$(n+1)\,L_{n+1}^k(x) - (2n+k+1-x)\,L_n^k(x) + (n+k)\,L_{n-1}^k(x) = 0, \tag{18.66}$$
a lower-index recurrence formula.
Finally, returning to Eq. (18.65), differentiating it once with respect to $x$, and identifying
$[L_{n+1}^{k-1}]' = -L_n^k$, we get
$$(n+k)\,[L_n^{k-1}]' = x\,[L_n^k]' + L_n^k - (n+1)\,L_n^k = x\,[L_n^k]' - n\,L_n^k. \tag{18.67}$$
A second differentiation brings us to
$$x\,[L_n^k]'' + (1-n)\,[L_n^k]' = (n+k)\,[L_n^{k-1}]'' = (n+k)\,[L_n^{k-1}]' - (n+k)\,[L_n^k]', \tag{18.68}$$
where the final member of Eq. (18.68) was the result of substituting the derivative of
Eq. (18.62) with $k \to k - 1$. Using Eq. (18.67) to replace $(n+k)\,[L_n^{k-1}]'$ by a form in
which the upper index is $k$, we reach an ODE for $L_n^k$:
$$x\,\frac{d^2 L_n^k(x)}{dx^2} + (k+1-x)\,\frac{dL_n^k(x)}{dx} + n\,L_n^k(x) = 0. \tag{18.69}$$





This ODE is known as the associated Laguerre equation. When associated Laguerre 
polynomials appear in a physical problem it is usually because that physical problem 
involves Eq. (18.69). The most important application is their use to describe the bound 
states of the hydrogen atom, which are derived in upcoming Example 18.3.1.

The associated Laguerre equation, Eq. (18.69), is not self-adjoint, but the weighting
function needed to bring it to self-adjoint form (for upper index $k$) can be found in the
usual way:
$$w_k(x) = \frac{1}{x} \exp\left[\int \frac{k+1-x}{x}\,dx\right] = x^k e^{-x}. \tag{18.70}$$


When we also note that Sturm-Liouville boundary conditions are satisfied at $x = 0$ and
$x = \infty$, we see that the associated Laguerre polynomials are orthogonal according to the
equation
$$\int_0^{\infty} e^{-x} x^k L_n^k(x) L_m^k(x)\,dx = \frac{(n+k)!}{n!}\,\delta_{mn}. \tag{18.71}$$
The value of the integral in Eq. (18.71) for $m = n$ can be established using the generating
function, Eq. (18.60). Doing so is left as an exercise.

Equation (18.71) shows the same orthogonality interval (0, oo) as that for the Laguerre 
polynomials, but with a different weighting function for each k. We see that for each k the 
associated Laguerre polynomials define a new set of orthogonal polynomials. 
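Both the associated Laguerre equation and the orthogonality relation can be verified exactly for small indices. The Python sketch below is a hedged illustration rather than a proof: it takes the coefficients of $L_n^k$ from Eq. (18.59), forms the residual of Eq. (18.69) by polynomial arithmetic, and evaluates the integral in Eq. (18.71) with $\int_0^\infty x^p e^{-x}dx = p!$. All function names are assumptions for this example.

```python
from fractions import Fraction
from math import factorial

def assoc_laguerre(n, k):
    """Coefficients of L_n^k(x), lowest power first, from the series in Eq. (18.59)."""
    return [Fraction((-1) ** m * factorial(n + k),
                     factorial(n - m) * factorial(k + m) * factorial(m))
            for m in range(n + 1)]

def derivative(p):
    """Coefficient list of dp/dx."""
    return [i * c for i, c in enumerate(p)][1:]

def ode_residual(n, k):
    """Coefficients of x y'' + (k + 1 - x) y' + n y for y = L_n^k; all should vanish."""
    y = assoc_laguerre(n, k)
    y1 = derivative(y)
    y2 = derivative(y1)
    res = [Fraction(0)] * (n + 1)
    for i, c in enumerate(y2):
        res[i + 1] += c                  # x * y''
    for i, c in enumerate(y1):
        res[i] += (k + 1) * c            # (k + 1) * y'
        res[i + 1] -= c                  # -x * y'
    for i, c in enumerate(y):
        res[i] += n * c                  # n * y
    return res

def overlap(m, n, k):
    """Integral of exp(-x) x^k L_m^k(x) L_n^k(x) over [0, inf)."""
    a, b = assoc_laguerre(m, k), assoc_laguerre(n, k)
    return sum(ca * cb * factorial(i + j + k)
               for i, ca in enumerate(a) for j, cb in enumerate(b))
```

For instance, `overlap(1, 1, 1)` evaluates to 2, which is $(n+k)!/n!$ for $n = k = 1$, while `overlap(2, 4, 3)` evaluates to 0.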

A Rodrigues representation of the associated Laguerre polynomials is useful and can be
found in various ways. A fairly direct approach is simply to use Eq. (12.9) with $p(x) = x$,
the coefficient of the second-derivative term in Eq. (18.69), and the value of $w_k(x)$ given
in Eq. (18.70). The result is
$$L_n^k(x) = \frac{e^x x^{-k}}{n!} \frac{d^n}{dx^n} \left(e^{-x} x^{n+k}\right). \tag{18.72}$$
Note that this and all our earlier formulas involving the $L_n^k(x)$ reduce properly to
corresponding expressions involving $L_n(x)$ when $k = 0$.
By letting $\psi_n^k(x) = e^{-x/2} x^{k/2} L_n^k(x)$, we find that $\psi_n^k(x)$ satisfies the self-adjoint ODE,
$$x\,\frac{d^2\psi_n^k(x)}{dx^2} + \frac{d\psi_n^k(x)}{dx} + \left(-\frac{x}{4} + \frac{2n+k+1}{2} - \frac{k^2}{4x}\right) \psi_n^k(x) = 0. \tag{18.73}$$
The $\psi_n^k(x)$ are sometimes called Laguerre functions. Equation (18.57) is the special case
$k = 0$ of Eq. (18.73).
A further useful form is given by defining⁶
$$\Phi_n^k(x) = e^{-x/2} x^{(k+1)/2} L_n^k(x). \tag{18.74}$$
Substitution into the associated Laguerre equation yields
$$\frac{d^2\Phi_n^k(x)}{dx^2} + \left(-\frac{1}{4} + \frac{2n+k+1}{2x} - \frac{k^2-1}{4x^2}\right) \Phi_n^k(x) = 0. \tag{18.75}$$
The $\Phi_n^k(x)$ are orthogonal with weighting function $x^{-1}$.
The associated Laguerre ODE, Eq. (18.69), has solutions even if $n$ is not an integer, but
they are then not polynomials and diverge proportionally to $x^k e^x$ as $x \to \infty$. This fact is
useful in the following example.

⁶This corresponds to modifying the function $\psi$ in Eq. (18.73) to eliminate the first derivative.

Example 18.3.1 THE HYDROGEN ATOM


The most important application of the Laguerre polynomials is in the solution of the
Schrödinger equation for the hydrogen-like atom (H, He⁺, Li²⁺, etc.). For a system
consisting of a nucleus of charge $Ze$ fixed at the origin and one electron whose distribution is
described by a wave function $\psi$, this equation is
$$-\frac{\hbar^2}{2m} \nabla^2 \psi - \frac{Ze^2}{4\pi\varepsilon_0 r}\,\psi = E\psi, \tag{18.76}$$





in which $Z = 1$ for hydrogen, $Z = 2$ for He⁺, and so on. Separating variables in spherical
polar coordinates and recognizing that the angular part of the solution to this equation must
be a spherical harmonic, we set $\psi(\mathbf{r}) = R(r)\,Y_L^M(\theta, \varphi)$ with $R(r)$ satisfying the ODE
$$-\frac{\hbar^2}{2m} \frac{1}{r^2} \frac{d}{dr}\left(r^2 \frac{dR}{dr}\right) - \frac{Ze^2}{4\pi\varepsilon_0 r}\,R + \frac{\hbar^2}{2m} \frac{L(L+1)}{r^2}\,R = ER. \tag{18.77}$$


For bound states, $R \to 0$ as $r \to \infty$, and it can be shown that these conditions can only
be met if $E < 0$. In addition, $R$ must be finite at $r = 0$. We do not consider unbound
(continuum) states with positive energy. Only when the latter are included do hydrogenic
wave functions form a complete set.

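By use of the abbreviations (resulting from rescaling $r$ to the dimensionless radial
variable $\rho$)
$$\alpha = \left[-\frac{8mE}{\hbar^2}\right]^{1/2}, \qquad \rho = \alpha r, \qquad \lambda = \frac{mZe^2}{2\pi\varepsilon_0 \alpha \hbar^2}, \qquad \chi(\rho) = R(r), \tag{18.78}$$
Eq. (18.77) becomes
$$\frac{1}{\rho^2} \frac{d}{d\rho}\left(\rho^2 \frac{d\chi(\rho)}{d\rho}\right) + \left(\frac{\lambda}{\rho} - \frac{1}{4} - \frac{L(L+1)}{\rho^2}\right) \chi(\rho) = 0. \tag{18.79}$$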


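For our present purposes, it is useful to rewrite the first term of Eq. (18.79) using the
identity
$$\frac{1}{\rho^2} \frac{d}{d\rho}\left(\rho^2 \frac{d\chi}{d\rho}\right) = \frac{1}{\rho} \frac{d^2}{d\rho^2}(\rho\chi)$$
and then multiply the resulting equation by $\rho$, reaching
$$\frac{d^2}{d\rho^2}(\rho\chi) + \left(\frac{\lambda}{\rho} - \frac{1}{4} - \frac{L(L+1)}{\rho^2}\right)(\rho\chi) = 0. \tag{18.80}$$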
A comparison with Eq. (18.75) for $\Phi_n^k(x)$ shows that Eq. (18.80) is satisfied by
$$\rho\chi(\rho) = e^{-\rho/2} \rho^{L+1} L_{\lambda-L-1}^{2L+1}(\rho), \tag{18.81}$$
where $k$ and $n$ of Eq. (18.75) have been, respectively, replaced by $2L + 1$ and $\lambda - L - 1$.

The parameter $\lambda$ must be restricted to values such that $\lambda - L - 1$ is both integral and
nonnegative. If this requirement is violated, $L_{\lambda-L-1}^{2L+1}(\rho)$ will diverge too rapidly to permit
$\rho\chi(\rho)$ to go to zero at large $r$, which is required for a bound-state electron distribution.
Since we already know that $L$, a spherical harmonic index, must be integral and
nonnegative, we see that the possible values of $\lambda$ are integers $n$ at least as large as $L + 1$.⁷

This restriction on $\lambda$, imposed by our boundary condition, has the effect of quantizing
the energy. Inserting $\lambda = n$, the definitions in Eqs. (18.78) lead to
$$E_n = -\frac{Z^2 m}{2n^2\hbar^2} \left(\frac{e^2}{4\pi\varepsilon_0}\right)^{2}. \tag{18.82}$$





Since our Schrödinger equation implicitly set the potential energy to zero when the electron
is at an infinite separation from the nucleus, the negative sign reflects the fact that we are
dealing here with bound states in which the electron cannot escape to infinity. The other
quantities introduced in Eq. (18.78) can also be expressed in terms of $n$:
$$\alpha = \frac{me^2}{2\pi\varepsilon_0\hbar^2}\,\frac{Z}{n} = \frac{2Z}{na_0}, \qquad \rho = \frac{2Z}{na_0}\,r, \qquad \text{with } a_0 = \frac{4\pi\varepsilon_0\hbar^2}{me^2}. \tag{18.83}$$


The quantity $a_0$, of dimension length, is known as the Bohr radius, and its appearance as
a scale factor causes the potential energy (for $n = 1$, the smallest possible value) to have
an average value corresponding to this electron-nuclear separation.
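To attach numbers to Eqs. (18.82) and (18.83), the short Python sketch below evaluates the Bohr radius and the bound-state energies. The numerical constants are CODATA 2018 SI values supplied here as an assumption (the text quotes none), and the function names are illustrative.

```python
from math import pi

# CODATA 2018 values in SI units (an assumption of this sketch, not taken from the text).
M_E = 9.1093837015e-31       # electron mass, kg
E_CH = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
HBAR = 1.054571817e-34       # reduced Planck constant, J s

def bohr_radius():
    """a_0 = 4 pi eps0 hbar^2 / (m e^2), Eq. (18.83), in meters."""
    return 4 * pi * EPS0 * HBAR ** 2 / (M_E * E_CH ** 2)

def energy_ev(n, Z=1):
    """E_n of Eq. (18.82), converted from joules to electron volts."""
    coulomb = E_CH ** 2 / (4 * pi * EPS0)
    joules = -Z ** 2 * M_E / (2 * n ** 2 * HBAR ** 2) * coulomb ** 2
    return joules / E_CH
```

With these constants `energy_ev(1)` is close to −13.6 eV and `bohr_radius()` is close to 5.29 × 10⁻¹¹ m; the $Z^2/n^2$ scaling means that `energy_ev(2, Z=2)` reproduces the hydrogen ground-state value.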

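Summarizing, the final normalized hydrogen wave function is
$$\psi_{nLM}(r, \theta, \varphi) = \left[\left(\frac{2Z}{na_0}\right)^{3} \frac{(n-L-1)!}{2n\,(n+L)!}\right]^{1/2} e^{-\alpha r/2} (\alpha r)^L\, L_{n-L-1}^{2L+1}(\alpha r)\, Y_L^M(\theta, \varphi). \tag{18.84}$$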


Note that the energy corresponding to $\psi_{nLM}$ depends only on $n$, which is called the
principal quantum number of this system. Note also that if $n$ is assigned a specific
integral value, the condition on $\lambda$ requires that $L \le n - 1$, thereby explaining the well-known
pattern of possible hydrogenic energy states: If $n = 1$, $L$ can only be zero; for $n = 2$, we
can have $L = 0$ or $L = 1$, etc. ■
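The radial normalization implied by Eq. (18.84) can be confirmed exactly for small quantum numbers. In the sketch below (hypothetical function names; the substitution $\rho = \alpha r$ is assumed) the radial integral $\int_0^\infty R_{nL}^2\, r^2 dr$ reduces to $\frac{(n-L-1)!}{2n(n+L)!}\int_0^\infty e^{-\rho}\rho^{2L+2}\left[L_{n-L-1}^{2L+1}(\rho)\right]^2 d\rho$, which is evaluated termwise with $\int_0^\infty \rho^p e^{-\rho}d\rho = p!$.

```python
from fractions import Fraction
from math import factorial

def assoc_laguerre(n, k):
    """Coefficients of L_n^k, lowest power first, from the series in Eq. (18.59)."""
    return [Fraction((-1) ** m * factorial(n + k),
                     factorial(n - m) * factorial(k + m) * factorial(m))
            for m in range(n + 1)]

def radial_norm(n, L):
    """Exact value of the radial normalization integral for quantum numbers n and L."""
    nr, k = n - L - 1, 2 * L + 1
    p = assoc_laguerre(nr, k)
    # Integral of exp(-rho) rho^(2L+2) [L_nr^k(rho)]^2 over [0, inf), term by term.
    integral = sum(a * b * factorial(i + j + 2 * L + 2)
                   for i, a in enumerate(p) for j, b in enumerate(p))
    return Fraction(factorial(nr), 2 * n * factorial(n + L)) * integral
```

For the allowed pairs with $L \le n - 1$, `radial_norm(n, L)` returns exactly 1; for example, $n = 2$, $L = 0$ gives $\tfrac18 \cdot 8 = 1$.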
Exercises 

18.3.1 Show with the aid of the Leibniz formula that the series expansion of $L_n(x)$, Eq. (18.53),
follows from the Rodrigues representation, Eq. (18.72).

18.3.2 (a) Using the explicit series form, Eq. (18.53), show that
$$L_n'(0) = -n, \qquad L_n''(0) = \tfrac{1}{2}\,n(n-1).$$
(b) Repeat without using the explicit series form of $L_n(x)$.

18.3.3 Derive the normalization relation, Eq. (18.71), for the associated Laguerre polynomials,
thereby also confirming Eq. (18.55) for the $L_n$.


⁷This is the conventional notation for $\lambda$. It is not the same $n$ as the index $n$ in $\Phi_n^k(x)$.

18.3.4 Expand $x^r$ in a series of associated Laguerre polynomials $L_n^k(x)$, with $k$ fixed and $n$
ranging from 0 to $r$ (or to $\infty$ if $r$ is not an integer).
Hint. The Rodrigues form of $L_n^k(x)$ will be useful.
ANS. $x^r = (r+k)!\,r! \displaystyle\sum_{n=0}^{r} \frac{(-1)^n L_n^k(x)}{(n+k)!\,(r-n)!}$, $\quad 0 \le x < \infty$.

18.3.5 Expand $e^{-ax}$ in a series of associated Laguerre polynomials $L_n^k(x)$, with $k$ fixed and $n$
ranging from 0 to $\infty$.
(a) Evaluate directly the coefficients in your assumed expansion.
(b) Develop the desired expansion from the generating function.
ANS. $e^{-ax} = \dfrac{1}{(1+a)^{1+k}} \displaystyle\sum_{n=0}^{\infty} \left(\frac{a}{1+a}\right)^{n} L_n^k(x)$, $\quad 0 \le x < \infty$.

18.3.6 Show that
$$\int_0^{\infty} e^{-x} x^{k+1} L_n^k(x) L_n^k(x)\,dx = \frac{(n+k)!}{n!}\,(2n+k+1).$$
Hint. Note that $x L_n^k = (2n+k+1)L_n^k - (n+k)L_{n-1}^k - (n+1)L_{n+1}^k$.

18.3.7 Assume that a particular problem in quantum mechanics has led to the ODE
$$\frac{d^2y}{dx^2} - \left[\frac{k^2-1}{4x^2} - \frac{2n+k+1}{2x} + \frac{1}{4}\right] y = 0$$
for nonnegative integers $n$, $k$. Write $y(x)$ as $y(x) = A(x)B(x)C(x)$, with the
requirement that
(a) $A(x)$ be a negative exponential giving the required asymptotic behavior of $y(x)$,
and
(b) $B(x)$ be a positive power of $x$ giving the behavior of $y(x)$ for $0 \le x \ll 1$.
Determine $A(x)$ and $B(x)$. Find the relation between $C(x)$ and the associated Laguerre
polynomial.
ANS. $A(x) = e^{-x/2}$, $\quad B(x) = x^{(k+1)/2}$, $\quad C(x) = L_n^k(x)$.

18.3.8 From Eq. (18.84) the normalized radial part of the hydrogenic wave function is
$$R_{nL}(r) = \left[\alpha^3\, \frac{(n-L-1)!}{2n\,(n+L)!}\right]^{1/2} e^{-\alpha r/2} (\alpha r)^L\, L_{n-L-1}^{2L+1}(\alpha r),$$
in which $\alpha = 2Z/na_0 = 2Zme^2/4\pi\varepsilon_0 n\hbar^2$. Evaluate
(a) $\langle r \rangle = \displaystyle\int_0^{\infty} r\, R_{nL}(\alpha r) R_{nL}(\alpha r)\, r^2\,dr$,
(b) $\langle r^{-1} \rangle = \displaystyle\int_0^{\infty} r^{-1} R_{nL}(\alpha r) R_{nL}(\alpha r)\, r^2\,dr$.
The quantity $\langle r \rangle$ is the average displacement of the electron from the nucleus, whereas
$\langle r^{-1} \rangle$ is the average of the reciprocal displacement.
ANS. $\langle r \rangle = \dfrac{a_0}{2}\left[3n^2 - L(L+1)\right]$, $\quad \langle r^{-1} \rangle = \dfrac{1}{n^2 a_0}$.

18.3.9 Derive a recurrence formula for the hydrogen wave function expectation values:
$$\frac{s+2}{n^2}\,\langle r^{s+1} \rangle - (2s+3)\,a_0\,\langle r^{s} \rangle + \frac{s+1}{4}\left[(2L+1)^2 - (s+1)^2\right] a_0^2\,\langle r^{s-1} \rangle = 0,$$
with $s \ge -2L - 1$.
Hint. Transform Eq. (18.80) into a form analogous to Eq. (18.73). Multiply by
$\rho^{s+2}u' - c\rho^{s+1}u$, with $u = \rho\Phi$. Adjust $c$ to cancel terms that do not yield
expectation values.

18.3.10 Show that $\displaystyle\int_{-\infty}^{\infty} x^n e^{-x^2} H_n(xy)\,dx = \sqrt{\pi}\; n!\, P_n(y)$, where $P_n$ is a Legendre polynomial.


18.4 CHEBYSHEV POLYNOMIALS


The generating function for the Legendre polynomials can be generalized to the following
form:
$$\frac{1}{(1 - 2xt + t^2)^{\alpha}} = \sum_{n=0}^{\infty} C_n^{(\alpha)}(x)\,t^n. \tag{18.85}$$
The coefficients $C_n^{(\alpha)}(x)$ are known as the ultraspherical polynomials (also called
Gegenbauer polynomials). For $\alpha = 1/2$, we recover the Legendre polynomials; the
special cases $\alpha = 0$ and $\alpha = 1$ yield two types of Chebyshev polynomials that are the
subject of this section. The primary importance of the Chebyshev polynomials is in numerical
analysis.


Type II Polynomials

With $\alpha = 1$ and $C_n^{(1)}(x)$ written as $U_n(x)$, Eq. (18.85) gives
$$\frac{1}{1 - 2xt + t^2} = \sum_{n=0}^{\infty} U_n(x)\,t^n, \qquad |x| < 1, \quad |t| < 1. \tag{18.86}$$
These functions are called type II Chebyshev polynomials. Although these polynomials
have few applications in mathematical physics, one unusual application is in the
development of four-dimensional spherical harmonics used in angular momentum theory.

Type I Polynomials
With $\alpha = 0$ there is a difficulty. Indeed, our generating function reduces to the constant
1. We may avoid this problem by first differentiating Eq. (18.85) with respect to $t$. This
yields
$$\frac{-\alpha(-2x + 2t)}{(1 - 2xt + t^2)^{\alpha+1}} = \sum_{n=1}^{\infty} n\,C_n^{(\alpha)}(x)\,t^{n-1},$$
or
$$\frac{x - t}{(1 - 2xt + t^2)^{\alpha+1}} = \sum_{n=1}^{\infty} \frac{n}{2} \left[\frac{C_n^{(\alpha)}(x)}{\alpha}\right] t^{n-1}. \tag{18.87}$$
We define $C_n^{(0)}(x)$ as
$$C_n^{(0)}(x) = \lim_{\alpha \to 0} \frac{C_n^{(\alpha)}(x)}{\alpha}. \tag{18.88}$$
The purpose of differentiating with respect to $t$ was to get $\alpha$ in the denominator and to
create an indeterminate form. Now multiplying Eq. (18.87) by $2t$ and adding 1 in the form
$(1 - 2xt + t^2)/(1 - 2xt + t^2)$, we obtain
$$\frac{1 - t^2}{1 - 2xt + t^2} = 1 + 2\sum_{n=1}^{\infty} \frac{n}{2}\,C_n^{(0)}(x)\,t^n. \tag{18.89}$$
We define $T_n(x)$ as
$$T_n(x) = \begin{cases} 1, & n = 0, \\[2pt] \dfrac{n}{2}\,C_n^{(0)}(x), & n > 0. \end{cases} \tag{18.90}$$
Note the special treatment for $n = 0$. We will encounter a similar treatment of the $n = 0$
term when we study Fourier series in Chapter 19. Also, note that $C_n^{(0)}$ is the limit indicated
in Eq. (18.88) and not a literal substitution of $\alpha = 0$ into the generating function series.

With these new labels,
$$\frac{1 - t^2}{1 - 2xt + t^2} = T_0(x) + 2\sum_{n=1}^{\infty} T_n(x)\,t^n, \qquad |x| \le 1, \quad |t| < 1. \tag{18.91}$$
We call $T_n(x)$ the type I Chebyshev polynomials. Note that the notation and spelling of
the name for these functions differ from reference to reference. Here we follow the usage
of AMS-55 (Additional Readings).

Recurrence Relations


Differentiating the generating function, Eq. (18.91), with respect to $t$ and multiplying by
the denominator, $1 - 2xt + t^2$, we obtain
$$-t - (t - x)\left[T_0(x) + 2\sum_{n=1}^{\infty} T_n(x)\,t^n\right] = (1 - 2xt + t^2) \sum_{n=1}^{\infty} n\,T_n(x)\,t^{n-1}$$
$$\qquad = \sum_{n=1}^{\infty} \left[n\,T_n t^{n-1} - 2xn\,T_n t^n + n\,T_n t^{n+1}\right],$$
from which after several simplification steps we reach the recurrence relation
$$T_{n+1}(x) - 2x\,T_n(x) + T_{n-1}(x) = 0, \qquad n > 0. \tag{18.92}$$
A similar treatment of Eq. (18.86) yields the corresponding recursion relation for $U_n$:
$$U_{n+1}(x) - 2x\,U_n(x) + U_{n-1}(x) = 0, \qquad n > 0. \tag{18.93}$$


Using the generating functions directly for $n = 0$ and 1, and then applying these
recurrence relations for the higher-order polynomials, we get Table 18.4. Plots of the $T_n$ and $U_n$
are presented in Figs. 18.4 and 18.5.
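The procedure just described can be sketched in Python. The block below is an illustration only (the function name and the `kind` flag are assumptions): it starts from $T_0 = U_0 = 1$, $T_1 = x$, $U_1 = 2x$ and applies the common three-term recurrence of Eqs. (18.92) and (18.93) to integer coefficient lists.

```python
def chebyshev(n, kind=1):
    """Integer coefficients (lowest power first) of T_n (kind=1) or U_n (kind=2)."""
    prev = [1]                                   # T_0 = U_0 = 1
    cur = [0, 1] if kind == 1 else [0, 2]        # T_1 = x, U_1 = 2x
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]         # 2x * P_k
        for i, c in enumerate(prev):
            nxt[i] -= c                          # - P_{k-1}
        prev, cur = cur, nxt
    return cur
```

For example, `chebyshev(4)` returns `[1, 0, -8, 0, 8]`, that is, $T_4 = 8x^4 - 8x^2 + 1$, and `chebyshev(3, kind=2)` returns `[0, -4, 0, 8]`, that is, $U_3 = 8x^3 - 4x$.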

Differentiation of the generating functions for $T_n(x)$ and $U_n(x)$ with respect to the
variable $x$ leads to a variety of recurrence relations involving derivatives. For example, from
Eq. (18.89) we thus obtain
$$(1 - 2xt + t^2) \cdot 2\sum_{n=1}^{\infty} T_n'(x)\,t^n = 2t\left[T_0(x) + 2\sum_{n=1}^{\infty} T_n(x)\,t^n\right],$$
from which we extract the recursion formula
$$2\,T_n(x) = T_{n+1}'(x) - 2x\,T_n'(x) + T_{n-1}'(x). \tag{18.94}$$
Other useful recurrence formulas we can find in this way are
$$(1 - x^2)\,T_n'(x) = -nx\,T_n(x) + n\,T_{n-1}(x) \tag{18.95}$$

and
$$(1 - x^2)\,U_n'(x) = -nx\,U_n(x) + (n+1)\,U_{n-1}(x). \tag{18.96}$$

Table 18.4 Chebyshev Polynomials: Type I (Left), Type II (Right)

$T_0 = 1$                                  $U_0 = 1$
$T_1 = x$                                  $U_1 = 2x$
$T_2 = 2x^2 - 1$                           $U_2 = 4x^2 - 1$
$T_3 = 4x^3 - 3x$                          $U_3 = 8x^3 - 4x$
$T_4 = 8x^4 - 8x^2 + 1$                    $U_4 = 16x^4 - 12x^2 + 1$
$T_5 = 16x^5 - 20x^3 + 5x$                 $U_5 = 32x^5 - 32x^3 + 6x$
$T_6 = 32x^6 - 48x^4 + 18x^2 - 1$          $U_6 = 64x^6 - 80x^4 + 24x^2 - 1$

FIGURE 18.5 The Chebyshev polynomials $U_1$, $U_2$, and $U_3$.

Manipulating a variety of these formulas as in Section 15.1 for Legendre polynomials
one can eliminate the index $n - 1$ in favor of $T_n''$ and establish that $T_n(x)$, the Chebyshev
polynomial type I, satisfies the ODE
$$(1 - x^2)\,T_n''(x) - x\,T_n'(x) + n^2\,T_n(x) = 0. \tag{18.97}$$
The Chebyshev polynomial of type II, $U_n(x)$, satisfies
$$(1 - x^2)\,U_n''(x) - 3x\,U_n'(x) + n(n+2)\,U_n(x) = 0. \tag{18.98}$$
We could have defined the Chebyshev polynomials starting from these ODEs, but we chose 
instead a development based on generating functions. 


Processes similar to those used for the Chebyshev polynomials can be applied to the
general ultraspherical polynomials; the result is the ultraspherical ODE
$$(1 - x^2)\,\frac{d^2}{dx^2}\,C_n^{(\alpha)}(x) - (2\alpha + 1)\,x\,\frac{d}{dx}\,C_n^{(\alpha)}(x) + n(n + 2\alpha)\,C_n^{(\alpha)}(x) = 0. \tag{18.99}$$
Special Values 


Again, from the generating functions, we can obtain the special values of various
polynomials:
$$T_n(1) = 1, \qquad T_n(-1) = (-1)^n, \qquad T_{2n}(0) = (-1)^n, \qquad T_{2n+1}(0) = 0;$$
$$U_n(1) = n + 1, \qquad U_n(-1) = (-1)^n(n+1), \qquad U_{2n}(0) = (-1)^n, \qquad U_{2n+1}(0) = 0. \tag{18.100}$$
Verification of Eq. (18.100) is left to the exercises.
The polynomials $T_n$ and $U_n$ satisfy parity relations that follow from their generating
functions with the substitutions $t \to -t$, $x \to -x$, which leave them invariant; these are
$$T_n(x) = (-1)^n\,T_n(-x), \qquad U_n(x) = (-1)^n\,U_n(-x). \tag{18.101}$$


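Rodrigues representations of $T_n(x)$ and $U_n(x)$ are
$$T_n(x) = \frac{(-1)^n \pi^{1/2} (1 - x^2)^{1/2}}{2^n\, \Gamma\!\left(n + \tfrac{1}{2}\right)} \frac{d^n}{dx^n} \left[(1 - x^2)^{n-1/2}\right] \tag{18.102}$$
and
$$U_n(x) = \frac{(-1)^n (n+1)\, \pi^{1/2}}{2^{n+1}\, \Gamma\!\left(n + \tfrac{3}{2}\right) (1 - x^2)^{1/2}} \frac{d^n}{dx^n} \left[(1 - x^2)^{n+1/2}\right]. \tag{18.103}$$

Trigonometric Form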
At this point in the development of the properties of the Chebyshev polynomials it
is beneficial to change variables, replacing $x$ by $\cos\theta$. With $x = \cos\theta$ and $d/dx =
(-1/\sin\theta)(d/d\theta)$, we verify that
$$(1 - x^2)\,\frac{d^2T_n}{dx^2} = \frac{d^2T_n}{d\theta^2} - \cot\theta\,\frac{dT_n}{d\theta}, \qquad x\,\frac{dT_n}{dx} = -\cot\theta\,\frac{dT_n}{d\theta}.$$
Adding these terms, Eq. (18.97) becomes
$$\frac{d^2T_n}{d\theta^2} + n^2\,T_n = 0, \tag{18.104}$$

the simple harmonic oscillator equation with solutions $\cos n\theta$ and $\sin n\theta$. The special
values (boundary conditions at $x = 0$ and 1) identify
$$T_n = \cos n\theta = \cos(n \arccos x). \tag{18.105}$$
For $n \ne 0$ a second linearly independent solution of Eq. (18.104) is labeled
$$V_n = \sin n\theta = \sin(n \arccos x). \tag{18.106}$$
The corresponding solutions of the type II Chebyshev equation, Eq. (18.98), become
$$U_n = \frac{\sin(n+1)\theta}{\sin\theta}, \tag{18.107}$$
$$W_n = \frac{\cos(n+1)\theta}{\sin\theta}. \tag{18.108}$$
The two sets of solutions, type I and type II, are related by
$$V_n(x) = (1 - x^2)^{1/2}\,U_{n-1}(x), \tag{18.109}$$
$$W_n(x) = (1 - x^2)^{-1/2}\,T_{n+1}(x). \tag{18.110}$$

As already seen from the generating functions, $T_n(x)$ and $U_n(x)$ are polynomials. Clearly,
$V_n(x)$ and $W_n(x)$ are not polynomials. From
$$T_n(x) + iV_n(x) = \cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n = \left[x + i(1 - x^2)^{1/2}\right]^n, \qquad |x| \le 1, \tag{18.111}$$
we can apply the binomial theorem to obtain expansions
$$T_n(x) = x^n - \binom{n}{2} x^{n-2}(1 - x^2) + \binom{n}{4} x^{n-4}(1 - x^2)^2 - \cdots \tag{18.112}$$
and, for $n > 0$,
$$V_n(x) = \sqrt{1 - x^2}\left[\binom{n}{1} x^{n-1} - \binom{n}{3} x^{n-3}(1 - x^2) + \cdots\right]. \tag{18.113}$$
From the generating functions, or from the ODEs, power-series representations are
$$T_n(x) = \frac{n}{2} \sum_{m=0}^{[n/2]} (-1)^m \frac{(n-m-1)!}{m!\,(n-2m)!}\,(2x)^{n-2m} \tag{18.114}$$
for $n \ge 1$, with $[n/2]$ the integer part of $n/2$, and
$$U_n(x) = \sum_{m=0}^{[n/2]} (-1)^m \frac{(n-m)!}{m!\,(n-2m)!}\,(2x)^{n-2m}. \tag{18.115}$$


Application to Numerical Analysis 


An important feature of the Chebyshev polynomials $T_n(x)$ with $n > 0$ is that as $x$ is varied,
they oscillate between the extreme values $T_n = +1$ and $T_n = -1$. This behavior is readily
seen from Eq. (18.105) and is illustrated for $T_{12}$ in Fig. 18.6. If a function is expanded in
the $T_n$ and the expansion is extended sufficiently that the contributions of successive $T_n$ are
decreasing rapidly, a good approximation to the truncation error will be proportional to the
first $T_n$ not included in the expansion. In this approximation, there will be negligible error
at the $n$ values of $x$ where $T_n$ is zero, and there will be maximum errors (all of the same
magnitude but alternating in sign) at the extrema of $T_n$ that fall between the zeros. In that
sense, the errors satisfy a minimax principle, meaning that the maximum of the error has
been minimized by distributing it evenly into the regions between the points of negligible
error.

















FIGURE 18.6 The Chebyshev polynomial $T_{12}$.

Example 18.4.1 MINIMIZING THE MAXIMUM ERROR


Figure 18.7 shows the errors in four-term expansions of $e^x$ on the range $[-1, 1]$ carried
out in various ways: (a) Maclaurin series, (b) Legendre expansion, and (c) Chebyshev
expansion. The power series is optimum at the point $x = 0$ and the error increases with
increasing values of $|x|$. The orthogonal expansions produce a fit over the region $[-1, 1]$,
with the maximum errors occurring at $x = \pm 1$ and three intermediate values of $x$.
However, the Legendre expansion has larger errors at $\pm 1$ than it has at the interior points, while
the Chebyshev expansion yields smaller errors at $\pm 1$ (with a concomitant increase in the
error at the other maxima) with the result that all the error maxima are comparable. This
choice approximately minimizes the maximum error.
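The comparison between the Maclaurin and Chebyshev approximations can be reproduced with a short Python sketch. It is a sketch under stated assumptions, not the procedure used to prepare Fig. 18.7: the Chebyshev coefficients are obtained here by Gauss-Chebyshev quadrature on 64 nodes, the expansion is written as $c_0/2 + \sum_{j\ge 1} c_j T_j(x)$, the maximum error is estimated on a uniform grid, and all function names are hypothetical.

```python
from math import cos, exp, factorial, pi

def cheb_coeffs(f, terms, nodes=64):
    """c_j such that f(x) ~ c_0/2 + sum_{j>=1} c_j T_j(x), via Gauss-Chebyshev quadrature."""
    thetas = [pi * (k + 0.5) / nodes for k in range(nodes)]
    return [2.0 / nodes * sum(f(cos(t)) * cos(j * t) for t in thetas)
            for j in range(terms)]

def cheb_eval(c, x):
    """Evaluate the truncated Chebyshev expansion using the recurrence (18.92)."""
    t_prev, t_cur = 1.0, x
    total = c[0] / 2 + (c[1] * x if len(c) > 1 else 0.0)
    for cj in c[2:]:
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
        total += cj * t_cur
    return total

def maclaurin_exp(x, terms):
    """First `terms` terms of the Maclaurin series of exp(x)."""
    return sum(x ** k / factorial(k) for k in range(terms))

def max_error(approx, grid=2001):
    """Largest |exp(x) - approx(x)| on a uniform grid covering [-1, 1]."""
    xs = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    return max(abs(exp(x) - approx(x)) for x in xs)

coeffs = cheb_coeffs(exp, 4)
err_cheb = max_error(lambda x: cheb_eval(coeffs, x))
err_maclaurin = max_error(lambda x: maclaurin_exp(x, 4))
```

In this sketch the Maclaurin error is largest at $x = 1$, where it equals $e - 8/3 \approx 0.052$, while the four-term Chebyshev error stays below 0.01 over the whole interval.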

















FIGURE 18.7 Error in four-term approximations to $e^x$: (a) Power series; (b) Legendre
expansion; and (c) Chebyshev expansion.


Orthogonality 


If Eq. (18.97) is put into self-adjoint form (Section 8.2), we obtain $w(x) = (1 - x^2)^{-1/2}$
as a weighting factor. For Eq. (18.98) the corresponding weighting factor is $(1 - x^2)^{+1/2}$.
The resulting orthogonality integrals,
$$\int_{-1}^{1} T_m(x)\,T_n(x)\,(1 - x^2)^{-1/2}\,dx = \begin{cases} 0, & m \ne n, \\ \pi/2, & m = n \ne 0, \\ \pi, & m = n = 0, \end{cases} \tag{18.116}$$
$$\int_{-1}^{1} V_m(x)\,V_n(x)\,(1 - x^2)^{-1/2}\,dx = \begin{cases} 0, & m \ne n, \\ \pi/2, & m = n \ne 0, \\ 0, & m = n = 0, \end{cases} \tag{18.117}$$
$$\int_{-1}^{1} U_m(x)\,U_n(x)\,(1 - x^2)^{1/2}\,dx = \frac{\pi}{2}\,\delta_{mn}, \tag{18.118}$$
and
$$\int_{-1}^{1} W_m(x)\,W_n(x)\,(1 - x^2)^{1/2}\,dx = \frac{\pi}{2}\,\delta_{mn}, \tag{18.119}$$
are a direct consequence of the Sturm-Liouville theory. The normalization values may best
be obtained by making the substitution $x = \cos\theta$.

Exercises

18.4.1 By evaluating the generating function for special values of $x$, verify the special values
\[ T_n(1) = 1, \quad T_n(-1) = (-1)^n, \quad T_{2n}(0) = (-1)^n, \quad T_{2n+1}(0) = 0. \]

18.4.2 By evaluating the generating function for special values of $x$, verify the special values
\[ U_n(1) = n + 1, \quad U_n(-1) = (-1)^n (n + 1), \quad U_{2n}(0) = (-1)^n, \quad U_{2n+1}(0) = 0. \]

18.4.3 Another Chebyshev generating function is
\[ \frac{1 - xt}{1 - 2xt + t^2} = \sum_{n=0}^{\infty} X_n(x)\,t^n, \quad |t| < 1. \]
How is $X_n(x)$ related to $T_n(x)$ and $U_n(x)$?

18.4.4 Given
\[ (1 - x^2)\,U_n''(x) - 3x\,U_n'(x) + n(n + 2)\,U_n(x) = 0, \]
show that $V_n(x)$, Eq. (18.106), satisfies
\[ (1 - x^2)\,V_n''(x) - x\,V_n'(x) + n^2\,V_n(x) = 0, \]
which is Chebyshev's equation.
18.4.5 Show that the Wronskian of $T_n(x)$ and $V_n(x)$ is given by
\[ T_n(x)\,V_n'(x) - T_n'(x)\,V_n(x) = -\frac{n}{(1 - x^2)^{1/2}}. \]
This verifies that $T_n$ and $V_n$ ($n \ne 0$) are independent solutions of Eq. (18.97). Conversely,
for $n = 0$, we do not have linear independence. What happens at $n = 0$? Where is the
"second" solution?

18.4.6 Show that $W_n(x) = (1 - x^2)^{-1/2}\,T_{n+1}(x)$ is a solution of
\[ (1 - x^2)\,W_n''(x) - 3x\,W_n'(x) + n(n + 2)\,W_n(x) = 0. \]

18.4.7 Evaluate the Wronskian of $U_n(x)$ and $W_n(x) = (1 - x^2)^{-1/2}\,T_{n+1}(x)$.

18.4.8 $V_n(x) = (1 - x^2)^{1/2}\,U_{n-1}(x)$ is not defined for $n = 0$. Show that a second and
independent solution of the Chebyshev differential equation for $T_n(x)$ ($n = 0$) is
$V_0(x) = \arccos x$ (or $\arcsin x$).

18.4.9 Show that $V_n(x)$ satisfies the same three-term recurrence relation as $T_n(x)$, Eq. (18.92).

18.4.10 Verify the series solutions for $T_n(x)$ and $U_n(x)$, Eqs. (18.114) and (18.115).

18.4.11 Transform the series form of $T_n(x)$, Eq. (18.114), into an ascending power series.

ANS.
\[ T_{2n}(x) = (-1)^n\, n \sum_{m=0}^{n} (-1)^m \frac{(n + m - 1)!}{(n - m)!\,(2m)!}\,(2x)^{2m}, \quad n \ge 1, \]
\[ T_{2n+1}(x) = \frac{2n + 1}{2} \sum_{m=0}^{n} \frac{(-1)^{m+n}\,(n + m)!}{(n - m)!\,(2m + 1)!}\,(2x)^{2m+1}. \]

18.4.12 Rewrite the series form of $U_n(x)$, Eq. (18.115), as an ascending power series.

ANS.
\[ U_{2n}(x) = (-1)^n \sum_{m=0}^{n} (-1)^m \frac{(n + m)!}{(n - m)!\,(2m)!}\,(2x)^{2m}, \]
\[ U_{2n+1}(x) = (-1)^n \sum_{m=0}^{n} (-1)^m \frac{(n + m + 1)!}{(n - m)!\,(2m + 1)!}\,(2x)^{2m+1}. \]

18.4.13 (a) From the differential equation for $T_n$ (in self-adjoint form) show that
\[ \int_{-1}^{1} \frac{dT_m(x)}{dx}\,\frac{dT_n(x)}{dx}\,(1 - x^2)^{1/2}\,dx = 0, \quad m \ne n. \]
(b) Confirm the preceding result by showing that
\[ \frac{dT_n(x)}{dx} = n\,U_{n-1}(x). \]

18.4.14 The substitution $x = 2x' - 1$ converts $T_n(x)$ into the shifted Chebyshev polynomials
$T_n^*(x')$. Verify that this produces the shifted polynomials shown in Table 18.5 and that
they satisfy the orthonormality condition
\[ \int_0^1 T_n^*(x')\,T_m^*(x')\,[x'(1 - x')]^{-1/2}\,dx' = \frac{\delta_{mn}\,\pi}{2 - \delta_{n0}}. \]

Table 18.5 Shifted Type I Chebyshev Polynomials

$T_0^* = 1$
$T_1^* = 2x - 1$
$T_2^* = 8x^2 - 8x + 1$
$T_3^* = 32x^3 - 48x^2 + 18x - 1$
$T_4^* = 128x^4 - 256x^3 + 160x^2 - 32x + 1$
$T_5^* = 512x^5 - 1280x^4 + 1120x^3 - 400x^2 + 50x - 1$
$T_6^* = 2048x^6 - 6144x^5 + 6912x^4 - 3584x^3 + 840x^2 - 72x + 1$

18.4.15 The expansion of a power of $x$ in a Chebyshev series leads to the integral
\[ I_{mn} = \int_{-1}^{1} x^m\,T_n(x)\,\frac{dx}{\sqrt{1 - x^2}}. \]
(a) Show that this integral vanishes for $m < n$.
(b) Show that this integral vanishes for $m + n$ odd.

18.4.16 Evaluate the integral
\[ I_{mn} = \int_{-1}^{1} x^m\,T_n(x)\,\frac{dx}{\sqrt{1 - x^2}} \]
for $m \ge n$ and $m + n$ even by each of two methods:
(a) Replacing $T_n(x)$ by its Rodrigues representation.
(b) Using $x = \cos\theta$ to transform the integral to a form with $\theta$ as the variable.

ANS. $I_{mn} = \pi\,\dfrac{m!}{(m - n)!}\,\dfrac{(m - n - 1)!!}{(m + n)!!}$, $m \ge n$, $m + n$ even.

18.4.17 Establish the following bounds, $-1 \le x \le 1$:
(a) $|U_n(x)| \le n + 1$, (b) $\left|\dfrac{d}{dx} T_n(x)\right| \le n^2$.

18.4.18 (a) Show that for $-1 \le x \le 1$, $|V_n(x)| \le 1$.
(b) Show that $W_n(x)$ is unbounded in $-1 \le x \le 1$.





18.4.19 Verify the orthogonality-normalization integrals for
(a) $T_m(x)$, $T_n(x)$, (b) $V_m(x)$, $V_n(x)$,
(c) $U_m(x)$, $U_n(x)$, (d) $W_m(x)$, $W_n(x)$.

Hint. All these can be converted to trigonometric integrals.

18.4.20 Show whether
(a) $T_m(x)$ and $V_n(x)$ are or are not orthogonal over the interval $[-1, 1]$ with respect
to the weighting factor $(1 - x^2)^{-1/2}$.
(b) $U_m(x)$ and $W_n(x)$ are or are not orthogonal over the interval $[-1, 1]$ with respect
to the weighting factor $(1 - x^2)^{1/2}$.

18.4.21 Derive
(a) $T_{n+1}(x) + T_{n-1}(x) = 2x\,T_n(x)$,
(b) $T_{m+n}(x) + T_{m-n}(x) = 2\,T_m(x)\,T_n(x)$,
from the "corresponding" cosine identities.

18.4.22 A number of equations relate the two types of Chebyshev polynomials. As examples
show that
\[ T_n(x) = U_n(x) - x\,U_{n-1}(x) \]
and
\[ (1 - x^2)\,U_n(x) = x\,T_{n+1}(x) - T_{n+2}(x). \]

18.4.23 Show that
\[ \frac{dV_n(x)}{dx} = -n\,\frac{T_n(x)}{\sqrt{1 - x^2}} \]
(a) using the trigonometric forms of $V_n$ and $T_n$,
(b) using the Rodrigues representation.

18.4.24 Starting with $x = \cos\theta$ and $T_n(\cos\theta) = \cos n\theta$, expand
\[ x^k = \left(\frac{e^{i\theta} + e^{-i\theta}}{2}\right)^k \]
and show that
\[ x^k = \frac{1}{2^{k-1}} \left[ T_k(x) + \binom{k}{1} T_{k-2}(x) + \binom{k}{2} T_{k-4}(x) + \cdots \right], \]
the series in brackets terminating after the term containing $T_1$ or $T_0$ (for even $k$ the
final $T_0$ term carries an additional factor $1/2$).
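18.4.25 Develop the following Chebyshev expansions (for $[-1, 1]$):

(a) $(1 - x^2)^{1/2} = \dfrac{2}{\pi} \left[ 1 - 2 \sum_{s=1}^{\infty} (4s^2 - 1)^{-1}\,T_{2s}(x) \right]$,

(b) $\left. \begin{array}{ll} +1, & 0 < x \le 1 \\ -1, & -1 \le x < 0 \end{array} \right\} = \dfrac{4}{\pi} \sum_{s=0}^{\infty} (-1)^s (2s + 1)^{-1}\,T_{2s+1}(x)$.

18.4.26 (a) For the interval $[-1, 1]$ show that
\[ |x| = \frac{1}{2} + \sum_{s=1}^{\infty} (-1)^{s+1} \frac{(2s - 3)!!}{(2s + 2)!!}\,(4s + 1)\,P_{2s}(x)
 = \frac{2}{\pi} + \frac{4}{\pi} \sum_{s=1}^{\infty} (-1)^{s+1} \frac{1}{4s^2 - 1}\,T_{2s}(x). \]

(b) Show that the ratio of the coefficient of $T_{2s}(x)$ to that of $P_{2s}(x)$ approaches
$(\pi s)^{-1/2}$ as $s \to \infty$. This illustrates the relatively rapid convergence of the
Chebyshev series.

Hint. With the Legendre recurrence relations, rewrite $x P_n(x)$ as a linear combination
of derivatives. The trigonometric substitution $x = \cos\theta$, $T_n(x) = \cos n\theta$ is most helpful
for the Chebyshev part.

18.4.27 Show that
\[ \frac{\pi^2}{8} = 1 + 2 \sum_{s=1}^{\infty} (4s^2 - 1)^{-2}. \]

Hint. Apply Parseval's identity (or the completeness relation) to the results of
Exercise 18.4.26.

18.4.28 Show that

(a) $\cos^{-1} x = \dfrac{\pi}{2} - \dfrac{4}{\pi} \sum_{n=0}^{\infty} \dfrac{1}{(2n + 1)^2}\,T_{2n+1}(x)$,

(b) $\sin^{-1} x = \dfrac{4}{\pi} \sum_{n=0}^{\infty} \dfrac{1}{(2n + 1)^2}\,T_{2n+1}(x)$.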

18.5 HYPERGEOMETRIC FUNCTIONS 


In Chapter 7 the hypergeometric equation⁸
\[ x(1 - x)\,y''(x) + [c - (a + b + 1)x]\,y'(x) - ab\,y(x) = 0 \tag{18.120} \]
was introduced as a canonical form of a linear second-order ODE with regular singularities
at $x = 0$, $1$, and $\infty$. One solution, designated ${}_2F_1$, is
\[ y(x) = {}_2F_1(a, b; c; x)
 = 1 + \frac{ab}{c}\,\frac{x}{1!} + \frac{a(a + 1)\,b(b + 1)}{c(c + 1)}\,\frac{x^2}{2!} + \cdots,
 \quad c \ne 0, -1, -2, -3, \ldots, \]
which is known as the hypergeometric function or hypergeometric series. For real $a$, $b$,
and $c$ (the only case considered here), the range of convergence for $c > a + b$ is
$-1 \le x \le 1$, while for $a + b - 1 < c \le a + b$ the convergence range is $-1 \le x < 1$.
For $c \le a + b - 1$ the hypergeometric series diverges at both endpoints, so it converges
only for $-1 < x < 1$.

⁸ This is sometimes called Gauss' ODE. The solutions are then referred to as Gauss functions.

The terms of the hypergeometric series are conveniently written in terms of the
Pochhammer symbol, introduced at Eq. (1.72); we repeat the definition here:
\[ (a)_n = a(a + 1)(a + 2) \cdots (a + n - 1), \quad (a)_0 = 1. \]
Using this notation, the hypergeometric function becomes
\[ {}_2F_1(a, b; c; x) = \sum_{n=0}^{\infty} \frac{(a)_n\,(b)_n}{(c)_n}\,\frac{x^n}{n!}. \tag{18.121} \]

In this form the significance of the subscripts 2 and 1 becomes clear. The leading
subscript 2 indicates that two Pochhammer symbols appear in the numerator and the trailing
subscript 1 indicates one Pochhammer symbol in the denominator. The subscripts 2
and 1 are only useful if one intends to discuss analogs of the "standard" hypergeometric
function that involve different numbers of Pochhammer symbols. We retain the subscripts
because we will shortly identify confluent hypergeometric functions with forms similar
to Eq. (18.121) but with only one Pochhammer symbol in the numerator, therefore of the
form ${}_1F_1(a; c; z)$. Note also that the numerator and denominator parameters are set off by
semicolons (actually making the subscripts unnecessary). We retain them to conform to
the most widely used notations for these functions.

Looking further at Eq. (18.121), we note that the series becomes undefined (for all $x \ne 0$)
if $c$ is either zero or a negative integer, because $(c)_n$ then vanishes for all sufficiently
large $n$ (unless the vanishing denominator is fortuitously cancelled by a particular choice of
$a$ or $b$). On the other hand, if $a$ or $b$ equals 0 or a negative integer, the series terminates
and the hypergeometric function becomes a polynomial. Many more or less elementary functions
can be represented by the hypergeometric function.⁹ For example,
\[ \ln(1 + x) = x\;{}_2F_1(1, 1; 2; -x). \tag{18.122} \]
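Equation (18.121) translates directly into a term-by-term summation. The sketch below is illustrative only: it assumes real parameters, $|x| < 1$ (or a terminating series), and a simple relative-size stopping rule; `hyp2f1` is a name chosen here, not a library routine. It checks Eq. (18.122) and a terminating case.

```python
import math

def hyp2f1(a, b, c, x, tol=1e-16, max_terms=10000):
    # Sum of (a)_n (b)_n / ((c)_n n!) x^n, built from the ratio of successive terms
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
        total += term
        # term == 0.0 happens when a or b is a nonpositive integer (polynomial case)
        if term == 0.0 or abs(term) < tol * abs(total):
            break
    return total

x = 0.5
log_via_2f1 = x * hyp2f1(1.0, 1.0, 2.0, -x)   # Eq. (18.122)
poly_value = hyp2f1(-2.0, 1.0, 1.0, 0.3)      # terminates: (1 - 0.3)^2
print(log_via_2f1, math.log(1.0 + x), poly_value)
```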


The hypergeometric equation as a second-order linear ODE has a second independent
solution. The usual form is
\[ y(x) = x^{1-c}\;{}_2F_1(a + 1 - c,\, b + 1 - c;\, 2 - c;\, x), \quad c \ne 2, 3, 4, \ldots. \tag{18.123} \]
If $c$ is an integer either the two solutions coincide or (barring a rescue by integral $a$ or
integral $b$) one of the solutions will blow up (see Exercise 18.5.1). In such a case the
second solution is expected to include a logarithmic term.

⁹ With three parameters, $a$, $b$, and $c$, we can represent almost anything.

Alternate forms of the hypergeometric ODE include
\[ (1 - z^2)\,\frac{d^2}{dz^2}\,y\!\left(\frac{1 - z}{2}\right)
 - \bigl[(a + b + 1)z - (a + b + 1 - 2c)\bigr]\,\frac{d}{dz}\,y\!\left(\frac{1 - z}{2}\right)
 - ab\,y\!\left(\frac{1 - z}{2}\right) = 0, \tag{18.124} \]
\[ (1 - z^2)\,\frac{d^2}{dz^2}\,y(z^2)
 - \left[(2a + 2b + 1)z + \frac{1 - 2c}{z}\right]\frac{d}{dz}\,y(z^2)
 - 4ab\,y(z^2) = 0. \tag{18.125} \]


Contiguous Function Relations 


The parameters $a$, $b$, and $c$ enter in the same way as the parameter $n$ of Bessel, Legendre,
and other special functions. As we found with these functions, we expect recurrence
relations involving unit changes in the parameters $a$, $b$, and $c$. Hypergeometric functions
that differ by $\pm 1$ in a parameter are referred to as contiguous functions. Generalizing
this term to include simultaneous unit changes in more than one parameter, we find 26
functions contiguous to ${}_2F_1(a, b; c; x)$. Taking them two at a time, we can develop the
formidable total of 325 equations among the contiguous functions. Two typical examples
are
\[ (a - b)\bigl\{c(a + b - 1) + 1 - a^2 - b^2 + [(a - b)^2 - 1](1 - x)\bigr\}\;{}_2F_1(a, b; c; x) \]
\[ \qquad = (c - a)(a - b + 1)\,b\;{}_2F_1(a - 1, b + 1; c; x)
 + (c - b)(a - b - 1)\,a\;{}_2F_1(a + 1, b - 1; c; x), \tag{18.126} \]
\[ [2a - c + (b - a)x]\;{}_2F_1(a, b; c; x) = a(1 - x)\;{}_2F_1(a + 1, b; c; x)
 - (c - a)\;{}_2F_1(a - 1, b; c; x). \tag{18.127} \]


Many more contiguous relations can be found in AMS-55 or in Olver et al. (Additional 
Readings). 
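A contiguous relation such as Eq. (18.127) is easy to spot-check numerically. The following sketch assumes a plain series summation for ${}_2F_1$ (valid here because $|x| < 1$) and arbitrarily chosen test parameters; it evaluates both sides of Eq. (18.127) and reports their difference.

```python
def hyp2f1(a, b, c, x, tol=1e-16, max_terms=10000):
    # Series of Eq. (18.121), summed term by term for |x| < 1
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
        total += term
        if term == 0.0 or abs(term) < tol * abs(total):
            break
    return total

def contiguous_residual(a, b, c, x):
    # Left side minus right side of Eq. (18.127); should vanish up to rounding
    lhs = (2.0 * a - c + (b - a) * x) * hyp2f1(a, b, c, x)
    rhs = (a * (1.0 - x) * hyp2f1(a + 1.0, b, c, x)
           - (c - a) * hyp2f1(a - 1.0, b, c, x))
    return lhs - rhs

# Test values chosen arbitrarily for illustration
residual = contiguous_residual(0.7, 1.3, 2.1, 0.4)
print(residual)
```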


Hypergeometric Representations 


A number of the special functions introduced in this book can be expressed in terms of 
hypergeometric functions. The identification can usually be made by noting that these 
functions are solutions of ODEs that are special cases of the hypergeometric ODE. It is 
also necessary to determine the factors needed to express the functions at the agreed-upon 
scale. We cite several examples. 


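1. The ultraspherical functions $C_n^{(\alpha)}(x)$ satisfy the ODE given as Eq. (18.99), and since
that equation is a special case of the hypergeometric equation, Eq. (18.120), we see that
ultraspherical functions (and Legendre and Chebyshev functions) may be expressed as
hypergeometric functions. For the ultraspherical function we obtain
\[ C_n^{(\alpha)}(x) = \frac{(n + 2\alpha)!}{2^{\alpha}\,n!\,\Gamma(\alpha + 1)}\;
 {}_2F_1\!\left(-n,\, n + 2\alpha + 1;\, 1 + \alpha;\, \frac{1 - x}{2}\right), \tag{18.128} \]
with the factor preceding the ${}_2F_1$ function determined by requiring $C_n^{(\alpha)}(x)$ to have the
proper scale.

2. For Legendre and associated Legendre functions we find
\[ P_n(x) = {}_2F_1\!\left(-n,\, n + 1;\, 1;\, \frac{1 - x}{2}\right), \tag{18.129} \]
\[ P_n^m(x) = \frac{(n + m)!}{(n - m)!}\,\frac{(1 - x^2)^{m/2}}{2^m\,m!}\;
 {}_2F_1\!\left(m - n,\, m + n + 1;\, m + 1;\, \frac{1 - x}{2}\right). \tag{18.130} \]
Alternate forms for the Legendre functions are
\[ P_{2n}(x) = (-1)^n \frac{(2n)!}{2^{2n}\,n!\,n!}\;{}_2F_1\!\left(-n,\, n + \tfrac{1}{2};\, \tfrac{1}{2};\, x^2\right)
 = (-1)^n \frac{(2n - 1)!!}{(2n)!!}\;{}_2F_1\!\left(-n,\, n + \tfrac{1}{2};\, \tfrac{1}{2};\, x^2\right), \tag{18.131} \]
\[ P_{2n+1}(x) = (-1)^n \frac{(2n + 1)!}{2^{2n}\,n!\,n!}\;x\;{}_2F_1\!\left(-n,\, n + \tfrac{3}{2};\, \tfrac{3}{2};\, x^2\right)
 = (-1)^n \frac{(2n + 1)!!}{(2n)!!}\;x\;{}_2F_1\!\left(-n,\, n + \tfrac{3}{2};\, \tfrac{3}{2};\, x^2\right). \tag{18.132} \]

3. The Chebyshev functions have representations
\[ T_n(x) = {}_2F_1\!\left(-n,\, n;\, \tfrac{1}{2};\, \frac{1 - x}{2}\right), \tag{18.133} \]
\[ U_n(x) = (n + 1)\;{}_2F_1\!\left(-n,\, n + 2;\, \tfrac{3}{2};\, \frac{1 - x}{2}\right), \tag{18.134} \]
\[ V_n(x) = n\,(1 - x^2)^{1/2}\;{}_2F_1\!\left(-n + 1,\, n + 1;\, \tfrac{3}{2};\, \frac{1 - x}{2}\right). \tag{18.135} \]

The leading factors are determined by direct comparison of complete power series,
comparison of coefficients of particular powers of the variable, or evaluation at
$x = 0$ or $1$.

The hypergeometric series may be used to define functions with nonintegral indices. The
physical applications are minimal.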
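Because the first parameter is a nonpositive integer, the series in Eqs. (18.129), (18.133), and (18.134) terminate and can be summed exactly as polynomials. The sketch below (function names are illustrative) compares them with the trigonometric forms $T_n = \cos n\theta$, $U_n = \sin[(n+1)\theta]/\sin\theta$ and with the explicit $P_3(x)$.

```python
import math

def hyp2f1_poly(n, b, c, z):
    # Terminating series 2F1(-n, b; c; z) = sum_{k=0}^{n} (-n)_k (b)_k / ((c)_k k!) z^k
    term, total = 1.0, 1.0
    for k in range(n):
        term *= (-n + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def cheb_T(n, x):
    # Eq. (18.133)
    return hyp2f1_poly(n, n, 0.5, (1.0 - x) / 2.0)

def cheb_U(n, x):
    # Eq. (18.134)
    return (n + 1) * hyp2f1_poly(n, n + 2, 1.5, (1.0 - x) / 2.0)

def legendre_P(n, x):
    # Eq. (18.129)
    return hyp2f1_poly(n, n + 1, 1.0, (1.0 - x) / 2.0)

x = 0.3
theta = math.acos(x)
n = 5
print(cheb_T(n, x), math.cos(n * theta))
print(cheb_U(n, x), math.sin((n + 1) * theta) / math.sin(theta))
print(legendre_P(3, x), (5.0 * x ** 3 - 3.0 * x) / 2.0)
```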





Exercises

18.5.1 (a) For $c$, an integer, and $a$ and $b$ nonintegral, show that
\[ {}_2F_1(a, b; c; x) \quad \text{and} \quad x^{1-c}\;{}_2F_1(a + 1 - c,\, b + 1 - c;\, 2 - c;\, x) \]
yield only one solution to the hypergeometric equation.
(b) What happens if $a$ is an integer, say, $a = -1$, and $c = -2$?

18.5.2 Find the Legendre, Chebyshev I, and Chebyshev II recurrence relations corresponding
to the hypergeometric contiguous function relation given as Eq. (18.126).

18.5.3 Transform the following polynomials into hypergeometric functions of argument $x^2$:
(a) $T_{2n}(x)$;
(b) $x^{-1}\,T_{2n+1}(x)$;
(c) $U_{2n}(x)$;
(d) $x^{-1}\,U_{2n+1}(x)$.

ANS. (a) $T_{2n}(x) = (-1)^n\;{}_2F_1(-n,\, n;\, \tfrac{1}{2};\, x^2)$.
(b) $x^{-1}\,T_{2n+1}(x) = (-1)^n (2n + 1)\;{}_2F_1(-n,\, n + 1;\, \tfrac{3}{2};\, x^2)$.
(c) $U_{2n}(x) = (-1)^n\;{}_2F_1(-n,\, n + 1;\, \tfrac{1}{2};\, x^2)$.
(d) $x^{-1}\,U_{2n+1}(x) = (-1)^n (2n + 2)\;{}_2F_1(-n,\, n + 2;\, \tfrac{3}{2};\, x^2)$.

18.5.4 Derive or verify the leading factor in the hypergeometric representations of the
Chebyshev functions.

18.5.5 Verify that the Legendre function of the second kind, $Q_\nu(z)$, is given by
\[ Q_\nu(z) = \frac{\pi^{1/2}\,\nu!}{\Gamma(\nu + \tfrac{3}{2})\,(2z)^{\nu+1}}\;
 {}_2F_1\!\left(\frac{\nu}{2} + \frac{1}{2},\, \frac{\nu}{2} + 1;\, \nu + \frac{3}{2};\, z^{-2}\right), \]
where $|z| > 1$, $|\arg z| < \pi$, and $\nu \ne -1, -2, -3, \ldots$.

18.5.6 The incomplete beta function was defined in Eq. (13.78) as
\[ B_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\,dt. \]
Show that
\[ B_x(p, q) = p^{-1}\,x^p\;{}_2F_1(p,\, 1 - q;\, p + 1;\, x). \]

18.5.7 Verify the integral representation
\[ {}_2F_1(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\,\Gamma(c - b)} \int_0^1 t^{b-1}(1 - t)^{c-b-1}(1 - tz)^{-a}\,dt. \]
What restrictions must be placed on the parameters $b$ and $c$?

Note. Although the power series used to establish this integral representation is only
valid for $|z| < 1$, the representation is valid for general $z$, as can be established by
analytic continuation. For nonintegral $a$ the real axis in the $z$-plane from 1 to $\infty$ is a cut
line.

Hint. The integral is suspiciously like a beta function and can be expanded into a series
of beta functions.

ANS. $c > b > 0$.

18.5.8 Prove that
\[ {}_2F_1(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c - a - b)}{\Gamma(c - a)\,\Gamma(c - b)},
 \quad c \ne 0, -1, -2, \ldots, \quad c > a + b. \]
Hint. Here is a chance to use the integral representation in Exercise 18.5.7.

18.5.9 Prove that
\[ {}_2F_1(a, b; c; x) = (1 - x)^{-a}\;{}_2F_1\!\left(a,\, c - b;\, c;\, \frac{-x}{1 - x}\right). \]
Hint. Try an integral representation.

Note. This relation is useful in developing a Rodrigues representation of $T_n(x)$ (see
Exercise 18.5.10).

18.5.10 Derive the Rodrigues representation of $T_n(x)$,
\[ T_n(x) = \frac{(-1)^n\,\pi^{1/2}\,(1 - x^2)^{1/2}}{2^n\,(n - \tfrac{1}{2})!}\,
 \frac{d^n}{dx^n}\left[(1 - x^2)^{n - 1/2}\right]. \]
Hint. One possibility is to use the hypergeometric function relation
\[ {}_2F_1(a, b; c; z) = (1 - z)^{-a}\;{}_2F_1\!\left(a,\, c - b;\, c;\, \frac{-z}{1 - z}\right), \]
with $z = (1 - x)/2$. An alternate approach is to develop a first-order differential equation
for $y = (1 - x^2)^{n - 1/2}$. Repeated differentiation of this equation leads to the Chebyshev
equation.

18.5.11 Show that the summation in Eq. (18.43),
\[ \sum_{\nu=0}^{\min(m,n)} \frac{(-m)_\nu\,(-n)_\nu}{\nu!\,\left(\frac{1 - m - n}{2}\right)_\nu}
 \left(\frac{a^2}{2(a^2 - 1)}\right)^{\nu}, \]
can be written as a hypergeometric function.

18.5.12 Verify that
\[ {}_2F_1(-n, b; c; 1) = \frac{(c - b)_n}{(c)_n}. \]
Hint. Here is a chance to use the contiguous function relation Eq. (18.127) and
mathematical induction (Section 1.4). Alternatively, use the integral representation and the
beta function.
18.6 CONFLUENT HYPERGEOMETRIC FUNCTIONS


The confluent hypergeometric equation,¹⁰
\[ x\,y''(x) + (c - x)\,y'(x) - a\,y(x) = 0, \tag{18.136} \]
has a regular singularity at $x = 0$ and an irregular one at $x = \infty$. It is obtained from the
hypergeometric equation of Section 18.5 in the limit that one of the singularities at finite $x$
is merged with that at infinity, causing that singularity to become irregular. One solution
of the confluent hypergeometric equation is
\[ y(x) = {}_1F_1(a; c; x) = M(a, c, x)
 = 1 + \frac{a}{c}\,\frac{x}{1!} + \frac{a(a + 1)}{c(c + 1)}\,\frac{x^2}{2!} + \cdots,
 \quad c \ne 0, -1, -2, \ldots. \tag{18.137} \]
The notation $M(a, c, x)$ (with commas, not semicolons) has become standard for this solution.
It is convergent for all finite $x$ (or complex $z$). In terms of the Pochhammer symbols,
we have
\[ M(a, c, x) = \sum_{n=0}^{\infty} \frac{(a)_n}{(c)_n}\,\frac{x^n}{n!}. \tag{18.138} \]
Clearly, $M(a, c, x)$ becomes a polynomial if the parameter $a$ is 0 or a negative integer.
Numerous more or less elementary functions may be represented by the confluent
hypergeometric function. Examples are the error function and the incomplete gamma function:
\[ \operatorname{erf}(x) = \frac{2}{\pi^{1/2}} \int_0^x e^{-t^2}\,dt
 = \frac{2x}{\pi^{1/2}}\,M\!\left(\tfrac{1}{2},\, \tfrac{3}{2},\, -x^2\right), \tag{18.139} \]
\[ \gamma(a, x) = \int_0^x e^{-t}\,t^{a-1}\,dt = a^{-1}\,x^a\,M(a,\, a + 1,\, -x),
 \quad \Re e(a) > 0. \tag{18.140} \]
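Equations (18.139) and (18.140) can be checked by summing the series of Eq. (18.138) directly. This is a minimal sketch assuming moderate $|x|$, where the alternating series does not suffer serious cancellation; `kummer_M` is a name chosen here for illustration. For $a = 1$ the incomplete gamma function reduces to $\gamma(1, x) = 1 - e^{-x}$, which supplies the second check.

```python
import math

def kummer_M(a, c, x, tol=1e-17, max_terms=500):
    # M(a, c, x) = sum_n (a)_n / ((c)_n n!) x^n, Eq. (18.138)
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / ((c + n) * (n + 1)) * x
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

x = 1.2
erf_via_M = 2.0 * x / math.sqrt(math.pi) * kummer_M(0.5, 1.5, -x * x)   # Eq. (18.139)
gamma1_via_M = x * kummer_M(1.0, 2.0, -x)                               # Eq. (18.140), a = 1
print(erf_via_M, math.erf(x))
print(gamma1_via_M, 1.0 - math.exp(-x))
```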
0 
A second solution of Eq. (18.136) is given by
\[ y(x) = x^{1-c}\,M(a + 1 - c,\, 2 - c,\, x), \quad c \ne 2, 3, 4, \ldots. \tag{18.141} \]
Clearly, this coincides with the first solution for $c = 1$.

The standard form of the second solution of Eq. (18.136) is a linear combination of
Eqs. (18.137) and (18.141):
\[ U(a, c, x) = \frac{\pi}{\sin \pi c} \left[ \frac{M(a, c, x)}{\Gamma(a - c + 1)\,\Gamma(c)}
 - \frac{x^{1-c}\,M(a + 1 - c,\, 2 - c,\, x)}{\Gamma(a)\,\Gamma(2 - c)} \right]. \tag{18.142} \]
Note the resemblance to our definition of the Neumann function, Eq. (14.57). As with the
Neumann function, this definition of $U(a, c, x)$ becomes indeterminate for certain parameter
values, namely when $c$ is an integer.

¹⁰ This is often called Kummer's equation. The solutions, then, are Kummer functions.

An alternate form of the confluent hypergeometric equation is obtained by changing the
independent variable from $x$ to $x^2$:
\[ \frac{d^2}{dx^2}\,y(x^2) + \left[\frac{2c - 1}{x} - 2x\right]\frac{d}{dx}\,y(x^2) - 4a\,y(x^2) = 0. \tag{18.143} \]


As with the hypergeometric functions, contiguous functions exist in which the param- 
eters a and c are changed by +1. Including the cases of simultaneous changes in the two 
parameters, we have eight possibilities. Taking the original function and pairs of the con- 
tiguous functions, we can develop a total of 28 equations. The recurrence relations for 
Bessel, Hermite, and Laguerre functions are special cases of these equations. 





Integral Representations 


It is frequently convenient to have the confluent hypergeometric functions in integral form. 
We find (Exercise 18.6.10) 


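\[ M(a, c, x) = \frac{\Gamma(c)}{\Gamma(a)\,\Gamma(c - a)} \int_0^1 e^{xt}\,t^{a-1}(1 - t)^{c-a-1}\,dt,
 \quad c > a > 0, \tag{18.144} \]
\[ U(a, c, x) = \frac{1}{\Gamma(a)} \int_0^{\infty} e^{-xt}\,t^{a-1}(1 + t)^{c-a-1}\,dt,
 \quad \Re e(x) > 0,\; a > 0. \tag{18.145} \]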

Three important techniques for deriving or verifying integral representations are as fol- 
lows: 


1. Transformation of generating function expansions and Rodrigues representations: The 
Bessel and Legendre functions provide examples of this approach. 

2. Direct integration to yield a series: This direct technique is useful for a Bessel function 
representation (Exercise 14.1.17) and a hypergeometric integral (Exercise 18.5.7). 

3. (a) Verification that the integral representation satisfies the ODE. (b) Exclusion of 
the other solution. (c) Verification of normalization. This is the method used in Sec- 
tion 14.6 to establish an integral representation of the modified Bessel function K,(z). 
It will work here to establish Eqs. (18.144) and (18.145). 
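As a numerical illustration of Eq. (18.144), the sketch below compares the integral representation with the series of Eq. (18.138). It assumes composite Simpson quadrature and the parameter choice $a = 2$, $c = 4$, picked here so that the integrand has no endpoint singularities; the function names are illustrative.

```python
import math

def kummer_M(a, c, x, tol=1e-17, max_terms=500):
    # Series of Eq. (18.138)
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / ((c + n) * (n + 1)) * x
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def simpson(g, lo, hi, n=200):
    # Composite Simpson rule with n (even) subintervals
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return s * h / 3.0

def kummer_M_integral(a, c, x):
    # Eq. (18.144); requires c > a > 0 (smooth integrand only if a >= 1 and c - a >= 1)
    pref = math.gamma(c) / (math.gamma(a) * math.gamma(c - a))
    return pref * simpson(lambda t: math.exp(x * t) * t ** (a - 1) * (1.0 - t) ** (c - a - 1),
                          0.0, 1.0)

m_series = kummer_M(2.0, 4.0, 1.5)
m_integral = kummer_M_integral(2.0, 4.0, 1.5)
print(m_series, m_integral)
```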


Confluent Hypergeometric Representations 


Special functions that can be represented in terms of confluent hypergeometric functions 
include the following: 


1. Bessel functions:
\[ J_\nu(x) = \frac{e^{-ix}}{\Gamma(\nu + 1)} \left(\frac{x}{2}\right)^{\nu}
 M\!\left(\nu + \tfrac{1}{2},\, 2\nu + 1,\, 2ix\right), \tag{18.146} \]
whereas for the modified Bessel functions of the first kind,
\[ I_\nu(x) = \frac{e^{-x}}{\Gamma(\nu + 1)} \left(\frac{x}{2}\right)^{\nu}
 M\!\left(\nu + \tfrac{1}{2},\, 2\nu + 1,\, 2x\right). \tag{18.147} \]

2. Hermite functions:
\[ H_{2n}(x) = (-1)^n\,\frac{(2n)!}{n!}\,M\!\left(-n,\, \tfrac{1}{2},\, x^2\right), \tag{18.148} \]
\[ H_{2n+1}(x) = (-1)^n\,\frac{2(2n + 1)!}{n!}\,x\,M\!\left(-n,\, \tfrac{3}{2},\, x^2\right), \tag{18.149} \]
using Eq. (13.150).

3. Laguerre functions:
\[ L_n(x) = M(-n,\, 1,\, x). \tag{18.150} \]
The constant is fixed as unity by noting Eq. (18.54) for $x = 0$. For the associated
Laguerre functions,
\[ L_n^m(x) = (-1)^m\,\frac{d^m}{dx^m}\,L_{n+m}(x) = \frac{(n + m)!}{n!\,m!}\,M(-n,\, m + 1,\, x). \tag{18.151} \]
Alternate verification is obtained by comparing Eq. (18.151) with the power-series solution,
Eq. (18.59). Note that in the hypergeometric form, as distinct from a Rodrigues
representation, the indices $n$ and $m$ need not be integers, but if they are not integers, $L_n^m(x)$
will not be a polynomial.
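Equations (18.148) to (18.150) can be compared against the standard three-term recurrences $H_{k+1} = 2xH_k - 2kH_{k-1}$ and $(k+1)L_{k+1} = (2k+1-x)L_k - kL_{k-1}$. The sketch below is illustrative; the function names are choices made here.

```python
import math

def kummer_poly(n, c, x):
    # Terminating series M(-n, c, x) = sum_{k=0}^{n} (-n)_k / ((c)_k k!) x^k
    term, total = 1.0, 1.0
    for k in range(n):
        term *= (-n + k) / ((c + k) * (k + 1)) * x
        total += term
    return total

def hermite(n, x):
    # H_{k+1} = 2x H_k - 2k H_{k-1}, H_0 = 1, H_1 = 2x
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2.0 * x * h1 - 2.0 * k * h0
    return h1

def laguerre(n, x):
    # (k+1) L_{k+1} = (2k + 1 - x) L_k - k L_{k-1}, L_0 = 1, L_1 = 1 - x
    l0, l1 = 1.0, 1.0 - x
    if n == 0:
        return l0
    for k in range(1, n):
        l0, l1 = l1, ((2 * k + 1 - x) * l1 - k * l0) / (k + 1)
    return l1

def hermite_even(n, x):
    # Eq. (18.148)
    return (-1) ** n * math.factorial(2 * n) / math.factorial(n) * kummer_poly(n, 0.5, x * x)

def hermite_odd(n, x):
    # Eq. (18.149)
    return ((-1) ** n * 2 * math.factorial(2 * n + 1) / math.factorial(n)
            * x * kummer_poly(n, 1.5, x * x))

x, n = 0.7, 3
print(hermite_even(n, x), hermite(2 * n, x))
print(hermite_odd(n, x), hermite(2 * n + 1, x))
print(kummer_poly(4, 1.0, x), laguerre(4, x))   # Eq. (18.150)
```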


Further Observations 


There are certain advantages in expressing our special functions in terms of hypergeo- 
metric and confluent hypergeometric functions. If the general behavior of the latter func- 
tions is known, the behavior of the special functions we have investigated follows as a 
series of special cases. This may be useful in determining asymptotic behavior or evalu- 
ating normalization integrals. The asymptotic behavior of M(a,c,x) and U(a,c, x) may 
be conveniently obtained from integral representations of these functions, Eqs. (18.144) 
and (18.145). The further advantage is that the relations between the special functions are 
clarified. For instance, an examination of Eqs. (18.148), (18.149), and (18.151) suggests 
that the Laguerre and Hermite functions are related. 

The confluent hypergeometric equation, Eq. (18.136), is clearly not self-adjoint. For this
and other reasons it is convenient to define
\[ M_{k\mu}(x) = e^{-x/2}\,x^{\mu + 1/2}\,M\!\left(\mu - k + \tfrac{1}{2},\, 2\mu + 1,\, x\right). \tag{18.152} \]
This new function, $M_{k\mu}(x)$, is called a Whittaker function; it satisfies the self-adjoint
equation
\[ M_{k\mu}''(x) + \left(-\frac{1}{4} + \frac{k}{x} + \frac{\tfrac{1}{4} - \mu^2}{x^2}\right) M_{k\mu}(x) = 0. \tag{18.153} \]
The corresponding second solution is
\[ W_{k\mu}(x) = e^{-x/2}\,x^{\mu + 1/2}\,U\!\left(\mu - k + \tfrac{1}{2},\, 2\mu + 1,\, x\right). \tag{18.154} \]
Exercises

18.6.1 Verify the confluent hypergeometric representation of the error function
\[ \operatorname{erf}(x) = \frac{2x}{\pi^{1/2}}\,M\!\left(\tfrac{1}{2},\, \tfrac{3}{2},\, -x^2\right). \]

18.6.2 Show that the Fresnel integrals $C(x)$ and $s(x)$ of Exercise 12.6.1 may be expressed in
terms of the confluent hypergeometric function as
\[ C(x) + i\,s(x) = x\,M\!\left(\frac{1}{2},\, \frac{3}{2},\, \frac{i\pi x^2}{2}\right). \]

18.6.3 By direct differentiation and substitution verify that
\[ y = a\,x^{-a} \int_0^x e^{-t}\,t^{a-1}\,dt = a\,x^{-a}\,\gamma(a, x) \]
satisfies
\[ x\,y'' + (a + 1 + x)\,y' + a\,y = 0. \]

18.6.4 Show that the modified Bessel function of the second kind, $K_\nu(x)$, is given by
\[ K_\nu(x) = \pi^{1/2}\,e^{-x}\,(2x)^{\nu}\,U\!\left(\nu + \tfrac{1}{2},\, 2\nu + 1,\, 2x\right). \]

18.6.5 Show that the cosine and sine integrals of Section 13.6 may be expressed in terms of
confluent hypergeometric functions as
\[ \operatorname{Ci}(x) + i\,\operatorname{si}(x) = -e^{ix}\,U(1,\, 1,\, -ix). \]
This relation is useful in numerical computation of $\operatorname{Ci}(x)$ and $\operatorname{si}(x)$ for large values of $x$.

18.6.6 Verify the confluent hypergeometric form of the Hermite polynomial $H_{2n+1}(x)$,
Eq. (18.149), by showing that
(a) $H_{2n+1}(x)/x$ satisfies the confluent hypergeometric equation with $a = -n$, $c = 3/2$
and argument $x^2$,
(b) $\displaystyle \lim_{x \to 0} \frac{H_{2n+1}(x)}{x} = (-1)^n\,\frac{2(2n + 1)!}{n!}$.

18.6.7 Show that the contiguous confluent hypergeometric function equation
\[ (c - a)\,M(a - 1, c, x) + (2a - c + x)\,M(a, c, x) - a\,M(a + 1, c, x) = 0 \]
leads to the associated Laguerre function recurrence relation, Eq. (18.66).





18.6.8 Verify the Kummer transformations:
(a) $M(a, c, x) = e^x\,M(c - a,\, c,\, -x)$,
(b) $U(a, c, x) = x^{1-c}\,U(a - c + 1,\, 2 - c,\, x)$.

18.6.9 Prove that
(a) $\dfrac{d^n}{dx^n}\,M(a, c, x) = \dfrac{(a)_n}{(c)_n}\,M(a + n,\, c + n,\, x)$,
(b) $\dfrac{d^n}{dx^n}\,U(a, c, x) = (-1)^n\,(a)_n\,U(a + n,\, c + n,\, x)$.

18.6.10 Verify the following integral representations:
(a) $M(a, c, x) = \dfrac{\Gamma(c)}{\Gamma(a)\,\Gamma(c - a)} \displaystyle\int_0^1 e^{xt}\,t^{a-1}(1 - t)^{c-a-1}\,dt$, $\quad c > a > 0$,
(b) $U(a, c, x) = \dfrac{1}{\Gamma(a)} \displaystyle\int_0^{\infty} e^{-xt}\,t^{a-1}(1 + t)^{c-a-1}\,dt$, $\quad \Re e(x) > 0$, $a > 0$.

Under what conditions can you accept $\Re e(x) = 0$ in part (b)?

18.6.11 From the integral representation of $M(a, c, x)$, Exercise 18.6.10(a), show that
\[ M(a, c, x) = e^x\,M(c - a,\, c,\, -x). \]
Hint. Replace the variable of integration $t$ by $1 - s$ to release a factor $e^x$ from the
integral.

18.6.12 From the integral representation of $U(a, c, x)$ in Exercise 18.6.10(b), show that the
exponential integral is given by
\[ E_1(x) = e^{-x}\,U(1,\, 1,\, x). \]
Hint. Replace the variable of integration $t$ in $E_1(x)$ by $x(1 + s)$.

18.6.13 From the integral representations of $M(a, c, x)$ and $U(a, c, x)$ in Exercise 18.6.10,
develop asymptotic expansions of
(a) $M(a, c, x)$, (b) $U(a, c, x)$.

Hint. You can use the technique that was employed with $K_\nu(z)$ in Section 14.6.

ANS.
(a) $\dfrac{\Gamma(c)}{\Gamma(a)}\,\dfrac{e^x}{x^{c-a}} \left\{ 1 + \dfrac{(1 - a)(c - a)}{1!\,x} + \dfrac{(1 - a)(2 - a)(c - a)(c - a + 1)}{2!\,x^2} + \cdots \right\}$,
(b) $\dfrac{1}{x^a} \left\{ 1 + \dfrac{a(1 + a - c)}{1!\,(-x)} + \dfrac{a(a + 1)(1 + a - c)(2 + a - c)}{2!\,(-x)^2} + \cdots \right\}$.
18.6.14 Show that the Wronskian of the two confluent hypergeometric functions $M(a, c, x)$ and
$U(a, c, x)$ is given by
\[ M\,U' - M'\,U = -\frac{(c - 1)!}{(a - 1)!}\,\frac{e^x}{x^c}. \]
What happens if $a$ is 0 or a negative integer?

18.6.15 The Coulomb wave equation (radial part of the Schrödinger equation with Coulomb
potential) is
\[ \frac{d^2 y}{d\rho^2} + \left[1 - \frac{2\eta}{\rho} - \frac{L(L + 1)}{\rho^2}\right] y = 0. \]
Show that a regular solution $y = F_L(\eta, \rho)$ is given by
\[ F_L(\eta, \rho) = C_L(\eta)\,\rho^{L+1}\,e^{-i\rho}\,M(L + 1 - i\eta,\, 2L + 2,\, 2i\rho). \]

18.6.16 (a) Show that the radial part of the hydrogen-atom wave function, Eq. (18.81), may
be written as
\[ e^{-\alpha r/2}\,(\alpha r)^L\,L_{n-L-1}^{2L+1}(\alpha r)
 = \frac{(n + L)!}{(n - L - 1)!\,(2L + 1)!}\,e^{-\alpha r/2}\,(\alpha r)^L\,M(L + 1 - n,\, 2L + 2,\, \alpha r). \]
(b) It was assumed previously that the total (kinetic + potential) energy $E$ of the
electron was negative. Rewrite the (unnormalized) radial wave function for an
unbound hydrogenic electron, $E > 0$.

ANS. $e^{i\alpha r/2}\,(\alpha r)^L\,M(L + 1 - in,\, 2L + 2,\, -i\alpha r)$, outgoing wave. This representation
provides a powerful alternative technique for the calculation of
photoionization and recombination coefficients.

18.6.17 Evaluate
(a) $\displaystyle\int_0^{\infty} [M_{k\mu}(x)]^2\,dx$, (b) $\displaystyle\int_0^{\infty} [M_{k\mu}(x)]^2\,\frac{dx}{x}$, (c) $\displaystyle\int_0^{\infty} [M_{k\mu}(x)]^2\,\frac{dx}{x^{1-a}}$,
where $2\mu = 0, 1, 2, \ldots$, $k - \mu - \tfrac{1}{2} = 0, 1, 2, \ldots$, $a > -2\mu - 1$.

ANS. (a) $2k\,(2\mu)!$, (b) $(2\mu)!$, (c) $(2k)^a\,(2\mu)!$.





18.7 DILOGARITHM


The dilogarithm, defined as
\[ \mathrm{Li}_2(z) = -\int_0^z \frac{\ln(1 - t)}{t}\,dt, \tag{18.155} \]
and its analytic continuation beyond the range of convergence of the above integral, arises
in the evaluation of matrix elements in few-body problems of atomic physics and in various
perturbation-theoretic contributions to quantum electrodynamics. Because of a historic
lack of familiarity with this special function among physicists, many places of its
occurrence have only been recognized in recent years.


Expansion and Analytic Properties


Expanding the logarithm in Eq. (18.155), using the series in Eq. (1.97), we directly obtain
the series expansion
\[ \mathrm{Li}_2(z) = \sum_{n=1}^{\infty} \frac{z^n}{n^2}. \tag{18.156} \]
n=1 

Note that we have inserted the logarithm without an additional multiple of $2\pi i$, thereby
obtaining the branch of $\mathrm{Li}_2$ that is nonsingular at $z = 0$.

Further applications of the operator that converts $-\ln(1 - z)$ into $\mathrm{Li}_2(z)$ produce
polylogarithms, which also occur in physics, albeit less frequently:
\[ \mathrm{Li}_p(z) = \int_0^z \frac{\mathrm{Li}_{p-1}(t)}{t}\,dt = \sum_{n=1}^{\infty} \frac{z^n}{n^p},
 \quad p = 3, 4, \ldots. \tag{18.157} \]
However, in this text we limit consideration to the first member of this sequence, $\mathrm{Li}_2$.

The series expansion of $\mathrm{Li}_2$, Eq. (18.156), has circle of convergence $|z| = 1$, with
convergence for all $z$ on this circle. The singularity limiting the radius of convergence is not
apparent from the form of the expansion, but, looking at Eq. (18.155), we identify it as a
branch point located at $z = 1$. It is customary to draw a branch cut from $z = 1$ to $z = \infty$
along, and just below, the positive real axis, and to define the principal value of $\mathrm{Li}_2$ as that
which corresponds to Eq. (18.156) and its analytic continuation.

From the form of Eq. (18.156), it is apparent that for real $z$ in the interval $-1 \le z \le +1$,
$\mathrm{Li}_2(z)$ will also be real. For $z > 1$, we see from Eq. (18.155) that for part of the range of
integration, the factor $\ln(1 - t)$ will necessarily be complex, with the result that $\mathrm{Li}_2(z)$ will
no longer be real, even for real $z$. However, there is no similar problem for negative real $z$,
as the principal value of $\ln(1 - t)$ remains real for all negative real values of $t$.

Analyzing further the behavior of the integral in Eq. (18.155), we note that if we reach
a point $z$ by carrying out the integral along a path (in $t$) that goes first from $t = 0$ to just
above the branch point at $t = 1$, and then in a straight line to $z$, we will for the last segment
of the path alter the argument of $1 - t$ by some amount $\theta$ in the clockwise direction, thereby
adding an amount $-i\theta$ to the numerator of the integrand. See Fig. 18.8.
FIGURE 18.8 Contours for integral representation of dilogarithm.


This addition to the numerator means that the evaluation of $\mathrm{Li}_2(z)$ will have the form
\[ \mathrm{Li}_2(z) = -\int_0^1 \frac{\ln(1 - t)}{t}\,dt - \int_1^z \frac{\ln|1 - t| - i\theta}{t}\,dt
 = \mathrm{Li}_2(1) - \int_1^z \frac{\ln|1 - t|}{t}\,dt + i\theta \ln z
 \quad (\text{path above } z = 1). \tag{18.158} \]
If we repeat the above analysis to reach the same point $z$ by a path (in $t$) that passes around
$z = 1$ below the real axis, the argument of $1 - t$ will be changed by an amount $2\pi - \theta$ in
the counterclockwise direction, and
\[ \mathrm{Li}_2(z) = \mathrm{Li}_2(1) - \int_1^z \frac{\ln|1 - t|}{t}\,dt - i(2\pi - \theta) \ln z
 \quad (\text{path below } z = 1). \tag{18.159} \]
Comparing Eqs. (18.158) and (18.159), we see that the values of $\mathrm{Li}_2(z)$ for the same $z$,
but on these two different branches, will differ by an amount $2\pi i \ln z$. If $z$ is
complex, the difference will affect both the real and imaginary parts of $\mathrm{Li}_2(z)$, in ways
more complicated than either changing the phase or adding a multiple of $\pi$ to the imaginary
part. When working with the dilogarithm, it is therefore essential to make a careful
determination of the branch on which it is to be evaluated. In fact, whenever possible,
formulas involving the dilogarithm and (because of the context) known to be real-valued
should be manipulated (using formulas such as those in the next subsection) to cause each
dilogarithm in the formula to be for a value of $z$ that is real and with $z < 1$.


Properties and Special Values 


From Eq. (18.156), we see that $\mathrm{Li}_2(0) = 0$. Setting $z = 1$, we note that we get the series
for $\zeta(2)$, so $\mathrm{Li}_2(1) = \zeta(2) = \pi^2/6$. We also have $\mathrm{Li}_2(-1) = -\eta(2)$, where $\eta(2)$ is the
Dirichlet series in Eq. (12.62), so $\mathrm{Li}_2(-1) = -\pi^2/12$.

The dilogarithm has a derivative that follows directly from Eq. (18.155),
\[ \frac{d\,\mathrm{Li}_2(z)}{dz} = -\frac{\ln(1 - z)}{z}, \tag{18.160} \]
and possesses several functional relations enabling an easy analytic continuation beyond
the convergence range of Eq. (18.156). Some of these are the following:
\[ \mathrm{Li}_2(z) + \mathrm{Li}_2(1 - z) = \frac{\pi^2}{6} - \ln z\,\ln(1 - z), \tag{18.161} \]
\[ \mathrm{Li}_2(z) + \mathrm{Li}_2(z^{-1}) = -\frac{\pi^2}{6} - \frac{1}{2}\ln^2(-z), \tag{18.162} \]
\[ \mathrm{Li}_2(z) + \mathrm{Li}_2\!\left(\frac{z}{z - 1}\right) = -\frac{1}{2}\ln^2(1 - z). \tag{18.163} \]
These relationships are most easily established by showing that the derivatives of both
sides of the equations are equal and that the values of the two sides correspond for some
convenient value of $z$. These functional relations enable the determination of $\mathrm{Li}_2(z)$ for all
real $z$ from values on the real line in the range $|z| \le \tfrac{1}{2}$, for which the series in Eq. (18.156)
converges rapidly.

From the functional relations it is possible to identify a few more specific values of $z$
for which the principal value of $\mathrm{Li}_2(z)$ can be expressed in terms of elementary functions.
For example, $\mathrm{Li}_2(1/2) = -\tfrac{1}{2}\ln^2(2) + \pi^2/12$. But for most $z$, closed expressions are not
available.
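The reduction to $|z| \le \tfrac{1}{2}$ described above can be sketched as follows for real $z \le 1$, where the principal value is real. This is one possible arrangement of Eqs. (18.161) to (18.163), chosen here for illustration; the function names `li2_series` and `li2` are not from any library.

```python
import math

def li2_series(z):
    # Eq. (18.156); intended for |z| <= 1/2, where convergence is rapid
    total, power, n = 0.0, 1.0, 0
    while True:
        n += 1
        power *= z
        term = power / (n * n)
        total += term
        if abs(term) < 1e-17:
            break
    return total

def li2(z):
    # Principal value of Li2 for real z <= 1
    if z > 1.0:
        raise ValueError("Li2(z) is complex for real z > 1")
    if z == 1.0:
        return math.pi ** 2 / 6.0
    if z > 0.5:      # Eq. (18.161): maps (1/2, 1) onto (0, 1/2)
        return math.pi ** 2 / 6.0 - math.log(z) * math.log(1.0 - z) - li2_series(1.0 - z)
    if z >= -0.5:    # direct series
        return li2_series(z)
    if z >= -1.0:    # Eq. (18.163): maps [-1, -1/2) onto (1/3, 1/2]
        return -0.5 * math.log(1.0 - z) ** 2 - li2_series(z / (z - 1.0))
    # z < -1, Eq. (18.162): maps onto (-1, 0)
    return -math.pi ** 2 / 6.0 - 0.5 * math.log(-z) ** 2 - li2(1.0 / z)

print(li2(0.5), math.pi ** 2 / 12.0 - 0.5 * math.log(2.0) ** 2)
print(li2(-1.0), -math.pi ** 2 / 12.0)
```

An independent consistency check is the duplication identity $\mathrm{Li}_2(z) + \mathrm{Li}_2(-z) = \tfrac{1}{2}\mathrm{Li}_2(z^2)$, which follows from Eq. (18.156) and exercises different branches of the code above.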


Example 18.7.1 CHECK USEFULNESS OF FORMULA


The integral
\[ I = \frac{1}{8\pi^2} \int d^3r_1\,d^3r_2\,\frac{e^{-\alpha r_1 - \beta r_2 - \gamma r_{12}}}{r_1^2\,r_2^2\,r_{12}} \]
arises in computations of the electronic structure of the He atom. Here $\mathbf{r}_i$ are the positions
of two electrons relative to the nucleus (which is at the origin of our coordinate system),
the integration is over the full three-dimensional spaces of $\mathbf{r}_1$ and $\mathbf{r}_2$, $r_i = |\mathbf{r}_i|$, and
$r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$. This integral is found to have the value
\[ I = \frac{1}{\gamma} \left[ \frac{\pi^2}{6}
 + \mathrm{Li}_2\!\left(\frac{\gamma - \beta}{\alpha + \gamma}\right)
 + \mathrm{Li}_2\!\left(\frac{\gamma - \alpha}{\beta + \gamma}\right)
 + \frac{1}{2}\ln^2\!\left(\frac{\alpha + \gamma}{\beta + \gamma}\right) \right]. \]
We now ask: Are its individual terms real?

We note from the definition of $I$ that it will be convergent only if $\alpha + \beta$, $\alpha + \gamma$, and $\beta + \gamma$
are all positive. If that is not the case, in the portion of the space in which some particle
is far from the other two, the overall exponential will increase without limit. Looking now
at the formula for the integral, we see immediately that the $\ln^2$ term will be real, as its
argument is the quotient of two positive numbers. The first $\mathrm{Li}_2$ term can be written
\[ \mathrm{Li}_2\!\left(\frac{\gamma - \beta}{\alpha + \gamma}\right) = \mathrm{Li}_2\!\left(1 - \frac{\alpha + \beta}{\alpha + \gamma}\right), \]
showing that the argument of $\mathrm{Li}_2$ is real and less than $+1$, meaning that this $\mathrm{Li}_2$ will
evaluate to a real result. Similar observations apply to the second instance of $\mathrm{Li}_2$. We
conclude that our formula is in a proper form for unambiguous computation using principal
values of its multivalued functions.








Exercises 
18.7.1 Prove that the expansion of $\mathrm{Li}_2(z)$, Eq. (18.156), converges everywhere on the circle
$|z| = 1$.
18.7.2 Use the functional relations, Eqs. (18.161) to (18.163), to find the principal value of
$\mathrm{Li}_2(1/2)$.
18.7.3 Find all the multiple values of $\mathrm{Li}_2(1/2)$.
18.7.4 Explain why Eq. (18.161) gives the expected result for $z = 0$ when on the principal
branch of the dilogarithm.
18.7.5 Show that
\[ \mathrm{Li}_2\!\left(\frac{1+z^{-1}}{2}\right) = -\mathrm{Li}_2\!\left(\frac{1+z}{1-z}\right) - \frac{1}{2}\ln^2\!\left(\frac{1-z^{-1}}{2}\right). \]
18.7.6 The following integral arises in the computation of the electronic energy of the Li atom
using a correlated wave function (one that explicitly includes the electron-electron distances
as well as the distances of electrons from the nucleus):

\[ I = \iiint d^3r_1\, d^3r_2\, d^3r_3\, \frac{e^{-\alpha_1 r_1 - \alpha_2 r_2 - \alpha_3 r_3}}{r_1 r_2 r_3\, r_{12}\, r_{13}\, r_{23}}, \]

where $r_i = |\mathbf{r}_i|$, $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$, and the integrations are over the entire three-dimensional
space of each $\mathbf{r}_i$. For convergence of $I$, we require all $\alpha_i > 0$, but there are no restrictions
on their relative magnitudes.
In terms of the auxiliary quantities

\[ \zeta_1 = \frac{\alpha_1}{\alpha_2+\alpha_3}, \qquad \zeta_2 = \frac{\alpha_2}{\alpha_1+\alpha_3}, \qquad \zeta_3 = \frac{\alpha_3}{\alpha_1+\alpha_2}, \]

this integral has the value

\[ I = \frac{32\pi^3}{\alpha_1\alpha_2\alpha_3}\left\{ -\frac{\pi^2}{2} + \sum_{j=1}^{3}\left[ \mathrm{Li}_2(\zeta_j) - \mathrm{Li}_2(-\zeta_j) + \ln\zeta_j\,\ln\!\left(\frac{1-\zeta_j}{1+\zeta_j}\right)\right]\right\}. \]

Rearrange $I$ to a form (first found by Remiddi¹¹), in which all terms in the final expression
are guaranteed to evaluate to real quantities and can be evaluated as principal values.


18.8 ELLIPTIC INTEGRALS 


Elliptic integrals occasionally arise in physical problems and therefore it is worthwhile to 
summarize their definitions and properties. Before the advent of computers, it was also 
important for physicists and engineers to be familiar with methods for hand computation 
of elliptic integrals, but that need has diminished with time and expansion methods for 
these functions will not be emphasized here. We do, however, illustrate problems in which 
elliptic integrals arise; the following example is a case in point. 


Example 18.8.1 PERIOD OF A SIMPLE PENDULUM 


For small-amplitude oscillations, a pendulum (Fig. 18.9) has simple harmonic motion with
a period $T = 2\pi (l/g)^{1/2}$. But for a maximum amplitude $\theta_M$ large enough that $\sin\theta_M$ cannot
be approximated by $\theta_M$, a direct application of Newton's second law of motion and
solution of the resulting ODE becomes difficult. In that situation a good way to proceed
is to write the equation for conservation of energy. Setting the zero of potential energy at
the point from which the pendulum is suspended, the potential energy of a pendulum of
mass $m$ and length $l$ at angle $\theta$ is $-mgl\cos\theta$, and its total energy (the potential energy
at angle $\theta_M$) is $-mgl\cos\theta_M$. The pendulum has kinetic energy $ml^2(d\theta/dt)^2/2$, so energy
conservation requires

\[ \frac{1}{2}\, m l^2 \left(\frac{d\theta}{dt}\right)^2 - mgl\cos\theta = -mgl\cos\theta_M. \tag{18.164} \]

Solving for $d\theta/dt$ we obtain

\[ \frac{d\theta}{dt} = \pm\left(\frac{2g}{l}\right)^{1/2} (\cos\theta - \cos\theta_M)^{1/2}, \tag{18.165} \]





FIGURE 18.9 Simple pendulum. 


11 E. Remiddi, Analytic value of the atomic three-electron correlation integral with Slater wave functions. Phys. Rev. A 44:
5492 (1991).

with the mass $m$ canceling out. At $t = 0$ we choose as initial conditions $\theta = 0$ and $d\theta/dt > 0$.
An integration from $\theta = 0$ to $\theta = \theta_M$ yields


\[ \int_0^{\theta_M} (\cos\theta - \cos\theta_M)^{-1/2}\, d\theta = \left(\frac{2g}{l}\right)^{1/2} \int_0^t dt = \left(\frac{2g}{l}\right)^{1/2} t. \tag{18.166} \]

This is $\tfrac{1}{4}$ of a cycle, and therefore the time $t$ is $\tfrac{1}{4}$ of the period $T$. We note that $\theta \le \theta_M$,
and with a bit of clairvoyance we try the half-angle substitution

\[ \sin\left(\frac{\theta}{2}\right) = \sin\left(\frac{\theta_M}{2}\right) \sin\varphi. \tag{18.167} \]

With this, Eq. (18.166) becomes

\[ T = 4\left(\frac{l}{g}\right)^{1/2} \int_0^{\pi/2} \left(1 - \sin^2\frac{\theta_M}{2}\, \sin^2\varphi\right)^{-1/2} d\varphi. \tag{18.168} \]

The integral in Eq. (18.168) does not reduce to an elementary function; in fact, it is an elliptic
integral of a standard type. Further examples of elliptic integrals in physical problems
can be found in the exercises.
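
As a hedged numerical illustration of Eq. (18.168): the integral there is the complete elliptic integral $K(m)$ with $m = \sin^2(\theta_M/2)$. The sketch below evaluates $K(m)$ with the arithmetic-geometric mean, $K(m) = \pi / [2\,\mathrm{AGM}(1, \sqrt{1-m})]$, a standard identity that is not derived in this section. The function names and the default $g = 9.81$ are assumptions made for the example.

```python
import math

def ellipk_agm(m):
    """Complete elliptic integral K(m), 0 <= m < 1, via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - m)
    while abs(a - b) > 1e-15 * a:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def pendulum_period(length, theta_max, g=9.81):
    """Period from Eq. (18.168): T = 4 sqrt(l/g) K(sin^2(theta_max / 2))."""
    m = math.sin(0.5 * theta_max) ** 2
    return 4.0 * math.sqrt(length / g) * ellipk_agm(m)

# Ratio of the exact period to the small-amplitude value 2 pi sqrt(l/g)
small = 2.0 * math.pi * math.sqrt(1.0 / 9.81)
for degrees in (1, 30, 90, 150):
    ratio = pendulum_period(1.0, math.radians(degrees)) / small
    print(degrees, round(ratio, 5))
```

The ratio approaches 1 as the amplitude shrinks and grows without bound as $\theta_M \to \pi$, where $m \to 1$.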


Definitions 


The elliptic integral of the first kind is defined as

\[ F(\varphi\backslash\alpha) = \int_0^{\varphi} (1 - \sin^2\alpha\, \sin^2\theta)^{-1/2}\, d\theta, \tag{18.169} \]

or

\[ F(x|m) = \int_0^{x} \left[(1 - t^2)(1 - m t^2)\right]^{-1/2} dt, \qquad 0 \le m < 1. \tag{18.170} \]

This is the notation of AMS-55 (Additional Readings). Note the use of the separators \ and
| to identify the specific functional forms. When the upper limit in these integrals is set to
$\varphi = \pi/2$ or $x = 1$, we have the complete elliptic integral of the first kind,

\[ K(m) = \int_0^{\pi/2} (1 - m\sin^2\theta)^{-1/2}\, d\theta = \int_0^1 \left[(1 - t^2)(1 - m t^2)\right]^{-1/2} dt, \tag{18.171} \]

with $m = \sin^2\alpha$, $0 \le m < 1$.
The elliptic integral of the second kind is defined by

\[ E(\varphi\backslash\alpha) = \int_0^{\varphi} (1 - \sin^2\alpha\, \sin^2\theta)^{1/2}\, d\theta, \tag{18.172} \]

or

\[ E(x|m) = \int_0^{x} \left(\frac{1 - m t^2}{1 - t^2}\right)^{1/2} dt. \tag{18.173} \]

Again, for the case $\varphi = \pi/2$, $x = 1$, we have the complete elliptic integral of the second
kind:

\[ E(m) = \int_0^{\pi/2} (1 - m\sin^2\theta)^{1/2}\, d\theta = \int_0^1 \left(\frac{1 - m t^2}{1 - t^2}\right)^{1/2} dt, \qquad 0 \le m \le 1. \tag{18.174} \]


Series Expansions





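For our range $0 \le m < 1$, the denominator of $K(m)$ may be expanded by the binomial
series in Eq. (1.74):

\[ (1 - m\sin^2\theta)^{-1/2} = \sum_{n=0}^{\infty} \frac{(2n-1)!!}{(2n)!!}\, m^n \sin^{2n}\theta, \]

after which the resulting series is then integrated term by term. The integrals of the individual
terms are beta functions (see Exercise 13.3.8), and we get

\[ K(m) = \frac{\pi}{2}\left\{ 1 + \sum_{n=1}^{\infty} \left[\frac{(2n-1)!!}{(2n)!!}\right]^2 m^n \right\}. \tag{18.175} \]

Similarly (see Exercise 18.8.2),

\[ E(m) = \frac{\pi}{2}\left\{ 1 - \sum_{n=1}^{\infty} \left[\frac{(2n-1)!!}{(2n)!!}\right]^2 \frac{m^n}{2n-1} \right\}. \tag{18.176} \]

These series can be identified as hypergeometric functions. Comparing with the general
definitions in Section 18.5, we have

\[ K(m) = \frac{\pi}{2}\; {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{1}{2}; 1; m\right), \qquad E(m) = \frac{\pi}{2}\; {}_2F_1\!\left(-\tfrac{1}{2}, \tfrac{1}{2}; 1; m\right). \tag{18.177} \]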
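
A minimal numerical cross-check of Eqs. (18.175) and (18.176) against the defining integrals (18.171) and (18.174) can be sketched as follows. The function names and the choice of a midpoint rule are assumptions made for illustration; the last line also prints Legendre's relation at $m = 1/2$, $2K E - K^2 = \pi/2$, a known identity not stated in this section.

```python
import math

def k_series(m, terms=200):
    """Partial sum of Eq. (18.175)."""
    total, ratio = 1.0, 1.0
    for n in range(1, terms + 1):
        ratio *= (2 * n - 1) / (2 * n)      # builds (2n-1)!!/(2n)!!
        total += ratio**2 * m**n
    return 0.5 * math.pi * total

def e_series(m, terms=200):
    """Partial sum of Eq. (18.176)."""
    total, ratio = 1.0, 1.0
    for n in range(1, terms + 1):
        ratio *= (2 * n - 1) / (2 * n)
        total -= ratio**2 * m**n / (2 * n - 1)
    return 0.5 * math.pi * total

def midpoint(f, a, b, steps=2000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def k_integral(m):
    return midpoint(lambda t: (1.0 - m * math.sin(t)**2) ** -0.5, 0.0, 0.5 * math.pi)

def e_integral(m):
    return midpoint(lambda t: (1.0 - m * math.sin(t)**2) ** 0.5, 0.0, 0.5 * math.pi)

print(k_series(0.3), k_integral(0.3))
print(e_series(0.3), e_integral(0.3))
print(2 * k_series(0.5) * e_series(0.5) - k_series(0.5)**2, math.pi / 2)
```

The series converge geometrically in $m$, so they become slow as $m \to 1$, in line with the remarks on limiting values.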


The complete elliptic integrals are plotted in Fig. 18.10.

FIGURE 18.10 Complete elliptic integrals, $K(m)$ and $E(m)$.


Limiting Values 

From the series Eqs. (18.175) and (18.176), or from the defining integrals,

\[ \lim_{m\to 0} K(m) = \frac{\pi}{2}, \qquad \lim_{m\to 0} E(m) = \frac{\pi}{2}. \tag{18.178} \]

For $m \to 1$ the series expansions are of little use. However, the integrals yield

\[ \lim_{m\to 1} K(m) = \infty, \qquad \lim_{m\to 1} E(m) = 1. \tag{18.179} \]

The divergence in $K(m)$ is logarithmic.
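
As a hedged aside, the logarithmic divergence can be made quantitative with a standard leading-order estimate (quoted here as a known result, not derived in this section):

```latex
K(m) \approx \frac{1}{2}\,\ln\frac{16}{1-m}, \qquad m \to 1^{-}.
```

The estimate shows that $K(m)$ grows only like $\tfrac{1}{2}|\ln(1-m)|$, so even values of $m$ very close to 1 give moderate values of $K$.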
Elliptic integrals have been used extensively in the past for evaluating integrals. For
instance, general integrals of the form

\[ I = \int_0^x R\left(t, \sqrt{a_4 t^4 + a_3 t^3 + a_2 t^2 + a_1 t + a_0}\,\right) dt, \]

where $R$ is a rational function of its arguments, may be expressed in terms of elliptic
integrals. Jahnke and Emde (Additional Readings) give pages of such transformations.
With computers available for direct numerical evaluation, interest in these elliptic integral
techniques has declined. A more extensive account of elliptic functions, integrals, and
the related Jacobi theta functions can be found in Whittaker and Watson's treatise. Many
formulas and tables of elliptic integrals are in AMS-55 and even more formulas are in
Olver et al. (all of these sources are in the Additional Readings).


Exercises 


18.8.1 The ellipse $x^2/a^2 + y^2/b^2 = 1$ may be represented parametrically by $x = a\sin\theta$,
$y = b\cos\theta$. Show that the length of arc within the first quadrant is

\[ a\int_0^{\pi/2} (1 - m\sin^2\theta)^{1/2}\, d\theta = a E(m). \]

Here $0 \le m = (a^2 - b^2)/a^2 \le 1$.

18.8.2 Derive the series expansion

\[ E(m) = \frac{\pi}{2}\left\{ 1 - \left(\frac{1}{2}\right)^2 \frac{m}{1} - \left(\frac{1\cdot 3}{2\cdot 4}\right)^2 \frac{m^2}{3} - \cdots \right\}. \]

18.8.3 Show that

\[ \lim_{m\to 0} \frac{K - E}{m} = \frac{\pi}{4}. \]

18.8.4 A circular loop of wire in the $xy$-plane, as shown in Fig. 18.11, carries a current $I$.
Given that the vector potential is

\[ A_\varphi(\rho, \varphi, z) = \frac{a\mu_0 I}{2\pi} \int_0^{\pi} \frac{\cos\alpha\, d\alpha}{(a^2 + \rho^2 + z^2 - 2a\rho\cos\alpha)^{1/2}}, \]

show that

\[ A_\varphi(\rho, \varphi, z) = \frac{\mu_0 I}{\pi k}\left(\frac{a}{\rho}\right)^{1/2} \left[\left(1 - \frac{k^2}{2}\right) K(k^2) - E(k^2)\right], \]

where

\[ k^2 = \frac{4a\rho}{(a + \rho)^2 + z^2}. \]

Note. For extension of this exercise to $\mathbf{B}$, see Smythe.¹²

FIGURE 18.11 Circular wire loop.

18.8.5 An analysis of the magnetic vector potential of a circular current loop leads to the
expression

\[ f(k^2) = k^{-2}\left[(2 - k^2) K(k^2) - 2E(k^2)\right], \]

where $K(k^2)$ and $E(k^2)$ are the complete elliptic integrals of the first and second kinds.
Show that for $k^2 \ll 1$ ($r \gg$ radius of loop)

\[ f(k^2) \approx \frac{\pi k^2}{16}. \]

18.8.6 Show that

(a) \[ \frac{dE(k^2)}{dk} = \frac{1}{k}(E - K), \]

(b) \[ \frac{dK(k^2)}{dk} = \frac{E}{k(1 - k^2)} - \frac{K}{k}. \]

Hint. For part (b) show that

\[ E(k^2) = (1 - k^2) \int_0^{\pi/2} (1 - k^2\sin^2\theta)^{-3/2}\, d\theta \]

by comparing series expansions.


Additional Readings 


Abramowitz, M., and I. A. Stegun, eds., Handbook of Mathematical Functions, Applied Mathematics Series-55 
(AMS-55). Washington, DC: National Bureau of Standards (1964), paperback edition, Dover (1974). Chapter 
22 is a detailed summary of the properties and representations of orthogonal polynomials. Other chapters 
summarize properties of Bessel, Legendre, hypergeometric, and confluent hypergeometric functions and much 
more. See also Olver et al., below. 


Buchholz, H., The Confluent Hypergeometric Function. New York: Springer Verlag (1953), translated (1969). 
Buchholz strongly emphasizes the Whittaker rather than the Kummer forms. Applications to a variety of other 
transcendental functions. 


12 W. R. Smythe, Static and Dynamic Electricity, 3rd ed. New York: McGraw-Hill (1969), p. 270.

Erdelyi, A., W. Magnus, F. Oberhettinger, and F. G. Tricomi, Higher Transcendental Functions, 3 vols. New
York: McGraw-Hill (1953), reprinted, Krieger (1981). A detailed, almost exhaustive listing of the properties 
of the special functions of mathematical physics. 

Fox, L., and I. B. Parker, Chebyshev Polynomials in Numerical Analysis. Oxford: Oxford University Press (1968).


CHAPTER 19


FOURIER SERIES 


Periodic phenomena involving waves, rotating machines (harmonic motion), or other 
repetitive driving forces are described by periodic functions. Fourier series are a basic 
tool for solving ordinary differential equations (ODEs) and partial differential equations 
(PDEs) with periodic boundary conditions. Fourier integrals for nonperiodic phenomena 
are developed in Chapter 20. The common name for the field is Fourier analysis. 


19.1 GENERAL PROPERTIES 


A Fourier series is defined as an expansion of a function or representation of a function in 
a series of sines and cosines, such as 


\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx. \tag{19.1} \]

The coefficients $a_0$, $a_n$, and $b_n$ are related to $f(x)$ by definite integrals:

\[ a_n = \frac{1}{\pi}\int_0^{2\pi} f(s)\cos ns\, ds, \qquad n = 0, 1, 2, \ldots, \tag{19.2} \]

\[ b_n = \frac{1}{\pi}\int_0^{2\pi} f(s)\sin ns\, ds, \qquad n = 1, 2, \ldots, \tag{19.3} \]

which are subject to the requirement that the integrals exist. Note that $a_0$ is singled out for
special treatment by the inclusion of the factor $\tfrac{1}{2}$. This is done so that Eq. (19.2) will apply
to all $a_n$, $n = 0$ as well as $n > 0$.

The conditions imposed on $f(x)$ to make Eq. (19.1) valid are that $f(x)$ have only a
finite number of finite discontinuities and only a finite number of extreme values (maxima
and minima) in the interval $[0, 2\pi]$.¹ Functions satisfying these conditions may be
called piecewise regular. The conditions themselves are known as the Dirichlet conditions.
Although there are some functions that do not obey these conditions, they can be
considered pathological for purposes of Fourier expansions. In the vast majority of physical
problems involving a Fourier series, the Dirichlet conditions will be satisfied.

1 These conditions are sufficient but not necessary.
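Expressing $\cos nx$ and $\sin nx$ in exponential form, we may rewrite Eq. (19.1) as

\[ f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}, \tag{19.4} \]

in which

\[ c_n = \frac{1}{2}(a_n - i b_n), \qquad c_{-n} = \frac{1}{2}(a_n + i b_n), \qquad n > 0, \tag{19.5} \]

and

\[ c_0 = \frac{1}{2} a_0. \tag{19.6} \]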

Sturm-Liouville Theory 


The ODE

\[ -y''(x) = \lambda y(x) \]

on the interval $[0, 2\pi]$ with boundary conditions $y(0) = y(2\pi)$, $y'(0) = y'(2\pi)$ is a
Sturm-Liouville problem, and these boundary conditions make it Hermitian. Therefore
its eigenfunctions, either $\cos nx$ ($n = 0, 1, \ldots$) and $\sin nx$ ($n = 1, 2, \ldots$), or $\exp(inx)$
($n = \ldots, -1, 0, 1, \ldots$), form a complete set, with eigenfunctions of different eigenvalues
orthogonal. Since the eigenfunctions have respective eigenvalues $n^2$, those of different $|n|$
will automatically be orthogonal, while those of the same $|n|$ can be orthogonalized if
necessary. Defining the scalar product for this problem as

\[ \langle f | g \rangle = \int_0^{2\pi} f^*(x)\, g(x)\, dx, \]

it is easy to check that $\langle e^{inx} | e^{-inx} \rangle = 0$ for $n \ne 0$, and if we write $\cos nx$ and $\sin nx$
as complex exponentials, it is also easy to see that $\langle \sin nx | \cos nx \rangle = 0$. To make the
eigenfunctions normalized, a simple approach is to note that the average value of $\sin^2 nx$
or $\cos^2 nx$ over an integer number of oscillations is $1/2$ (again for $n \ne 0$), so

\[ \int_0^{2\pi} \sin^2 nx\, dx = \int_0^{2\pi} \cos^2 nx\, dx = \pi \quad (n \ne 0), \]

and $\langle e^{inx} | e^{inx} \rangle = 2\pi$.
The relationships identified above indicate that the eigenfunctions $\varphi_n = e^{inx}/\sqrt{2\pi}$
($n = \ldots, -1, 0, 1, \ldots$) form an orthonormal set, as do

\[ \varphi_0 = \frac{1}{\sqrt{2\pi}}, \qquad \varphi_n = \frac{\cos nx}{\sqrt{\pi}}, \qquad \varphi_{-n} = \frac{\sin nx}{\sqrt{\pi}}, \qquad (n = 1, 2, \ldots), \]

so expansions in these functions have the forms given in Eqs. (19.1) to (19.3) or Eqs. (19.4)
to (19.6). Since we know that the eigenfunctions of a Sturm-Liouville operator form a
complete set, we know that our Fourier-series expansions of $L^2$ functions will at least
converge in the mean.
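
The orthonormality claimed above for the real trigonometric set is easy to confirm numerically. The following sketch is illustrative: `phi` and `inner` are hypothetical helper names, the scalar product is the one defined above restricted to real functions, and the integral is approximated by a midpoint sum.

```python
import math

def phi(n):
    """Real basis function: constant for n = 0, cosine for n > 0, sine for n < 0."""
    if n == 0:
        return lambda x: 1.0 / math.sqrt(2.0 * math.pi)
    if n > 0:
        return lambda x: math.cos(n * x) / math.sqrt(math.pi)
    return lambda x: math.sin(-n * x) / math.sqrt(math.pi)

def inner(f, g, steps=1024):
    """Scalar product <f|g> on [0, 2 pi] for real f and g, by a midpoint sum."""
    h = 2.0 * math.pi / steps
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(steps))

# Print the Gram matrix for n = -2..2; it should be close to the unit matrix
for m in range(-2, 3):
    print([round(inner(phi(m), phi(n)), 12) for n in range(-2, 3)])
```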


Discontinuous Functions 


There are significant differences between the behavior of Fourier- and power-series 
expansions. A power series is essentially an expansion about a point, using only infor- 
mation from that point about the function to be expanded (including, of course, the values 
of its derivatives). We already know that such expansions only converge within a radius of 
convergence defined by the position of the nearest singularity. However, a Fourier series 
(or any expansion in orthogonal functions) uses information from the entire expansion 
interval, and therefore can describe functions that have “nonpathological” singularities 
within that interval. However, we also know that the representation of a function by an 
orthogonal expansion is only guaranteed to converge in the mean. This feature comes into 
play for the expansion of functions with discontinuities, where there is no unique value to 
which the expansion must converge. However, for Fourier series, it can be shown that if a
function $f(x)$ satisfying the Dirichlet conditions is discontinuous at a point $x_0$, its Fourier
series evaluated at that point will be the arithmetic average of the limits of the left and right
approaches:

\[ f_{\text{Fourier series}}(x_0) = \lim_{\varepsilon\to 0} \left[\frac{f(x_0 + \varepsilon) + f(x_0 - \varepsilon)}{2}\right]. \tag{19.7} \]

For proof of Eq. (19.7), see Jeffreys and Jeffreys or Carslaw (Additional Readings). It
can also be shown that if the function to be expanded is continuous but has a finite discontinuity
in its first derivative, its Fourier series will then exhibit uniform convergence
(see Churchill, Additional Readings). These features make Fourier expansions useful for
functions with a variety of types of discontinuities.


Example 19.1.1 SAWTOOTH WAVE


An idea of the convergence of a Fourier series and the error in using only a finite number 
of terms in the series may be obtained by considering the expansion of 


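\[ f(x) = \begin{cases} x, & 0 \le x < \pi, \\ x - 2\pi, & \pi < x \le 2\pi. \end{cases} \tag{19.8} \]

This is a sawtooth wave form, as shown in Fig. 19.1. Using Eqs. (19.2) and (19.3), we find
the expansion to be

\[ f(x) = 2\left[\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \cdots + (-1)^{n+1}\frac{\sin nx}{n} + \cdots\right]. \tag{19.9} \]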

Figure 19.2 shows $f(x)$ for $0 < x < 2\pi$ for the sum of 4, 6, and 10 terms of the series.
Three features deserve comment.

FIGURE 19.1 Sawtooth wave form.

FIGURE 19.2 Expansion of sawtooth wave form, range $[0, 2\pi]$.

1. There is a steady increase in the accuracy of the representation as the number of terms
included is increased.

2. At $x = \pi$, where $f(x)$ changes discontinuously from $+\pi$ to $-\pi$, all the curves pass
through the average of these two values, namely $f(\pi) = 0$.

3. In the vicinity of the discontinuity at $x = \pi$, there is an overshoot that persists and
shows no sign of diminishing.


As a matter of incidental interest, setting $x = \pi/2$ in Eq. (19.9) leads to

\[ f\!\left(\frac{\pi}{2}\right) = \frac{\pi}{2} = 2\left[1 - 0 - \frac{1}{3} - 0 + \frac{1}{5} - \cdots\right], \]

thereby yielding an alternate derivation of Leibniz's formula for $\pi/4$, which was obtained
by another method in Exercise 1.3.2.
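
The three features noted in this example can be reproduced with a short numerical sketch. The function names are hypothetical, and the "overshoot" is measured here simply as the largest excess of a partial sum over $\pi$ on a grid of points just to the left of the discontinuity.

```python
import math

def sawtooth_partial(x, terms):
    """Partial sum of Eq. (19.9): 2 * sum_{n=1}^{terms} (-1)**(n+1) * sin(n x) / n."""
    return 2.0 * sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, terms + 1))

def overshoot(terms, samples=4000):
    """Largest excess of the partial sum over pi on a grid in (pi/2, pi)."""
    xs = [0.5 * math.pi + 0.5 * math.pi * k / samples for k in range(1, samples)]
    return max(sawtooth_partial(x, terms) for x in xs) - math.pi

print(sawtooth_partial(math.pi, 10))               # value at the discontinuity
print(overshoot(50), overshoot(200))               # overshoot does not shrink
print(0.5 * sawtooth_partial(0.5 * math.pi, 2001), math.pi / 4)   # Leibniz series
```

Adding terms narrows the region of overshoot but does not reduce its height, which is the behavior described in item 3.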


Periodic Functions 


Fourier series are used extensively to represent periodic functions, especially wave forms
for signal processing. The form of the series is inherently periodic; the expansions in
Eqs. (19.1) and (19.4) are periodic with period $2\pi$, with $\sin nx$, $\cos nx$, and $\exp(inx)$,
each completing $n$ cycles of oscillation in that interval. Thus, while the coefficients in a
Fourier expansion are determined from an interval of length $2\pi$, the expansion itself (if
the function involved is actually periodic) applies for an indefinite range of $x$. The periodicity
also means that the interval used for determining the coefficients need not be $[0, 2\pi]$
but may be any other interval of that length. Often one encounters situations in which the
formulas in Eqs. (19.2) and (19.3) are changed so that their integrations run between $-\pi$
and $\pi$. In fact, it would have been natural to have restated Example 19.1.1 as dealing with
$f(x) = x$, for $-\pi < x < \pi$. This of course does not remove the discontinuity or change
the form of the Fourier series. The discontinuity has simply been moved to the ends of the
interval in $x$.

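In actual situations, the natural interval for a Fourier expansion will be the wavelength
of our wave form, so it may make sense to redefine our Fourier series so that Eq. (19.1)
becomes

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos\frac{n\pi x}{L} + \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{L}, \tag{19.10} \]

with

\[ a_n = \frac{1}{L}\int_{-L}^{L} f(s)\cos\frac{n\pi s}{L}\, ds, \qquad n = 0, 1, 2, \ldots, \tag{19.11} \]

\[ b_n = \frac{1}{L}\int_{-L}^{L} f(s)\sin\frac{n\pi s}{L}\, ds, \qquad n = 1, 2, \ldots. \tag{19.12} \]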


In many problems the $x$ dependence of a Fourier expansion describes the spatial dependence
of a wave distribution that is moving (say, toward $+x$) with phase velocity $v$. This
means that in place of $x$ we need to write $x - vt$, and this substitution carries the implicit
assumption that the wave form retains the same shape as it moves forward.² The individual
terms of the Fourier expansion can now be given an interesting interpretation. Taking as
an example the term

\[ \cos\left[\frac{n\pi}{L}(x - vt)\right], \]

we note that it describes a contribution of wavelength $2L/n$ (when $x$ increases this much
at constant $t$, the argument of the cosine function increases by $2\pi$). We also note that
the period of the oscillation (the change in $t$ at constant $x$ for one cycle of the cosine
function) is $T = 2L/nv$, corresponding to the oscillation frequency $\nu = nv/2L$. If we call
the frequency for $n = 1$ the fundamental frequency and denote it $\nu_0 = v/2L$, we identify
the terms for each $n > 1$ in the Fourier series as describing overtones, or harmonics of the
fundamental frequency, with individual frequencies $n\nu_0$.

2 For waves in physical media, this assumption is by no means always true, as it depends on the time-dependent response
properties of the medium.

A typical problem for which Fourier analysis is suitable is one in which a particle under- 
going oscillatory motion is subject to a periodic driving force. If the problem is described 
by a linear ODE, we may make a Fourier expansion of the driving force and solve for each 
harmonic individually. This makes the Fourier expansion a practical tool as well as a nice 
analytical device. We stress, however, that its utility depends crucially on the linearity of 
our problem; in nonlinear problems an overall solution is not a superposition of component 
solutions. 

As suggested earlier, we have proceeded on the assumption that v, the phase velocity, is 
the same for all terms of the Fourier series. We now see that this assumption corresponds to 
the notion that the medium supporting the wave motion can respond equally well to forces 
at all frequencies. If, for example, the medium consists of particles too massive to respond 
quickly at high frequency, those components of the wave form will become attenuated and 
damped out of a propagating wave. Conversely, if the system contains components that 
resonate at certain frequencies, the response at those frequencies will be enhanced. Fourier 
expansions give physicists (and engineers) a powerful tool for analyzing wave forms and 
for designing media (e.g., circuits) that yield desired behaviors. 

One question that is sometimes raised is: “Were the harmonics there all along, or were 
they created by our Fourier analysis?” One answer compares the functional resolution into 
harmonics with the resolution of a vector into rectangular components. The components 
may have been present, in the sense that they may be isolated and observed, but the reso- 
lution is certainly not unique. Hence many authors prefer to say that the harmonics were 
created by our choice of expansion. Other expansions in other sets of orthogonal functions 
would produce a different decomposition. For further discussion, we refer to a series of
notes and letters in the American Journal of Physics.³

What if a function is not periodic? We can still obtain its Fourier expansion, but (a) the 
results will of course depend on how the expansion interval is chosen (both as to posi- 
tion and length), and (b) because no information outside the expansion interval was used 
in obtaining the expansion, we can have no realistic expectation that the expansion will 
produce there a reasonable approximation to our function. 


Symmetry 


Suppose we have a function $f(x)$ that is either an even or an odd function of $x$. If it is
even, then its Fourier expansion cannot contain any odd terms (since all terms are linearly
independent, no odd term can be removed by retaining others). Our expansion, developed
for the interval $[-\pi, \pi]$, then must take the form

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nx, \qquad f(x) \text{ even}. \tag{19.13} \]

On the other hand, if $f(x)$ is odd, we must have

\[ f(x) = \sum_{n=1}^{\infty} b_n \sin nx, \qquad f(x) \text{ odd}. \tag{19.14} \]

In both cases, when determining the coefficients we only need consider the interval $[0, \pi]$,
referring to Eqs. (19.2) and (19.3), as the adjoining interval of length $\pi$ will make a contribution
identical to that considered. The series in Eqs. (19.13) and (19.14) are sometimes
called Fourier cosine and Fourier sine series.

3 B. L. Robinson, Concerning frequencies resulting from distortion. Am. J. Phys. 21: 391 (1953); F. W. Van Name, Jr., Concerning
frequencies resulting from distortion. Am. J. Phys. 22: 94 (1954).

If we have a function defined on the interval $[0, \pi]$, we can represent it either as a Fourier
sine series or as a Fourier cosine series (or, if it has no interfering singularities, as a power
series), with similar results on the interval of definition. However, the results outside that
interval may differ markedly because these expansions carry different assumptions as to
symmetry and periodicity.


Example 19.1.2 DIFFERENT EXPANSIONS OF $f(x) = x$


We consider three possible ways to expand $f(x) = x$ based on its values on the range
$[0, \pi]$:

• Its power-series expansion will (obviously) be $f(x) = x$.

• Comparing with Example 19.1.1, its Fourier sine series will have the form given in
Eq. (19.9).

• Its Fourier cosine series will have coefficients determined from

\[ a_n = \frac{2}{\pi}\int_0^{\pi} x\cos nx\, dx = \begin{cases} \pi, & n = 0, \\[4pt] -\dfrac{4}{n^2\pi}, & n = 1, 3, 5, \ldots, \\[4pt] 0, & n = 2, 4, 6, \ldots, \end{cases} \]

corresponding to the expansion

\[ f(x) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=0}^{\infty} \frac{\cos(2n+1)x}{(2n+1)^2}. \]

All three of these expansions represent $f(x)$ well in the range of definition, $[0, \pi]$, but their
behavior becomes strikingly different outside that range. We compare the three expansions
for a range larger than $[0, \pi]$ in Fig. 19.3.

FIGURE 19.3 Expansions of $f(x) = x$ on $[0, \pi]$: (a) power series, (b) Fourier sine series,
(c) Fourier cosine series.
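
The contrast between the expansions inside and outside $[0, \pi]$ can be seen numerically. The sketch below is illustrative, with hypothetical function names; it evaluates truncated versions of the sine series, Eq. (19.9), and of the cosine series given in this example at one point inside the range and two points outside it.

```python
import math

def sine_series(x, terms=20000):
    """Truncated Fourier sine series of f(x) = x, Eq. (19.9)."""
    return 2.0 * sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, terms + 1))

def cosine_series(x, terms=20000):
    """Truncated Fourier cosine series of f(x) = x from Example 19.1.2."""
    s = sum(math.cos((2 * n + 1) * x) / (2 * n + 1) ** 2 for n in range(terms))
    return 0.5 * math.pi - 4.0 / math.pi * s

for x in (1.0, -1.0, 4.0):
    print(x, round(sine_series(x), 3), round(cosine_series(x), 3))
```

At $x = 1$ both series return values close to 1. At $x = -1$ the sine series gives the odd continuation (close to $-1$) and the cosine series the even one (close to $+1$), and at $x = 4$ the two periodic continuations give values close to $4 - 2\pi$ and $2\pi - 4$, respectively.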


Operations on Fourier Series 


Term-by-term integration of the series 


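\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx \tag{19.15} \]

yields

\[ \int_{x_0}^{x} f(x)\, dx = \left.\frac{a_0 x}{2}\right|_{x_0}^{x} + \sum_{n=1}^{\infty} \left.\frac{a_n}{n}\sin nx\right|_{x_0}^{x} - \sum_{n=1}^{\infty} \left.\frac{b_n}{n}\cos nx\right|_{x_0}^{x}. \tag{19.16} \]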











Clearly, the effect of integration is to place an additional power of n in the denomina- 
tor of each coefficient. This results in more rapid convergence than before. Consequently, 
a convergent Fourier series may always be integrated term by term, the resulting series 
converging uniformly to the integral of the original function. Indeed, term-by-term inte- 
gration may be valid even if the original series, Eq. (19.15), is not itself convergent. The 
function f (x) need only be integrable. A discussion will be found in Jeffreys and Jeffreys 
(Additional Readings). 

Strictly speaking, Eq. (19.16) may not be a Fourier series; that is, if $a_0 \ne 0$, there will
be a term $\tfrac{1}{2}a_0 x$. However,

\[ \int_{x_0}^{x} f(x)\, dx - \frac{1}{2}a_0 x \tag{19.17} \]

will still be a Fourier series.
The situation regarding differentiation is quite different from that of integration. Here
the word is caution. Consider the series for





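\[ f(x) = x, \qquad -\pi < x < \pi. \tag{19.18} \]

We readily found (in Example 19.1.1) that the Fourier series is

\[ x = 2\sum_{n=1}^{\infty} (-1)^{n+1}\frac{\sin nx}{n}, \qquad -\pi < x < \pi. \tag{19.19} \]

Differentiating term by term, we obtain

\[ 1 = 2\sum_{n=1}^{\infty} (-1)^{n+1}\cos nx, \tag{19.20} \]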
which is not convergent. Warning: Check your derivative for convergence. 
For the triangular wave shown in Fig. 19.4 (and treated in Exercise 19.2.9), the Fourier
expansion is

\[ f(x) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1,\,\text{odd}}^{\infty} \frac{\cos nx}{n^2}, \tag{19.21} \]

which converges more rapidly than the expansion of Eq. (19.19); in fact, it exhibits uniform
convergence. Differentiating term by term we get

\[ f'(x) = \frac{4}{\pi}\sum_{n=1,\,\text{odd}}^{\infty} \frac{\sin nx}{n}, \tag{19.22} \]

which is the Fourier expansion of a square wave,

\[ f'(x) = \begin{cases} 1, & 0 < x < \pi, \\ -1, & -\pi < x < 0. \end{cases} \tag{19.23} \]

Inspection of Fig. 19.3 verifies that this is indeed the derivative of our triangular wave. 














FIGURE 19.4 Triangular wave.

• As the inverse of integration, the operation of differentiation has placed an additional
factor $n$ in the numerator of each term. This reduces the rate of convergence and may,
as in the first case mentioned, render the differentiated series divergent.

• In general, term-by-term differentiation is permissible if the series to be differentiated
is uniformly convergent.
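
The two cases discussed above can be contrasted numerically. The sketch below is illustrative, with hypothetical function names: it evaluates partial sums of the differentiated sawtooth series, Eq. (19.20), which keep oscillating as terms are added, and of the differentiated triangular-wave series, Eq. (19.22), which settle toward the square-wave values $\pm 1$.

```python
import math

def d_sawtooth(x, terms):
    """Partial sum of Eq. (19.20): 2 * sum (-1)**(n+1) * cos(n x)."""
    return 2.0 * sum((-1) ** (n + 1) * math.cos(n * x) for n in range(1, terms + 1))

def d_triangle(x, terms):
    """Partial sum of Eq. (19.22): (4/pi) * sum over odd n <= terms of sin(n x) / n."""
    return 4.0 / math.pi * sum(math.sin(n * x) / n for n in range(1, terms + 1, 2))

# Successive partial sums of Eq. (19.20) at x = 1 do not approach the value 1
print([round(d_sawtooth(1.0, N), 3) for N in range(1000, 1006)])

# Partial sums of Eq. (19.22) at x = +1 and x = -1 approach +1 and -1
print(round(d_triangle(1.0, 20001), 4), round(d_triangle(-1.0, 20001), 4))
```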


Summing Fourier Series 


Often the most efficient way to identify the function represented by a Fourier series is
simply to identify the expansion in a table. But if it is our desire to sum the series ourselves,
a useful approach is to replace the trigonometric functions by their complex exponential
forms, and then to identify the Fourier series as one or more power series in $e^{\pm ix}$.





Example 19.1.3 SUMMATION OF A FOURIER SERIES





Consider the series $\sum_{n=1}^{\infty} (1/n)\cos nx$, $x \in (0, 2\pi)$. Since this series is only conditionally
convergent (and diverges at $x = 0$), we take

\[ \sum_{n=1}^{\infty} \frac{\cos nx}{n} = \lim_{r\to 1} \sum_{n=1}^{\infty} \frac{r^n \cos nx}{n}, \]

absolutely convergent for $|r| < 1$. Our procedure is to try forming power series by transforming
the trigonometric functions into exponential form:

\[ \sum_{n=1}^{\infty} \frac{r^n \cos nx}{n} = \frac{1}{2}\sum_{n=1}^{\infty} \frac{r^n e^{inx}}{n} + \frac{1}{2}\sum_{n=1}^{\infty} \frac{r^n e^{-inx}}{n}. \]



Now, these power series may be identified as Maclaurin expansions of $-\ln(1 - z)$, with
$z = re^{ix}$ or $re^{-ix}$. From Eq. (1.97),

\[ \sum_{n=1}^{\infty} \frac{r^n \cos nx}{n} = -\frac{1}{2}\left[\ln(1 - re^{ix}) + \ln(1 - re^{-ix})\right] = -\ln\left[(1 + r^2) - 2r\cos x\right]^{1/2}. \]


Setting $r = 1$, we see that

\[ \sum_{n=1}^{\infty} \frac{\cos nx}{n} = -\ln(2 - 2\cos x)^{1/2} = -\ln\left(2\sin\frac{x}{2}\right), \qquad (0 < x < 2\pi). \tag{19.24} \]

Both sides of this expression diverge as $x \to 0$ and as $x \to 2\pi$.⁴


4 Note that the range of validity of Eq. (19.24) may be shifted to $[-\pi, \pi]$ (excluding $x = 0$) if we replace $x$ by $|x|$ on the
right-hand side.
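
The summation in this example can be checked numerically, both for $|r| < 1$ and in the limit $r = 1$. The sketch below is illustrative; `abel_sum` and `closed_form` are hypothetical names for the truncated series and for the logarithmic expression obtained above.

```python
import math

def abel_sum(x, r, terms=2000):
    """Truncated series sum_{n=1}^{terms} r**n * cos(n x) / n."""
    return sum(r**n * math.cos(n * x) / n for n in range(1, terms + 1))

def closed_form(x, r=1.0):
    """-ln[(1 + r^2) - 2 r cos x]^(1/2); for r = 1 this is -ln(2 sin(x/2)) on (0, 2 pi)."""
    return -0.5 * math.log(1.0 + r * r - 2.0 * r * math.cos(x))

print(abel_sum(1.0, 0.9), closed_form(1.0, 0.9))         # |r| < 1: rapid convergence
print(abel_sum(2.0, 1.0, 20000), closed_form(2.0))       # r = 1: slow, conditional convergence
print(closed_form(2.0), -math.log(2.0 * math.sin(1.0)))  # the two forms in Eq. (19.24)
```

For $r = 0.9$ a few hundred terms already reproduce the closed form to rounding accuracy, whereas at $r = 1$ the error of the partial sum falls off only like the inverse of the number of terms.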





Exercises

19.1.1 A function $f(x)$ (quadratically integrable) is to be represented by a finite Fourier series.
A convenient measure of the accuracy of the series is given by the integrated square of
the deviation,

\[ \Delta_p = \int_0^{2\pi} \left[ f(x) - \frac{a_0}{2} - \sum_{n=1}^{p} (a_n \cos nx + b_n \sin nx) \right]^2 dx. \]

Show that the requirement that $\Delta_p$ be minimized, that is,

\[ \frac{\partial \Delta_p}{\partial a_n} = 0, \qquad \frac{\partial \Delta_p}{\partial b_n} = 0, \]

for all $n$, leads to choosing $a_n$ and $b_n$ as given in Eqs. (19.2) and (19.3).

Note. Your coefficients $a_n$ and $b_n$ are independent of $p$. This independence is a consequence
of orthogonality and would not hold if we expanded $f(x)$ in a power series.

19.1.2 In the analysis of a complex waveform (ocean tides, earthquakes, musical tones, etc.),
it might be more convenient to have the Fourier series written as

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \alpha_n \cos(nx - \theta_n). \]

Show that this is equivalent to Eq. (19.1) with

\[ a_n = \alpha_n \cos\theta_n, \qquad \alpha_n^2 = a_n^2 + b_n^2, \]
\[ b_n = \alpha_n \sin\theta_n, \qquad \tan\theta_n = b_n/a_n. \]

Note. The coefficients $\alpha_n^2$ as a function of $n$ define what is called the power spectrum.
The importance of $\alpha_n^2$ lies in their invariance under a shift in the phase $\theta_n$.

19.1.3 A function $f(x)$ is expanded in an exponential Fourier series

\[ f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}. \]

If $f(x)$ is real, $f(x) = f^*(x)$, what restriction is imposed on the coefficients $c_n$?

19.1.4 Assuming that $\int_{-\pi}^{\pi} [f(x)]^2\, dx$ is finite, show that

\[ \lim_{m\to\infty} a_m = 0, \qquad \lim_{m\to\infty} b_m = 0. \]

Hint. Integrate $[f(x) - s_n(x)]^2$, where $s_n(x)$ is the $n$th partial sum, and use Bessel's
inequality (Section 5.1). For our finite interval the assumption that $f(x)$ is square integrable
($\int_{-\pi}^{\pi} |f(x)|^2\, dx$ is finite) implies that $\int_{-\pi}^{\pi} |f(x)|\, dx$ is also finite. The converse
does not hold.

19.1.5 Apply the summation technique of this section to show that

\[ \sum_{n=1}^{\infty} \frac{\sin nx}{n} = \begin{cases} \tfrac{1}{2}(\pi - x), & 0 < x \le \pi, \\[2pt] -\tfrac{1}{2}(\pi + x), & -\pi \le x < 0. \end{cases} \]

This is the reverse sawtooth wave shown in Fig. 19.5.

FIGURE 19.5 Reverse sawtooth wave.

19.1.6 Sum the series $\sum_{n=1}^{\infty} (-1)^{n+1}\, \dfrac{\sin nx}{n}$ and show that it equals $x/2$.

19.1.7 Sum the trigonometric series $\sum_{n=0}^{\infty} \dfrac{\sin(2n+1)x}{2n+1}$ and show that it equals

\[ \begin{cases} \pi/4, & 0 < x < \pi, \\ -\pi/4, & -\pi < x < 0. \end{cases} \]

19.1.8 Let $f(z) = \ln(1 + z) = \sum_{n=1}^{\infty} \dfrac{(-1)^{n+1} z^n}{n}$. This series converges to $\ln(1 + z)$ for $|z| \le 1$,
except at the point $z = -1$.

(a) From the real parts show that

\[ \ln\left(2\cos\frac{\theta}{2}\right) = \sum_{n=1}^{\infty} (-1)^{n+1}\frac{\cos n\theta}{n}, \qquad -\pi < \theta < \pi. \]

(b) Using a change of variable, transform part (a) into

\[ -\ln\left(2\sin\frac{\theta}{2}\right) = \sum_{n=1}^{\infty} \frac{\cos n\theta}{n}, \qquad 0 < \theta < 2\pi. \]

19.1.9 (a) Expand $f(x) = x$ in the interval $(0, 2L)$. Sketch the series you have found (right-hand
side of ANS.) over $(-2L, 2L)$.








(b) Expand $f(x) = x$ as a sine series in the half interval $(0, L)$. Sketch the series you
have found (right-hand side of ANS.) over $(-2L, 2L)$.

\[ \text{ANS.} \quad x = \frac{4L}{\pi}\sum_{n=0}^{\infty} \frac{1}{2n+1}\sin\left(\frac{(2n+1)\pi x}{L}\right). \]

19.1.10 In some problems it is convenient to approximate $\sin \pi x$ over the interval $[0, 1]$ by a
parabola $ax(1 - x)$, where $a$ is a constant. To get a feeling for the accuracy of this
approximation, expand $4x(1 - x)$ in a Fourier sine series ($-1 \le x \le 1$):

\[ f(x) = \begin{cases} 4x(1 - x), & 0 \le x \le 1 \\ 4x(1 + x), & -1 \le x \le 0 \end{cases} \;=\; \sum_{n=1}^{\infty} b_n \sin n\pi x. \]

\[ \text{ANS.} \quad b_n = \frac{32}{\pi^3}\,\frac{1}{n^3}, \quad n \text{ odd}, \qquad b_n = 0, \quad n \text{ even}. \]

This approximation is shown in Fig. 19.6.

19.1.11 Verify that $\delta(\varphi_1 - \varphi_2) = \dfrac{1}{2\pi}\sum_{m=-\infty}^{\infty} e^{im(\varphi_1 - \varphi_2)}$ is a Dirac delta function by showing
that it satisfies the definition,

\[ \int_{-\pi}^{\pi} f(\varphi_1)\, \frac{1}{2\pi}\sum_{m=-\infty}^{\infty} e^{im(\varphi_1 - \varphi_2)}\, d\varphi_1 = f(\varphi_2). \]

Hint. Represent $f(\varphi_1)$ by an exponential Fourier series.

19.1.12 Show that integration of the Fourier expansion of $f(x) = x$, $-\pi < x < \pi$, leads to

\[ \frac{\pi^2}{12} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2} = 1 - \frac{1}{4} + \frac{1}{9} - \frac{1}{16} + \cdots. \]

Note. The series for $f(x) = x$ was the subject of Example 19.1.1. Confirm that the
change in the defined range from $[0, 2\pi]$ to $[-\pi, \pi]$ has no effect on the expansion.
FIGURE 19.6 Parabolic approximation to sine wave.

19.1.13


19.1.14 


19.1.15 


(a) Assuming that the Fourier expansion of $f(x)$ is uniformly convergent, show that
\[
\frac{1}{\pi} \int_{-\pi}^{\pi} [f(x)]^2\, dx = \frac{a_0^2}{2} + \sum_{n=1}^{\infty} \left(a_n^2 + b_n^2\right).
\]
This is Parseval's identity. Note that it is a completeness relation for the Fourier expansion.

(b) Given
\[
x^2 = \frac{\pi^2}{3} + 4 \sum_{n=1}^{\infty} \frac{(-1)^n \cos nx}{n^2}, \qquad -\pi \le x \le \pi,
\]
apply Parseval's identity to obtain $\zeta(4)$ in closed form.

(c) The condition of uniform convergence is not necessary. Show this by applying the Parseval identity to the square wave
\[
f(x) = \begin{cases} -1, & -\pi < x < 0 \\ \phantom{-}1, & 0 < x < \pi \end{cases} \;=\; \frac{4}{\pi} \sum_{n=1}^{\infty} \frac{\sin(2n-1)x}{2n-1}.
\]
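As a numerical illustration of part (c), the following minimal sketch compares the two sides of Parseval's identity for the square wave. It assumes the coefficients $b_n = 4/(n\pi)$ for odd $n$ read off from the series above; the function name `parseval_rhs` is illustrative and not from the text.

```python
import math

def parseval_rhs(n_terms):
    """Partial sum of sum(b_n^2) for the square wave, with b_n = 4/(pi*n) for odd n."""
    return sum((4.0 / (math.pi * (2 * k - 1))) ** 2 for k in range(1, n_terms + 1))

# Left-hand side: (1/pi) * integral of f(x)^2 over (-pi, pi); f(x)^2 = 1 on the whole interval.
lhs = (1.0 / math.pi) * (2.0 * math.pi)

for n_terms in (10, 1000, 100000):
    print(n_terms, parseval_rhs(n_terms), lhs)
```

The partial sums approach the left-hand side from below, although the series itself does not converge uniformly.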
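Given
\[
\varphi_1(x) \equiv \sum_{n=1}^{\infty} \frac{\sin nx}{n} = \begin{cases} -\tfrac{1}{2}(\pi + x), & -\pi \le x < 0, \\ \phantom{-}\tfrac{1}{2}(\pi - x), & 0 < x \le \pi, \end{cases}
\]
show by integrating that
\[
\varphi_2(x) \equiv \sum_{n=1}^{\infty} \frac{\cos nx}{n^2} = \begin{cases} \tfrac{1}{4}(\pi + x)^2 - \dfrac{\pi^2}{12}, & -\pi \le x \le 0, \\ \tfrac{1}{4}(\pi - x)^2 - \dfrac{\pi^2}{12}, & 0 \le x \le \pi. \end{cases}
\]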


Given
\[
\psi_{2s}(x) = \sum_{n=1}^{\infty} \frac{\sin nx}{n^{2s}}, \qquad \psi_{2s+1}(x) = \sum_{n=1}^{\infty} \frac{\cos nx}{n^{2s+1}},
\]
develop the following recurrence relations:

(a) $\psi_{2s}(x) = \int_0^x \psi_{2s-1}(x)\, dx$,

(b) $\psi_{2s+1}(x) = \zeta(2s+1) - \int_0^x \psi_{2s}(x)\, dx$.

Note. The functions $\psi_s(x)$ and $\varphi_s(x)$ of this and the preceding exercise are known as Clausen functions. In theory they may be used to improve the rate of convergence of a Fourier series. As is often the case, there is the question of how much analytical work we do and how much arithmetic work we demand that a computer do. As computers become steadily more powerful, the balance progressively shifts so that we are doing less and demanding that computers do more.


19.1.16 Show that $f(x) = \sum_{n=1}^{\infty} \frac{\cos nx}{n+1}$ may be written as
\[
f(x) = \psi_1(x) - \varphi_2(x) + \sum_{n=1}^{\infty} \frac{\cos nx}{n^2(n+1)},
\]
where $\psi_1(x)$ and $\varphi_2(x)$ are the Clausen functions defined in Exercises 19.1.14 and 19.1.15.


19.2 APPLICATIONS OF FOURIER SERIES 


We present in this section two typical problems and a short table of useful Fourier series, 
followed by a substantial number of exercises that illustrate some of the techniques that 
arise in applications. 


Example 19.2.1. SQUARE WAVE 


One application of Fourier series, the analysis of a “square” wave (Fig. 19.7) in terms of its 
Fourier components, occurs in electronic circuits designed to handle sharply rising pulses. 
Suppose that our wave is defined by 


\[
f(x) = 0, \quad -\pi < x < 0, \qquad f(x) = h, \quad 0 < x < \pi. \tag{19.25}
\]














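FIGURE 19.7 Square wave.

From Eqs. (19.2) and (19.3), we find
\[
a_0 = \frac{1}{\pi} \int_0^{\pi} h\, dt = h,
\]
\[
a_n = \frac{1}{\pi} \int_0^{\pi} h \cos nt\, dt = 0, \qquad n = 1, 2, 3, \ldots,
\]
\[
b_n = \frac{1}{\pi} \int_0^{\pi} h \sin nt\, dt = \frac{h}{n\pi}(1 - \cos n\pi) = \begin{cases} \dfrac{2h}{n\pi}, & n \text{ odd}, \\ 0, & n \text{ even}. \end{cases}
\]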


The resulting series is
\[
f(x) = \frac{h}{2} + \frac{2h}{\pi} \left( \frac{\sin x}{1} + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots \right). \tag{19.26}
\]
Except for the first term, which represents an average of $f(x)$ over the interval $[-\pi, \pi]$, all the cosine terms have vanished. Since $f(x) - h/2$ is odd, we have a Fourier sine series. Although only the odd terms in the sine series occur, they fall only as $n^{-1}$. This conditional convergence is like that of the alternating harmonic series. Physically this means that our square wave contains a lot of high-frequency components. If the electronic apparatus will not pass these components, our square-wave input will emerge more or less rounded off, perhaps as an amorphous blob.

Example 19.2.2. FULL-WAVE RECTIFIER


As a second example, let us ask how well the output of a full-wave rectifier approaches 
pure direct current. Our rectifier may be thought of as passing the positive peaks of an 
incoming sine wave and inverting the negative peaks, as shown in Fig. 19.8. This yields 


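\[
f(t) = \begin{cases} \phantom{-}\sin \omega t, & 0 < \omega t < \pi, \\ -\sin \omega t, & -\pi < \omega t < 0. \end{cases} \tag{19.27}
\]

Since $f(t)$ as defined here is even, no terms of the form $\sin n\omega t$ will appear. Again, from Eqs. (19.2) and (19.3), we have
\[
a_0 = -\frac{1}{\pi} \int_{-\pi}^{0} \sin \omega t\, d(\omega t) + \frac{1}{\pi} \int_0^{\pi} \sin \omega t\, d(\omega t) = \frac{2}{\pi} \int_0^{\pi} \sin \omega t\, d(\omega t) = \frac{4}{\pi},
\]

FIGURE 19.8 Full wave rectifier.

\[
a_n = \frac{2}{\pi} \int_0^{\pi} \sin \omega t \cos n\omega t\, d(\omega t) = \begin{cases} -\dfrac{2}{\pi}\, \dfrac{2}{n^2 - 1}, & n \text{ even}, \\ 0, & n \text{ odd}. \end{cases}
\]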


Note that [0, 7] is not an orthogonality interval for both sines and cosines together and we 
do not get zero when n is even. The resulting series is 


\[
f(t) = \frac{2}{\pi} - \frac{4}{\pi} \sum_{n=2,4,6,\ldots}^{\infty} \frac{\cos n\omega t}{n^2 - 1}. \tag{19.28}
\]
The original frequency, $\omega$, has been eliminated; in fact, all its odd harmonics are also absent. The lowest-frequency oscillation is $2\omega$. The high-frequency components fall off as $n^{-2}$, showing that the full-wave rectifier does a fairly good job of approximating direct current. Whether this good approximation is adequate depends on the particular application. If the remaining alternating current components are objectionable, they may be further suppressed by appropriate filter circuits.
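The coefficients of the rectified wave can be checked by direct numerical integration. This is a rough sketch under stated assumptions: it uses a simple midpoint rule, writes $u$ for $\omega t$, and the names `a_n_numeric` and `a_n_closed` are hypothetical helpers introduced only for this comparison.

```python
import math

def a_n_numeric(n, samples=20000):
    """Midpoint-rule estimate of a_n = (2/pi) * integral_0^pi sin(u) cos(n u) du."""
    du = math.pi / samples
    total = 0.0
    for k in range(samples):
        u = (k + 0.5) * du
        total += math.sin(u) * math.cos(n * u)
    return (2.0 / math.pi) * total * du

def a_n_closed(n):
    """Closed form from the example: -(2/pi) * 2/(n^2 - 1) for even n, zero for odd n."""
    return -4.0 / (math.pi * (n * n - 1)) if n % 2 == 0 else 0.0

for n in range(0, 7):
    print(n, a_n_numeric(n), a_n_closed(n))
```

Setting $n = 0$ in the closed form reproduces $a_0 = 4/\pi$, and the even-$n$ values fall off as $n^{-2}$.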


These examples bring out two features characteristic of Fourier expansions:⁵


• If $f(x)$ has discontinuities, as in the square wave in Example 19.2.1, we can expect the $n$th coefficient to be decreasing as $O(1/n)$. Convergence is conditional only.


• If $f(x)$ is continuous (although possibly with discontinuous derivatives as in the full-wave rectifier of Example 19.2.2), we can expect the $n$th coefficient to be decreasing as $1/n^2$, that is, absolute convergence.


We close this section by providing, in Table 19.1, a list of Fourier series that have been 
introduced either as examples or in the exercises of this chapter. More extensive lists can 
be found in the Additional Readings, particularly in the work by Oberhettinger, but also in 
the texts by Carslaw, Churchill, and Zygmund. 





⁵G. Raisbeek, Order of magnitude of Fourier coefficients. Am. Math. Mon. 62: 149 (1955).

Exercises


19.2.1 


19.2.2 


19.2.3 


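Table 19.1 Some Fourier Series Used in This Text

Fourier Series                                        Reference

1. $\sum_{n=1}^{\infty} \frac{1}{n} \sin nx = \begin{cases} -\tfrac{1}{2}(\pi + x), & -\pi < x < 0 \\ \phantom{-}\tfrac{1}{2}(\pi - x), & 0 < x < \pi \end{cases}$        Exercise 19.1.5, Exercise 19.2.8

2. $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} \sin nx = \frac{x}{2}, \quad -\pi < x < \pi$        Exercise 19.1.6, Exercise 19.2.7

3. $\sum_{n=0}^{\infty} \frac{\sin(2n+1)x}{2n+1} = \begin{cases} -\pi/4, & -\pi < x < 0 \\ +\pi/4, & 0 < x < \pi \end{cases}$        Exercise 19.1.7, Eq. (19.26)

4. $\sum_{n=1}^{\infty} \frac{\cos nx}{n} = -\ln\left[2\sin\left(\frac{|x|}{2}\right)\right], \quad -\pi < x < \pi$        Exercise 19.1.8(b), Eq. (19.24)

5. $\sum_{n=1}^{\infty} (-1)^n \frac{\cos nx}{n} = -\ln\left[2\cos\left(\frac{x}{2}\right)\right], \quad -\pi < x < \pi$        Exercise 19.1.8(a)

6. $\sum_{n=0}^{\infty} \frac{\cos(2n+1)x}{2n+1} = \frac{1}{2} \ln\left[\cot\frac{|x|}{2}\right], \quad -\pi < x < \pi$        Exercise 19.2.5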




Transform the Fourier expansion of a square wave, Eq. (19.26), into a power series. Show that the coefficients of $x^1$ form a divergent series. Repeat for the coefficients of $x^3$.


Note. A power series cannot handle a discontinuity. These infinite coefficients are the 
result of attempting to beat this basic limitation on power series. 


Derive the Fourier series expansion of the Dirac delta function $\delta(x)$ in the interval $-\pi < x < \pi$.


(a) What significance can be attached to the constant term?
(b) In what region is this representation valid?
(c) With the identity
\[
\sum_{n=1}^{N} \cos nx = \frac{\sin(Nx/2)}{\sin(x/2)} \cos\left[(N+1)\frac{x}{2}\right],
\]
show that your Fourier representation of $\delta(x)$ is consistent with Eq. (5.27).


Expand $\delta(x - t)$ in a Fourier series. Compare your result with the bilinear form of Eq. (5.27).

ANS.
\[
\delta(x - t) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} (\cos nx \cos nt + \sin nx \sin nt) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} \cos n(x - t).
\]


19.2.4 Show that integrating the Fourier expansion of the Dirac delta function (Exercise 19.2.2) 
leads to the Fourier representation of the square wave, Eq. (19.26), with h = 1. 


Note. Integrating the constant term $(1/2\pi)$ leads to a term $x/2\pi$. What are you going to do with this?


19.2.5 Starting from the Fourier series given as lines 4 and 5 of Table 19.1, show that: 


\[
\sum_{n=0}^{\infty} \frac{\cos(2n+1)x}{2n+1} = \frac{1}{2} \ln\left[\cot\frac{|x|}{2}\right].
\]





19.2.6 Develop the Fourier series representation of 


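\[
f(t) = \begin{cases} 0, & -\pi \le \omega t \le 0, \\ \sin \omega t, & 0 \le \omega t \le \pi. \end{cases}
\]
This is the output of a simple half-wave rectifier. It is also an approximation of the solar thermal effect that produces “tides” in the atmosphere.

ANS. $f(t) = \dfrac{1}{\pi} + \dfrac{1}{2} \sin \omega t - \dfrac{2}{\pi} \sum_{n=2,4,6,\ldots}^{\infty} \dfrac{\cos n\omega t}{n^2 - 1}$.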


19.2.7 A sawtooth wave is given by
\[
f(x) = x, \qquad -\pi < x < \pi.
\]
Show that
\[
f(x) = 2 \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \sin nx.
\]


19.2.8 A different sawtooth wave is described by 


\[
f(x) = \begin{cases} -\tfrac{1}{2}(\pi + x), & -\pi < x < 0 \\ +\tfrac{1}{2}(\pi - x), & 0 < x < \pi. \end{cases}
\]

Show that $f(x) = \sum_{n=1}^{\infty} (\sin nx / n)$.


19.2.9 A triangular wave (Fig. 19.4) is represented by 


\[
f(x) = \begin{cases} \phantom{-}x, & 0 < x < \pi \\ -x, & -\pi < x < 0. \end{cases}
\]

19.2.10


19.2.11 


19.2.12 


19.2.13 


Represent f(x) by a Fourier series. 





ANS. $f(x) = \dfrac{\pi}{2} - \dfrac{4}{\pi} \sum_{n=1,3,5,\ldots}^{\infty} \dfrac{\cos nx}{n^2}$.
Expand
\[
f(x) = \begin{cases} 1, & x^2 < x_0^2 \\ 0, & x^2 > x_0^2 \end{cases}
\]
in the interval $[-\pi, \pi]$.
Note. This variable-width square wave is of some importance in electronic music. 


A metal cylindrical tube of radius a is split lengthwise into two nontouching halves. 
The top half is maintained at a potential +V, the bottom half at a potential —V. See 
Fig. 19.9. Separate the variables in Laplace’s equation and solve for the electrostatic 
potential for r < a. Observe the resemblance between your solution for r = a and the 
Fourier series for a square wave. 


A metal cylinder is placed in a (previously) uniform electric field, $E_0$, with the axis of the cylinder perpendicular to that of the original field.


(a) Find the perturbed electrostatic potential. 


(b) Find the induced surface charge on the cylinder as a function of angular position. 


(a) Find the Fourier series representation of 


\[
f(x) = \begin{cases} 0, & -\pi < x \le 0 \\ x, & 0 \le x < \pi. \end{cases}
\]


(b) From the Fourier expansion show that 





FIGURE 19.9 Cross section of split tube.





19.2.14 


19.2.15 


19.2.16 


19.2.17

FIGURE 19.10 Rectangular pulse.


Integrate the Fourier expansion of the unit step function 


\[
f(x) = \begin{cases} 0, & -\pi < x < 0 \\ 1, & 0 < x < \pi. \end{cases}
\]


Show that your integrated series agrees with Exercise 19.2.13. 
In the interval $(-\pi, \pi)$,
\[
\delta_n(x) = \begin{cases} n, & |x| < 1/2n, \\ 0, & |x| > 1/2n. \end{cases}
\]
This wave form is the pulse shown in Fig. 19.10.

(a) Expand $\delta_n(x)$ as a Fourier cosine series.


(b) Show that your Fourier series agrees with a Fourier expansion of $\delta(x)$ in the limit as $n \to \infty$.


Confirm the delta function nature of your Fourier series of Exercise 19.2.15 by showing that for any $f(x)$ that is finite in the interval $[-\pi, \pi]$ and continuous at $x = 0$,
\[
\int_{-\pi}^{\pi} f(x)\, [\text{Fourier expansion of } \delta_\infty(x)]\, dx = f(0).
\]
(a) Show that the Dirac delta function $\delta(x - a)$, expanded in a Fourier sine series in the half-interval $(0, L)$ $(0 < a < L)$, is given by
\[
\delta(x - a) = \frac{2}{L} \sum_{n=1}^{\infty} \sin\left(\frac{n\pi a}{L}\right) \sin\left(\frac{n\pi x}{L}\right).
\]
Note that this series actually describes $-\delta(x + a) + \delta(x - a)$ in the interval $(-L, L)$.


(b) By integrating both sides of the preceding equation from 0 to $x$, show that the cosine expansion of the square wave
\[
f(x) = \begin{cases} 0, & 0 \le x < a \\ 1, & a < x < L, \end{cases}
\]

19.2.18


19.2.19 


19.2.20 


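is
\[
f(x) = \frac{2}{\pi} \sum_{n=1}^{\infty} \frac{1}{n} \sin\left(\frac{n\pi a}{L}\right) - \frac{2}{\pi} \sum_{n=1}^{\infty} \frac{1}{n} \sin\left(\frac{n\pi a}{L}\right) \cos\left(\frac{n\pi x}{L}\right),
\]
for $0 \le x < L$.


(c) Show that the term $\dfrac{2}{\pi} \sum_{n=1}^{\infty} \dfrac{1}{n} \sin\left(\dfrac{n\pi a}{L}\right)$ is the average of $f(x)$ on $(0, L)$.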


Verify the Fourier cosine expansion of the square wave, Exercise 19.2.17(b), by direct 
calculation of the Fourier coefficients. 


(a) A string is clamped at both ends $x = 0$ and $x = L$. Assuming small-amplitude vibrations, we find that the amplitude $y(x, t)$ satisfies the wave equation
\[
\frac{\partial^2 y}{\partial x^2} = \frac{1}{v^2} \frac{\partial^2 y}{\partial t^2}.
\]
Here $v$ is the wave velocity. The string is set in vibration by a sharp blow at $x = a$. Hence we have
\[
y(x, 0) = 0, \qquad \frac{\partial y(x, t)}{\partial t} = L v_0\, \delta(x - a) \quad \text{at } t = 0.
\]
The constant $L$ is included to compensate for the dimensions (inverse length) of $\delta(x - a)$. With $\delta(x - a)$ given by Exercise 19.2.17(a), solve the wave equation subject to these initial conditions.

ANS. $y(x, t) = \dfrac{2 v_0 L}{\pi v} \sum_{n=1}^{\infty} \dfrac{1}{n} \sin\dfrac{n\pi a}{L} \sin\dfrac{n\pi x}{L} \sin\dfrac{n\pi v t}{L}$.

(b) Show that the transverse velocity of the string $\partial y(x, t)/\partial t$ is given by
\[
\frac{\partial y(x, t)}{\partial t} = 2 v_0 \sum_{n=1}^{\infty} \sin\frac{n\pi a}{L} \sin\frac{n\pi x}{L} \cos\frac{n\pi v t}{L}.
\]

A string, clamped at $x = 0$ and at $x = L$, is vibrating freely. Its motion is described by the wave equation
\[
\frac{\partial^2 u(x, t)}{\partial t^2} = v^2 \frac{\partial^2 u(x, t)}{\partial x^2}.
\]
Assume a Fourier expansion of the form
\[
u(x, t) = \sum_{n=1}^{\infty} b_n(t) \sin\frac{n\pi x}{L}
\]
and determine the coefficients $b_n(t)$. The initial conditions are
\[
u(x, 0) = f(x) \quad \text{and} \quad \frac{\partial}{\partial t} u(x, 0) = g(x).
\]
Note. This is only half the conventional Fourier orthogonality integral interval. However, as long as only the sines are included here, the Sturm-Liouville boundary conditions are still satisfied and the functions are orthogonal.

ANS.
\[
b_n(t) = A_n \cos\frac{n\pi v t}{L} + B_n \sin\frac{n\pi v t}{L},
\]
\[
A_n = \frac{2}{L} \int_0^L f(x) \sin\frac{n\pi x}{L}\, dx, \qquad B_n = \frac{2}{n\pi v} \int_0^L g(x) \sin\frac{n\pi x}{L}\, dx.
\]


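19.2.21 (a) Let us continue the vibrating string problem in Exercise 19.2.20. We assume now that the presence of a resisting medium will damp the vibrations according to the equation
\[
\frac{\partial^2 u(x, t)}{\partial t^2} = v^2 \frac{\partial^2 u(x, t)}{\partial x^2} - k \frac{\partial u(x, t)}{\partial t}.
\]
Introduce a Fourier expansion
\[
u(x, t) = \sum_{n=1}^{\infty} b_n(t) \sin\frac{n\pi x}{L}
\]
and again determine the coefficients $b_n(t)$. Take the initial and boundary conditions to be the same as in Exercise 19.2.20. Assume the damping to be small.


(b) Repeat, but assume the damping to be large.

ANS.

(a) $b_n(t) = e^{-kt/2} [A_n \cos \omega_n t + B_n \sin \omega_n t]$, $\quad \omega_n^2 = \left(\dfrac{n\pi v}{L}\right)^2 - \left(\dfrac{k}{2}\right)^2 > 0$,
\[
A_n = \frac{2}{L} \int_0^L f(x) \sin\frac{n\pi x}{L}\, dx, \qquad B_n = \frac{2}{\omega_n L} \int_0^L g(x) \sin\frac{n\pi x}{L}\, dx + \frac{k}{2\omega_n} A_n.
\]

(b) $b_n(t) = e^{-kt/2} [A_n \cosh \sigma_n t + B_n \sinh \sigma_n t]$, $\quad \sigma_n^2 = \left(\dfrac{k}{2}\right)^2 - \left(\dfrac{n\pi v}{L}\right)^2 > 0$,
\[
A_n = \frac{2}{L} \int_0^L f(x) \sin\frac{n\pi x}{L}\, dx, \qquad B_n = \frac{2}{\sigma_n L} \int_0^L g(x) \sin\frac{n\pi x}{L}\, dx + \frac{k}{2\sigma_n} A_n.
\]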


19.3 GIBBS PHENOMENON


The Gibbs phenomenon is an overshoot, a peculiarity of the Fourier series and other eigen- 
function series at a simple discontinuity. An example is seen in Fig. 19.2. 


Partial Summation of Fourier Series 


To better understand the Gibbs phenomenon we examine methods for the partial summation of Fourier series. This procedure is unlikely to lead to convenient solutions of practical problems for which Fourier series are ideal, but it may provide insight that is needed for our present study.

We start from the Fourier series of a function $f(x)$ in exponential form, truncating it to retain terms only for $|n| \le r$ and labeling the truncated expansion $f_r(x)$:
\[
f_r(x) = \sum_{n=-r}^{r} c_n e^{inx}, \qquad c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-int}\, dt.
\]
Combining these equations in a way useful for the present discussion, we have
\[
f_r(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \sum_{n=-r}^{r} e^{in(x-t)}\, dt. \tag{19.29}
\]
The summation in Eq. (19.29) is a geometric series. Using a result easily obtained from Eq. (1.96),
\[
\sum_{n=-r}^{r} y^n = \frac{y^{-r} - y^{r+1}}{1 - y} = \frac{y^{r+\frac{1}{2}} - y^{-(r+\frac{1}{2})}}{y^{\frac{1}{2}} - y^{-\frac{1}{2}}},
\]
we set $y = e^{i(x-t)}$, after which we can identify the resulting expression as a quotient of sine functions:⁶
\[
\sum_{n=-r}^{r} e^{in(x-t)} = \frac{e^{i(r+\frac{1}{2})(x-t)} - e^{-i(r+\frac{1}{2})(x-t)}}{e^{i(x-t)/2} - e^{-i(x-t)/2}} = \frac{\sin[(r+\frac{1}{2})(x-t)]}{\sin\frac{1}{2}(x-t)}. \tag{19.30}
\]
Inserting Eq. (19.30) into Eq. (19.29), we reach
\[
f_r(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \frac{\sin[(r+\frac{1}{2})(x-t)]}{\sin\frac{1}{2}(x-t)}\, dt. \tag{19.31}
\]
This is convergent at all points, including $t = x$. Equation (19.31) shows that the quantity
\[
\frac{1}{2\pi}\, \frac{\sin[(r+\frac{1}{2})(x-t)]}{\sin\frac{1}{2}(x-t)}
\]
is in the large-$r$ limit a Dirac delta distribution.
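The closed form in Eq. (19.30) can be compared with the direct sum. The following is a small sketch; `kernel_sum` and `kernel_closed` are names invented for the illustration, and `u` stands for $x - t$ (kept away from $u = 0$, where the closed form must be taken as a limit).

```python
import cmath
import math

def kernel_sum(r, u):
    """Direct evaluation of sum_{n=-r}^{r} exp(i n u); the imaginary parts cancel."""
    return sum(cmath.exp(1j * n * u) for n in range(-r, r + 1)).real

def kernel_closed(r, u):
    """Closed form of Eq. (19.30): sin[(r + 1/2) u] / sin(u / 2)."""
    return math.sin((r + 0.5) * u) / math.sin(0.5 * u)

for r, u in ((5, 0.3), (20, 1.1), (50, -2.4)):
    print(r, u, kernel_sum(r, u), kernel_closed(r, u))
```

As $u \to 0$ both expressions tend to $2r + 1$, so the peak of the kernel grows with $r$ while its width shrinks, as expected for a delta sequence.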


Square Wave 


For convenience of numerical calculation we consider the behavior of the Fourier series 
that represents the periodic square wave 


\[
f(x) = \begin{cases} \phantom{-}\dfrac{h}{2}, & 0 < x < \pi, \\ -\dfrac{h}{2}, & -\pi < x < 0. \end{cases} \tag{19.32}
\]


⁶This series also occurs in the analysis of a diffraction grating consisting of $r$ slits.







This is essentially the square wave used in Example 19.2.1, and we immediately see that its Fourier expansion is
\[
f(x) = \frac{2h}{\pi} \left( \frac{\sin x}{1} + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots \right). \tag{19.33}
\]
Applying Eq. (19.31) to our square wave, we have
\[
f_r(x) = \frac{h}{4\pi} \int_0^{\pi} \frac{\sin[(r+\frac{1}{2})(x-t)]}{\sin\frac{1}{2}(x-t)}\, dt - \frac{h}{4\pi} \int_{-\pi}^{0} \frac{\sin[(r+\frac{1}{2})(x-t)]}{\sin\frac{1}{2}(x-t)}\, dt.
\]
Making the substitution $x - t = s$ in the first integral and $x - t = -s$ in the second, we obtain
\[
f_r(x) = \frac{h}{4\pi} \int_{-\pi+x}^{x} \frac{\sin(r+\frac{1}{2})s}{\sin\frac{1}{2}s}\, ds - \frac{h}{4\pi} \int_{-\pi-x}^{-x} \frac{\sin(r+\frac{1}{2})s}{\sin\frac{1}{2}s}\, ds. \tag{19.34}
\]
It is important to note that both the integrals in Eq. (19.34) have the same integrand, and therefore have the same indefinite integral, which we denote $\Phi(t)$. We may therefore write
\[
\begin{aligned}
f_r(x) &= \frac{h}{4\pi} \left[\Phi(x) - \Phi(-\pi + x)\right] - \frac{h}{4\pi} \left[\Phi(-x) - \Phi(-\pi - x)\right] \\
&= \frac{h}{4\pi} \left[\Phi(x) - \Phi(-x)\right] - \frac{h}{4\pi} \left[\Phi(-\pi + x) - \Phi(-\pi - x)\right],
\end{aligned} \tag{19.35}
\]
where the second line of Eq. (19.35) is an obvious rearrangement of the first. However, this second line is useful because it shows that we can also write $f_r(x)$ as
\[
f_r(x) = \frac{h}{4\pi} \int_{-x}^{x} \frac{\sin(r+\frac{1}{2})s}{\sin\frac{1}{2}s}\, ds - \frac{h}{4\pi} \int_{-\pi-x}^{-\pi+x} \frac{\sin(r+\frac{1}{2})s}{\sin\frac{1}{2}s}\, ds. \tag{19.36}
\]
We are now ready to consider the partial sums in the vicinity of the discontinuity, $x = 0$. For small $x$, the denominator of the second integrand approaches $-1$, and the second integral therefore becomes negligible in the limit $x \to 0$. On the other hand, the first integrand becomes large near $s = 0$, and the value of the first integral depends on the magnitudes of $r$ and $x$. If we now introduce the new variables $p = r + \frac{1}{2}$ and $\xi = ps$, we have (noting that the integrand is an even function of $s$)
\[
f_r(x) \approx \frac{h}{2\pi} \int_0^{px} \frac{\sin \xi}{\sin(\xi/2p)}\, \frac{d\xi}{p}. \tag{19.37}
\]

Calculation of Overshoot 


We are now prepared to make a computation of the Fourier series overshoot. From Eq. (19.37), we see that for any finite $r$, $f_r(0)$ will be zero, giving at $x = 0$ the average of the two square-wave values ($+h/2$ and $-h/2$). However (keeping $r$ fixed), Eq. (19.37) also tells us that $f_r(x)$ will increase as $px$ becomes nonzero, reaching a maximum when $px = \pi$. This maximum, which we will shortly show constitutes an overshoot, will therefore occur at $x = \pi/p$, which is approximately $x = \pi/r$. We thus see that the location of the overshoot maximum will differ from $x = 0$ in a manner approximately inversely proportional to the number of terms taken in the Fourier expansion.

To estimate the maximum value of $f_r(x)$, we substitute $px = \pi$ into Eq. (19.37), which we then simplify by making the good approximation $\sin(\xi/2p) \approx \xi/2p$:





\[
f_r\left(\frac{\pi}{p}\right) = \frac{h}{2\pi} \int_0^{\pi} \frac{\sin \xi}{\sin(\xi/2p)}\, \frac{d\xi}{p} \approx \frac{h}{\pi} \int_0^{\pi} \frac{\sin \xi}{\xi}\, d\xi. \tag{19.38}
\]
If the upper limit of the final integral of Eq. (19.38) were replaced by infinity, we would have
\[
\int_0^{\infty} \frac{\sin \xi}{\xi}\, d\xi = \frac{\pi}{2}, \tag{19.39}
\]
a result found in Example 11.8.5. Note that this replacement would cause $f_r(x)$ to have the value $h/2$, which is the exact value of $f(x)$ for $x > 0$.

The integral we would have to add to that of Eq. (19.38) to obtain the infinite range is
\[
\int_{\pi}^{\infty} \frac{\sin \xi}{\xi}\, d\xi = -\mathrm{si}(\pi); \tag{19.40}
\]
we have identified this integral as the sine integral function $\mathrm{si}(x)$ introduced in Table 1.2 and plotted in Fig. 13.6. Thus,
\[
\int_0^{\pi} \frac{\sin \xi}{\xi}\, d\xi = \frac{\pi}{2} + \mathrm{si}(\pi). \tag{19.41}
\]
The graph of $\mathrm{si}(x)$ shows that $\mathrm{si}(\pi) > 0$, indicating an overshoot. A direct demonstration that our integral is larger than $\pi/2$ can also be deduced by writing
\[
\left( \int_0^{\infty} - \int_{\pi}^{3\pi} - \int_{3\pi}^{5\pi} - \cdots \right) \frac{\sin \xi}{\xi}\, d\xi = \int_0^{\pi} \frac{\sin \xi}{\xi}\, d\xi. \tag{19.42}
\]
The first integral on the left-hand side has value $\pi/2$, while each of those to be subtracted is negative (and therefore makes a further positive contribution).

A Gaussian quadrature or a power-series expansion and term-by-term integration yields
\[
\frac{2}{\pi} \int_0^{\pi} \frac{\sin \xi}{\xi}\, d\xi = 1.1789797\ldots, \tag{19.43}
\]
which means that the Fourier series tends to overshoot the positive corner of the square wave by some 18% and to undershoot the negative corner by the same amount. This behavior is illustrated in Fig. 19.11. The inclusion of more terms (increasing $r$) does nothing to remove this overshoot but merely moves it closer to the point of discontinuity. The overshoot is the Gibbs phenomenon, and because of it the Fourier series representation may be highly unreliable for precise numerical work, especially in the vicinity of a discontinuity.

FIGURE 19.11 Square wave: Gibbs phenomenon.
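The overshoot factor of Eq. (19.43) can be reproduced with a few lines of code. The sketch below integrates the Maclaurin series of $\sin\xi/\xi$ term by term and, as a cross-check, evaluates a partial sum of Eq. (19.33) at $x = \pi/p$. The names `sine_integral` and `square_wave_partial`, the number of series terms, and the choices $r = 201$, $h = 1$ are assumptions made for this illustration.

```python
import math

def sine_integral(x, terms=40):
    """Integral of sin(t)/t from 0 to x, from the term-by-term integrated Maclaurin series."""
    total = 0.0
    term = x  # holds (-1)^k x^(2k+1) / (2k+1)!, starting at k = 0
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

overshoot_ratio = (2.0 / math.pi) * sine_integral(math.pi)

def square_wave_partial(x, r, h=1.0):
    """Partial sum of Eq. (19.33), keeping the odd harmonics n <= r."""
    return (2.0 * h / math.pi) * sum(math.sin(n * x) / n for n in range(1, r + 1, 2))

r = 201
peak = square_wave_partial(math.pi / (r + 0.5), r)  # evaluated at x = pi/p, with h = 1
print(overshoot_ratio)   # compare with Eq. (19.43)
print(peak / 0.5)        # ratio of the partial-sum value to h/2
```

Repeating the second calculation with a larger `r` moves the evaluation point toward $x = 0$ but leaves the ratio essentially unchanged.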


The Gibbs phenomenon is not limited to the Fourier series. It occurs with other eigenfunction expansions. For more details, see W. J. Thompson, Fourier series and the Gibbs phenomenon, Am. J. Phys. 60: 425 (1992).


Exercises 


19.3.1 


19.3.2 


19.3.3 


With the partial-sum summation techniques of this section, show that at a discontinuity 
in f(x) the Fourier series for f(x) takes on the arithmetic mean of the right- and left- 
hand limits: 


\[
f(x_0) = \tfrac{1}{2} \left[ f(x_0 + 0) + f(x_0 - 0) \right].
\]
In evaluating $\lim_{r \to \infty} s_r(x_0)$, you may find it convenient to identify part of the integrand as a Dirac delta function.
Determine the partial sum, $s_n$, of the series in Eq. (19.33) by using

(a) $\dfrac{\sin mx}{m} = \int_0^x \cos my\, dy$, $\qquad$ (b) $\sum_{p=1}^{n} \cos(2p - 1)y = \dfrac{\sin 2ny}{2 \sin y}$.

Do you agree with the result given in Eq. (19.40)?


(a) Calculate the value of the Gibbs phenomenon integral
\[
I = \frac{2}{\pi} \int_0^{\pi} \frac{\sin t}{t}\, dt
\]
by numerical quadrature accurate to 12 significant figures.

(b) Check your result by (1) expanding the integrand as a series, (2) integrating term by term, and (3) evaluating the integrated series. This calls for double-precision calculation.


ANS. $I = 1.178979744472$.


Additional Readings 


Carslaw, H. S., Introduction to the Theory of Fourier’s Series and Integrals, 2nd ed. London: Macmillan (1921); 
3rd ed., paperback, Dover (1952). This is a detailed and classic work; includes considerable discussion of 
Gibbs phenomenon in chapter IX. 

Churchill, R. V., Fourier Series and Boundary Value Problems, 5th ed., New York: McGraw-Hill (1993). 
Discusses uniform convergence in Section 38. 

Jeffreys, H., and B. S. Jeffreys, Methods of Mathematical Physics, 3rd ed. Cambridge: Cambridge University 
Press (1972). Termwise integration of Fourier series is treated in section 14.06. 

Kufner, A., and J. Kadlec, Fourier Series. London: Iliffe (1971). This book is a clear account of Fourier series in 
the context of Hilbert space. 

Lanczos, C., Applied Analysis. Englewood Cliffs, NJ: Prentice-Hall (1956), reprinted, Dover (1988). The book gives a well-written presentation of the Lanczos convergence technique (which suppresses the Gibbs phenomenon oscillations). This and several other topics are presented from the point of view of a mathematician who wants useful numerical results and not just abstract existence theorems.

Oberhettinger, F., Fourier Expansions; A Collection of Formulas. New York: Academic Press (1973). 


Zygmund, A., Trigonometric Series. Cambridge: Cambridge University Press (1988). The volume contains an 
extremely complete exposition, including relatively recent results in the realm of pure mathematics. 


CHAPTER 20


INTEGRAL TRANSFORMS


20.1 INTRODUCTION


Frequently in mathematical physics we encounter pairs of functions related by an expression of the form
\[
g(x) = \int_a^b f(t)\, K(x, t)\, dt, \tag{20.1}
\]


where it is understood that $a$, $b$, and $K(x, t)$ (called the kernel) will be the same for all function pairs $f$ and $g$. We can write the relationship expressed in Eq. (20.1) in the more symbolic form
\[
g(x) = \mathcal{L} f(t), \tag{20.2}
\]
thereby emphasizing the fact that Eq. (20.1) can be interpreted as an operator equation. The function $g(x)$ is called the integral transform of $f(t)$ by the operator $\mathcal{L}$, with the specific transform determined by the choice of $a$, $b$, and $K(x, t)$. The operator defined by Eq. (20.1) will be linear:
\[
\int_a^b \left[ f_1(t) + f_2(t) \right] K(x, t)\, dt = \int_a^b f_1(t) K(x, t)\, dt + \int_a^b f_2(t) K(x, t)\, dt, \tag{20.3}
\]
\[
\int_a^b c f(t) K(x, t)\, dt = c \int_a^b f(t) K(x, t)\, dt. \tag{20.4}
\]

In order for transforms to be useful, we will shortly see that we need to be able to “undo” their effect. From a practical viewpoint, this means that not only must there exist an operator $\mathcal{L}^{-1}$, but also that we have a reasonably convenient and powerful method of evaluating
\[
\mathcal{L}^{-1} g(x) = f(t) \tag{20.5}
\]
for an acceptably broad range of $g(x)$. The procedure for inverting a transform takes a wide variety of forms that depend on the specific properties of $K(x, t)$, so we cannot write a formula that is as general as that for $\mathcal{L}$ in Eq. (20.1).

Not all superficially reasonable choices for the kernel $K(x, t)$ will lead to operators $\mathcal{L}$ that have inverses, and even for strategically chosen kernels it may be the case that $\mathcal{L}$ and $\mathcal{L}^{-1}$ will only exist for substantially restricted classes of functions. Thus, the entire development of the present chapter is restricted (for any given integral transform) to functions for which the indicated operations can be carried out.

Before embarking on a study of integral transforms, we may well ask, ““Why are integral 
transforms useful?” Their most common applications are in situations illustrated schemat- 
ically in Fig. 20.1, where we have a problem that can be solved only with difficulty, if 
at all, in its original formulation (usually in ordinary space, sometimes called direct or 
physical space). However, it may happen that the transform of the problem can be solved 
relatively easily. Our strategy, then, will be to formulate and solve our problem in the trans- 
form space, after which we transform the solution back to direct space. This strategy often 
works because the most popular integral transforms are changed in simple ways by differ- 
entiation and integration operators, with the result that differential and integral equations 
assume relatively simple forms. This feature will be discussed and illustrated at length later 
in this chapter. 

Another frequent use of integral transforms is to use one, together with its inverse, to 
form an integral representation of a function that we originally had in an explicit form. 
This move (which appears to be in the direction of generating greater complexity) has 
value that arises from the relatively simple behavior of the transforms of differentiation 
and integration operators. Procedures involving integral representations are also presented 
in later sections of this chapter.

FIGURE 20.1 Schematic: use of integral transforms.

Some Important Transforms


The integral transform that has seen the widest use is the Fourier transform, defined as
\[
g(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt. \tag{20.6}
\]
The notation for this transform is not entirely universal; some writers omit the prefactor $1/\sqrt{2\pi}$; we keep it because it causes the transform and its inverse to have formulas that are more symmetrical. In applications involving periodic systems, one occasionally encounters a definition with kernel $\exp(2\pi i \omega t / a_0)$, where $a_0$ is a lattice constant. These differences in notation do not change the mathematics, but cause formulas to differ by powers of $2\pi$ or $a_0$. Caution is therefore advised when combining material from different sources.

We have defined the Fourier transform in a notation that assigns the symbol $\omega$ to the transform variable. We did so because, in studying signal processing (an important use of Fourier transforms), the function $f(t)$ usually represents the time behavior of a signal (typically a wave distribution of some kind). Its Fourier transform, $g(\omega)$, can then be identified as the corresponding frequency distribution. However, it is worth pointing out that Fourier transforms turn up in contexts far removed from signal-processing problems; they can be used to advantage in evaluating integrals, in alternative formulations of quantum mechanics, and in a wide range of other mathematical procedures.

A second transform that has historically been of great importance is the Laplace 
transform, 


\[
F(s) = \int_0^{\infty} e^{-st} f(t)\, dt. \tag{20.7}
\]

One of its useful features is the fact that under transformation, differential equations 
become algebraic equations (as we shall see in detail in Section 20.8). Since algebraic 
equations are usually easier to solve than differential equations, this feature lends itself to 
the strategy illustrated in Fig. 20.1. A disadvantage of the Laplace transform is that the for- 
mula for its inverse is relatively difficult to use. Historically, this difficulty was dealt with 
by developing tables of Laplace transforms (which can be used to identify inverses). As 
digital computers have become more powerful, the use of Laplace transforms has declined, 
but they remain sufficiently useful that we treat them in some detail in the present chapter. 
Among other transforms that have seen significant use, we mention here two: 


1. The Hankel transform,
\[
g(\alpha) = \int_0^{\infty} f(t)\, t\, J_n(\alpha t)\, dt. \tag{20.8}
\]
This transform represents the continuum limit of the Bessel series we studied in Eqs. (14.47) and (14.48).

2. The Mellin transform,
\[
g(\alpha) = \int_0^{\infty} f(t)\, t^{\alpha - 1}\, dt. \tag{20.9}
\]
We have actually used the Mellin transform without calling it by name; for example, $g(\alpha) = \Gamma(\alpha)$ is the Mellin transform of $f(t) = e^{-t}$. Many Mellin transforms are given in a text by Titchmarsh (see Additional Readings).
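The statement that $\Gamma(\alpha)$ is the Mellin transform of $e^{-t}$ is easy to test numerically. The following is a rough sketch, not a general-purpose Mellin routine: it assumes the substitution $t = e^u$, a midpoint rule on a truncated $u$ range, and a hypothetical helper name `mellin_of_exp`.

```python
import math

def mellin_of_exp(alpha, u_min=-40.0, u_max=6.0, samples=20000):
    """Estimate integral_0^inf exp(-t) t^(alpha-1) dt using t = exp(u), dt = exp(u) du."""
    du = (u_max - u_min) / samples
    total = 0.0
    for k in range(samples):
        u = u_min + (k + 0.5) * du
        # exp(-t) * t^(alpha-1) * dt/du = exp(alpha*u - exp(u))
        total += math.exp(alpha * u - math.exp(u))
    return total * du

for alpha in (0.5, 1.0, 2.5, 4.0):
    print(alpha, mellin_of_exp(alpha), math.gamma(alpha))
```

The change of variable removes the integrable singularity at $t = 0$ that appears for $\alpha < 1$, which is why a plain midpoint rule suffices here.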


20.2 FOURIER TRANSFORM 
We proceed now to a more detailed discussion of the Fourier transform,
\[
g(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt. \tag{20.10}
\]
If we rewrite the exponential in Eq. (20.10) in terms of the sine and cosine, and then restrict consideration to functions that are assumed to be either even or odd functions of $t$, we obtain variants of the original form that are also useful integral transforms:
\[
g_c(\omega) = \sqrt{\frac{2}{\pi}} \int_0^{\infty} f(t) \cos \omega t\, dt, \tag{20.11}
\]
\[
g_s(\omega) = \sqrt{\frac{2}{\pi}} \int_0^{\infty} f(t) \sin \omega t\, dt. \tag{20.12}
\]


These formulas define the Fourier cosine and Fourier sine transforms. Their kernels, 
which are real, are natural for use in studies of wave motion and for extracting informa- 
tion from waves, particularly when phase information is involved. The output of a stellar 
interferometer, for instance, involves a Fourier transform of the brightness across a stellar 
disk. The electron distribution in an atom may be obtained from a Fourier transform of the 
amplitude of scattered x-rays. 


Example 20.2.1. SOME FOURIER TRANSFORMS


1. $f(t) = e^{-\alpha|t|}$, with $\alpha > 0$. To deal with the absolute value, we break the transform integral into two regions:
\[
g(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{\alpha t + i\omega t}\, dt + \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-\alpha t + i\omega t}\, dt = \frac{1}{\sqrt{2\pi}} \left[ \frac{1}{\alpha + i\omega} + \frac{1}{\alpha - i\omega} \right] = \frac{1}{\sqrt{2\pi}}\, \frac{2\alpha}{\alpha^2 + \omega^2}. \tag{20.13}
\]
We note two features of this result: (1) It is real; from the form of the transform, we can see that if $f(t)$ is even, its transform will be real. (2) The more localized is $f(t)$, the less localized will be $g(\omega)$. The transform will have an appreciable value until $\omega \gg \alpha$; larger $\alpha$ corresponds to greater localization of $f(t)$.

f@) =46(t). We easily find 


= 1 ii iwt _ 1 
eo) = 5- f aepelar= = (20.14) 


This is the ultimately localized f(t), and we see that g(w) is completely delocalized; 
it has the same value for all w. 

f(t) = 2a./1/2m /(a? + t*), with a > 0. One way to evaluate this transform is by 
contour integration. It is convenient to start by writing initially 





( )= Jo eit ss 
oe on | G=ialeio) 
—0o 
The integrand has two poles: one at t = ia with residue e-°”/i and one at t = —ia 


with residue et?” /(—i). If w > 0, our integrand will become negligible on a large 
semicircle in the upper half-plane, so an integral over the contour shown in Fig. 20.2(a) 
will be that needed for g(w). This contour encloses only the pole at t = ia, so we get 


eee 





g(@) = - (277i) (w>0). (20.15) 
2m 


i 

However, if w < 0, we must close the contour in the lower half-plane, as in 
Fig. 20.2(b), circling the pole at t = —ia in a clockwise sense (thereby generating 
a minus sign). This procedure yields 


eto 





g(@) = 2 (—2z i) (w <0). (20.16) 
20 


-1 


FIGURE 20.2 Contours for third transform in Example 20.2.1.

If ω = 0, we cannot perform a contour integration on either of the paths shown in
Fig. 20.2, but we then do not need this sophisticated an approach, as we have the
elementary integral

$$g(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{2\alpha}{t^2 + \alpha^2}\,dt = 1. \tag{20.17}$$

Combining Eqs. (20.15)-(20.17) and simplifying, we have

$$g(\omega) = e^{-\alpha|\omega|}.$$


Here we Fourier transformed the transform from our first example, recovering the 
original untransformed function. This provides an interesting clue as to the form to 
be expected for the inverse Fourier transform. It is only a clue, because our example 
involved a transform that was real (i.e., not complex).
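The first transform of this example can be checked numerically. The following is a minimal sketch, not part of the original example: it approximates the transform integral by a trapezoidal rule on a truncated interval and compares it with Eq. (20.13). The helper name `fourier_transform`, the cutoff `t_max`, the grid size `n`, and the value α = 1.5 are illustrative assumptions.

```python
import cmath
import math


def fourier_transform(f, omega, t_max=40.0, n=80_000):
    """Trapezoidal estimate of g(omega) = (2*pi)**-0.5 * integral f(t) exp(i*omega*t) dt.

    The infinite range is truncated to [-t_max, t_max], which is adequate only
    for functions that are negligible outside that window.
    """
    h = 2.0 * t_max / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        t = -t_max + k * h
        weight = 0.5 if k in (0, n) else 1.0  # end points carry half weight
        total += weight * f(t) * cmath.exp(1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)


ALPHA = 1.5  # illustrative value, not taken from the text


def f_exp(t):
    """f(t) = exp(-alpha*|t|), the first function of Example 20.2.1."""
    return math.exp(-ALPHA * abs(t))


def g_exact(omega):
    """Closed form of Eq. (20.13)."""
    return 2.0 * ALPHA / (math.sqrt(2.0 * math.pi) * (ALPHA**2 + omega**2))


for omega in (0.0, 1.0, 3.0):
    numeric = fourier_transform(f_exp, omega)
    print(omega, numeric.real, numeric.imag, g_exact(omega))
```

The imaginary part of the numerical result should be negligible, in line with the remark that an even f(t) has a real transform.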


An important Fourier transform follows. 


Example 20.2.2 FOURIER TRANSFORM OF GAUSSIAN

The Fourier transform of a Gaussian function $e^{-at^2}$, with $a > 0$,

$$g(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-at^2} e^{i\omega t}\,dt,$$

can be evaluated analytically by completing the square in the exponent,

$$-at^2 + i\omega t = -a\left(t - \frac{i\omega}{2a}\right)^2 - \frac{\omega^2}{4a},$$

which we can check by evaluating the square. Substituting this identity and changing the
integration variable from t to s = t − iω/2a, we obtain (in the limit of large T)

$$g(\omega) = \frac{1}{\sqrt{2\pi}}\,e^{-\omega^2/4a}\int_{-T - i\omega/2a}^{T - i\omega/2a} e^{-as^2}\,ds. \tag{20.18}$$


The s integration, shown in Fig. 20.3, is on a path parallel to, but below the real axis by
an amount iω/2a. But because connections from that path to the real axis at ±T make
negligible contributions to a contour integral and since the contours in Fig. 20.3 enclose no
singularities, the integral in Eq. (20.18) is equivalent to one along the real axis. Changing
the integration limits to ±∞ and rescaling to the new variable $\xi = \sqrt{a}\,s$, we reach

$$\int_{-\infty}^{\infty} e^{-as^2}\,ds = \frac{1}{\sqrt{a}}\int_{-\infty}^{\infty} e^{-\xi^2}\,d\xi = \sqrt{\frac{\pi}{a}},$$

FIGURE 20.3 Contour for transform of Gaussian in Example 20.2.2.
—oo 





where we have used Eq. (1.148) to evaluate the error-function integral. Substituting these
results we find

$$g(\omega) = \frac{1}{\sqrt{2a}}\exp\left(-\frac{\omega^2}{4a}\right), \tag{20.19}$$

again a Gaussian, but in ω-space. An increase in a makes the original Gaussian $e^{-at^2}$
narrower, while making wider its Fourier transform, the behavior of which is dominated
by the exponential $e^{-\omega^2/4a}$.
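As a hedged illustration of Eq. (20.19), the sketch below compares a direct numerical evaluation of the transform integral with the closed form for several values of a. The function names, the truncation `t_max`, and the sample values of a and ω are assumptions made for this check only.

```python
import cmath
import math


def gaussian_transform_numeric(a, omega, t_max=12.0, n=4_000):
    """Trapezoidal estimate of (2*pi)**-0.5 * integral exp(-a t^2) exp(i omega t) dt."""
    h = 2.0 * t_max / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        t = -t_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-a * t * t) * cmath.exp(1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)


def gaussian_transform_exact(a, omega):
    """Closed form of Eq. (20.19): (2a)**-0.5 * exp(-omega^2 / 4a)."""
    return math.exp(-omega * omega / (4.0 * a)) / math.sqrt(2.0 * a)


# A larger a narrows exp(-a t^2) and widens its transform.
for a in (0.25, 1.0, 4.0):
    omega = 1.7
    print(a, gaussian_transform_numeric(a, omega).real, gaussian_transform_exact(a, omega))
```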


Fourier Integral 


When we first encountered the delta function, its representation as the large-n limit of

$$\delta_n(t) = \frac{1}{2\pi}\int_{-n}^{n} e^{i\omega t}\,d\omega, \tag{20.20}$$

was identified as particularly useful in Fourier analysis. We now use that representation
to obtain an important result known as the Fourier integral. We write the fairly obvious
equation,

$$f(x) = \lim_{n\to\infty}\int_{-\infty}^{\infty} f(t)\,\delta_n(t - x)\,dt
= \lim_{n\to\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty} f(t)\left[\int_{-n}^{n} e^{i\omega(t - x)}\,d\omega\right]dt. \tag{20.21}$$

We now interchange the order of integration and take the limit n → ∞, reaching

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\int_{-\infty}^{\infty} f(t)\,e^{i\omega(t - x)}\,dt.$$


Finally, we rearrange this equation to the form

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega x}\,d\omega\int_{-\infty}^{\infty} f(t)\,e^{i\omega t}\,dt. \tag{20.22}$$

Equation (20.22), the Fourier integral, is an integral representation of f(x), and will be
more obviously recognized as such if the inner integration (over t) is performed, leaving
unevaluated that over ω. In fact, if we identify the inner integration as (apart from a factor
$\sqrt{1/2\pi}$) the Fourier transform of f(t), and label it g(ω) as in Eq. (20.10), then Eq. (20.22)
can be rewritten

$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(\omega)\,e^{-i\omega t}\,d\omega, \tag{20.23}$$
—0o 


showing that whenever we have the Fourier transform of a function f(t) we can use it to 
make a Fourier integral representation of that function. 

The Fourier integral formula, written as in Eq. (20.23), illustrates the value of Fourier
analysis in signal processing. If f(t) is an arbitrary signal, Eq. (20.23) describes the signal
as composed of a superposition of waves $e^{-i\omega t}$ at angular frequencies¹ ω, with respective
amplitudes g(ω). Thus, the Fourier integral is the underlying justification that one can
express a signal either by its time dependence f(t) or by its (angular) frequency
distribution g(ω).

Before leaving the Fourier integral, we should remark that our derivation of it did not 
provide a rigorous justification for the reversal of the order of integration and the passage 
to the infinite-n limit. The interested reader can find a more rigorous treatment in, for 
example, the work Fourier Transforms by I. N. Sneddon (Additional Readings). 


Example 20.2.3 FOURIER INTEGRAL REPRESENTATION

From the first transform of Example 20.2.1, we found that $f(t) = e^{-\alpha|t|}$ has Fourier
transform $g(\omega) = \sqrt{1/2\pi}\;2\alpha/(\alpha^2 + \omega^2)$. If we substitute these data into Eq. (20.23), we obtain

$$e^{-\alpha|t|} = f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{2\alpha\,e^{-i\omega t}}{\alpha^2 + \omega^2}\,d\omega
= \frac{\alpha}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega t}}{\alpha^2 + \omega^2}\,d\omega. \tag{20.24}$$

Equation (20.24) provides an integral representation for exp(−α|t|) that contains no
absolute value signs and may constitute a useful starting point for various analytical
manipulations. We will shortly encounter some more substantive examples with immediate
applications for physics.
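Equation (20.24) can be tested numerically. The sketch below is an illustration under stated assumptions: the ω integral is truncated at `omega_max`, and only the cosine part of $e^{-i\omega t}$ is kept because the sine part is odd in ω and cancels over the symmetric range. The function name and all parameter values are hypothetical choices for this check.

```python
import math


def exp_abs_from_integral(t, alpha, omega_max=2000.0, n=200_000):
    """Approximate (alpha/pi) * integral of exp(-i w t) / (alpha^2 + w^2) dw over all w.

    By symmetry this equals (2*alpha/pi) * integral_0^inf cos(w t) / (alpha^2 + w^2) dw,
    which is truncated here at w = omega_max and evaluated by the trapezoidal rule.
    """
    h = omega_max / n
    total = 0.0
    for k in range(n + 1):
        w = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.cos(w * t) / (alpha * alpha + w * w)
    return 2.0 * alpha / math.pi * total * h


alpha = 1.0  # illustrative value
for t in (0.0, 0.7, -2.0):
    print(t, exp_abs_from_integral(t, alpha), math.exp(-alpha * abs(t)))
```

Because the integrand decays only as 1/ω², the truncation error is largest at t = 0; a larger `omega_max` tightens the agreement.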


Inverse Fourier Transform 


As the reader may have noticed, Eq. (20.23) is a formula for the inverse Fourier
transform. Note that the regular (“direct”) and inverse Fourier transforms are given by very
similar (but not quite identical) formulas. The only difference is in the sign of the complex
exponential. This change of sign causes two successive applications of the Fourier
transform not to be identical with applying the transform and then its inverse, and the difference
shows up when g(ω) is not real.²





¹The wave $e^{-i\omega t}$ has period 2π/ω, thus frequency ν = ω/2π. Its angular frequency (radians per unit time rather than cycles) is
2πν = ω.

²Even functions have real Fourier transforms; the transforms of odd functions are imaginary. A function that is neither even nor
odd will have a Fourier transform that is complex.

The analysis of the preceding subsection can also be applied to the Fourier cosine and
sine transforms. For convenience, we summarize the formulas for all three varieties of the
Fourier transform and their respective inverses.

$$g(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{i\omega t}\,dt, \tag{20.25}$$

$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(\omega)\,e^{-i\omega t}\,d\omega, \tag{20.26}$$

$$g_c(\omega) = \sqrt{\frac{2}{\pi}}\int_{0}^{\infty} f(t)\cos\omega t\,dt, \tag{20.27}$$

$$f(t) = \sqrt{\frac{2}{\pi}}\int_{0}^{\infty} g_c(\omega)\cos\omega t\,d\omega, \tag{20.28}$$

$$g_s(\omega) = \sqrt{\frac{2}{\pi}}\int_{0}^{\infty} f(t)\sin\omega t\,dt, \tag{20.29}$$

$$f(t) = \sqrt{\frac{2}{\pi}}\int_{0}^{\infty} g_s(\omega)\sin\omega t\,d\omega. \tag{20.30}$$


Note that the Fourier sine and cosine transforms only use data for 0 ≤ t < ∞. Therefore,
even though it is possible to evaluate the corresponding inverse transforms for negative t,
the results may be irrelevant to the actual situation at those t values. But if our function
f(t) is even, then the cosine transform will reproduce it faithfully for negative t; odd
functions will be properly described for negative t by the sine transform.


Example 20.2.4 FINITE WAVE TRAIN

An important application of the Fourier transform is the resolution of a finite pulse into
sinusoidal waves. Imagine that an infinite wave train $\sin\omega_0 t$ is clipped by Kerr cell or
saturable dye cell shutters so that we have

$$f(t) = \begin{cases} \sin\omega_0 t, & |t| < \dfrac{N\pi}{\omega_0}, \\[2mm] 0, & |t| > \dfrac{N\pi}{\omega_0}. \end{cases} \tag{20.31}$$

This corresponds to N cycles of our original wave train (Fig. 20.4). Since f(t) is odd, we
use the Fourier sine transform, Eq. (20.29), to obtain

$$g_s(\omega) = \sqrt{\frac{2}{\pi}}\int_{0}^{N\pi/\omega_0} \sin\omega_0 t\,\sin\omega t\,dt. \tag{20.32}$$

Integrating, we find our amplitude function:

$$g_s(\omega) = \sqrt{\frac{2}{\pi}}\left[\frac{\sin[(\omega_0 - \omega)(N\pi/\omega_0)]}{2(\omega_0 - \omega)} - \frac{\sin[(\omega_0 + \omega)(N\pi/\omega_0)]}{2(\omega_0 + \omega)}\right]. \tag{20.33}$$


It is of considerable interest to see how $g_s(\omega)$ depends on frequency. For large $\omega_0$ and
$\omega \approx \omega_0$, only the first term will be of any importance because of the denominators. It is
plotted in Fig. 20.5. This is the amplitude curve for the single-slit diffraction pattern. It has
zeros at

$$\frac{\omega_0 - \omega}{\omega_0} = \frac{\Delta\omega}{\omega_0} = \pm\frac{1}{N},\ \pm\frac{2}{N},\ \text{and so on.} \tag{20.34}$$

For large N, $g_s(\omega)$ may also be interpreted as proportional to a Dirac delta distribution.

FIGURE 20.4 Finite wave train.

FIGURE 20.5 Fourier transform of finite wave train.

Since a large fraction of the frequency distribution falls within its central maximum, the
half-width of that maximum,

$$\Delta\omega = \frac{\omega_0}{N}, \tag{20.35}$$

is a good measure of the spread in angular frequency of our wave pulse. Clearly, if N is
large (a long pulse), the frequency spread will be small. On the other hand, if our pulse is
clipped short, N small, the frequency distribution will be wider.

The inverse relationship between frequency spread and pulse length is a fundamental 
property of finite wave distributions; the precision with which a signal can be identified as a 
specific frequency depends on the pulse length. This same principle finds expression as the 
Heisenberg uncertainty principle of quantum mechanics, in which position uncertainty 
(the quantum variable corresponding to pulse length) is inversely related to momentum 
uncertainty (quantum analog of frequency). It is worth noting that the uncertainty principle 
in quantum mechanics is a consequence of the wave nature of matter and does not depend 
on additional ad hoc postulates.
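The amplitude function of this example can be explored numerically. The sketch below, offered as an illustration rather than as part of the original text, evaluates the sine-transform integral of Eq. (20.32) by the trapezoidal rule, compares it with the closed form of Eq. (20.33), and confirms the first zero predicted by Eq. (20.34). The function names and the values ω₀ = 5, N = 10 are assumptions.

```python
import math


def gs_numeric(omega, omega0, n_cycles, n=20_000):
    """Trapezoidal estimate of Eq. (20.32) on 0 <= t <= N*pi/omega0."""
    t_end = n_cycles * math.pi / omega0
    h = t_end / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.sin(omega0 * t) * math.sin(omega * t)
    return math.sqrt(2.0 / math.pi) * total * h


def gs_closed(omega, omega0, n_cycles):
    """Closed form of Eq. (20.33); not defined at omega == omega0 exactly."""
    t_end = n_cycles * math.pi / omega0
    first = math.sin((omega0 - omega) * t_end) / (2.0 * (omega0 - omega))
    second = math.sin((omega0 + omega) * t_end) / (2.0 * (omega0 + omega))
    return math.sqrt(2.0 / math.pi) * (first - second)


omega0, n_cycles = 5.0, 10  # illustrative values
print(gs_numeric(4.8, omega0, n_cycles), gs_closed(4.8, omega0, n_cycles))

# First zero on the low-frequency side: (omega0 - omega)/omega0 = 1/N.
omega_zero = omega0 * (1.0 - 1.0 / n_cycles)
print(gs_closed(omega_zero, omega0, n_cycles))
```

Increasing `n_cycles` moves the zeros closer to ω₀, which is the narrowing of the frequency distribution described above.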


Transforms in 3-D Space 


Applying the Fourier transform operator in each dimension of a three-dimensional (3-D)
space, we obtain the extremely useful formulas

$$g(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\int f(\mathbf{r})\,e^{i\mathbf{k}\cdot\mathbf{r}}\,d^3r, \tag{20.36}$$

$$f(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int g(\mathbf{k})\,e^{-i\mathbf{k}\cdot\mathbf{r}}\,d^3k. \tag{20.37}$$

These integrals are over all space. Verification, if desired, follows immediately by
substituting the left-hand side of one equation into the integrand of the other equation and
choosing the integration order that permits the complex exponentials to be identified as
delta functions in each of the three dimensions. Equation (20.37) may be interpreted as
an expansion of a function f(r) in a continuum of plane waves; g(k) then becomes the
amplitude of the wave exp(−ik·r).
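Because the 3-D transform applies the 1-D transform along each Cartesian axis, a separable function transforms into a product of 1-D transforms. The sketch below illustrates this for $e^{-ar^2} = e^{-ax^2}e^{-ay^2}e^{-az^2}$, comparing a product of three numerically evaluated 1-D transforms with three copies of Eq. (20.19). The function names, grid parameters, and sample wave vector are assumptions chosen for the illustration.

```python
import cmath
import math


def transform_1d(f, k, x_max=12.0, n=2_400):
    """Trapezoidal estimate of (2*pi)**-0.5 * integral f(x) exp(i k x) dx."""
    h = 2.0 * x_max / n
    total = 0.0 + 0.0j
    for j in range(n + 1):
        x = -x_max + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * f(x) * cmath.exp(1j * k * x)
    return total * h / math.sqrt(2.0 * math.pi)


def gaussian_transform_3d(a, kvec):
    """Transform of exp(-a r^2) built as a product of three 1-D transforms."""
    result = 1.0 + 0.0j
    for k_axis in kvec:
        result *= transform_1d(lambda x: math.exp(-a * x * x), k_axis)
    return result


def gaussian_transform_3d_exact(a, kvec):
    """Product of three copies of Eq. (20.19): (2a)**-1.5 * exp(-k^2 / 4a)."""
    k_squared = sum(k * k for k in kvec)
    return math.exp(-k_squared / (4.0 * a)) / (2.0 * a) ** 1.5


kvec = (0.5, -1.0, 2.0)  # illustrative wave vector
print(gaussian_transform_3d(0.6, kvec).real, gaussian_transform_3d_exact(0.6, kvec))
```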


Example 20.2.5 SOME 3-D TRANSFORMS

1. Let’s find the Fourier transform of the Yukawa potential, $e^{-\alpha r}/r$. Using the notation
$[\cdots]^T$ to denote the Fourier transform of the included object, we seek

$$\left[\frac{e^{-\alpha r}}{r}\right]^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\int \frac{e^{-\alpha r}}{r}\,e^{i\mathbf{k}\cdot\mathbf{r}}\,d^3r. \tag{20.38}$$

Perhaps the simplest way to proceed is to introduce the spherical wave expansion for
exp(ik·r), Eq. (16.61). Equation (20.38), written in spherical polar coordinates, then
assumes the form

$$\left[\frac{e^{-\alpha r}}{r}\right]^T(\mathbf{k}) = \frac{4\pi}{(2\pi)^{3/2}}\int_0^\infty r\,e^{-\alpha r}\,dr\int d\Omega_r \sum_{lm} i^l j_l(kr)\,Y_l^m(\Omega_k)^*\,Y_l^m(\Omega_r). \tag{20.39}$$
0 Im 


All terms of the angular integration vanish except that with l = m = 0. For that term,
each $Y_0^0$ has the constant value $1/\sqrt{4\pi}$, and Eq. (20.39) reduces to

$$\left[\frac{e^{-\alpha r}}{r}\right]^T(\mathbf{k}) = \frac{4\pi}{(2\pi)^{3/2}}\int_0^\infty r\,e^{-\alpha r}\,j_0(kr)\,dr. \tag{20.40}$$

Inserting $j_0(kr) = \sin kr/kr$, the r integration becomes elementary, and we reach

$$\left[\frac{e^{-\alpha r}}{r}\right]^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\,\frac{4\pi}{k^2 + \alpha^2}. \tag{20.41}$$

We wrote Eq. (20.41) as we did to make obvious that if the transform were scaled
without the factor $1/(2\pi)^{3/2}$, we would have the well-known result $4\pi/(k^2 + \alpha^2)$.

2. Even more important than the Fourier transform of the Yukawa potential is that of
the Coulomb potential, 1/r. An attempt to evaluate this transform directly leads to
convergence problems, but it is easy to evaluate it as the limiting case of the Yukawa
potential with α = 0. Thus, we have the extremely important result,

$$\left[\frac{1}{r}\right]^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\,\frac{4\pi}{k^2}. \tag{20.42}$$

3. From the relation between the Fourier transform and its inverse, Eq. (20.42) can
effectively be inverted to yield

$$\left[\frac{1}{r^2}\right]^T(\mathbf{k}) = \left(\frac{\pi}{2}\right)^{1/2}\frac{1}{k}. \tag{20.43}$$

4. Another useful Fourier transform is that of the hydrogenic 1s orbital, which (in
unnormalized form) is exp(−Zr). A simple way to evaluate this transform is to differentiate
the transform for the Yukawa potential with respect to its parameter, α in Eq. (20.41).
Noting that differentiation with respect to this parameter commutes with the transform
operator (which involves integration with respect to other variables), we have

$$-\frac{\partial}{\partial Z}\left[\frac{e^{-Zr}}{r}\right]^T(\mathbf{k}) = \left[e^{-Zr}\right]^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\,\frac{8\pi Z}{(k^2 + Z^2)^2}. \tag{20.44}$$

5. Consider next an arbitrary function whose angular dependence is a spherical harmonic
(i.e., an angular momentum eigenfunction). Using spherical polar coordinates, we
look at

$$\left[f(r)\,Y_l^m(\Omega_r)\right]^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\int_0^\infty f(r)\,r^2\,dr\int d\Omega_r\,Y_l^m(\Omega_r)\,e^{i\mathbf{k}\cdot\mathbf{r}}$$

$$= \frac{4\pi}{(2\pi)^{3/2}}\int_0^\infty f(r)\,r^2\,dr\int d\Omega_r\,Y_l^m(\Omega_r)
\times \sum_{l'm'} i^{l'} j_{l'}(kr)\,Y_{l'}^{m'}(\Omega_k)\,Y_{l'}^{m'}(\Omega_r)^*,$$

where we have inserted the spherical wave expansion, Eq. (16.61), for exp(ik·r).
Because the $Y_l^m$ are orthonormal, the summation reduces to a single term, and we
have

$$\left[f(r)\,Y_l^m(\Omega_r)\right]^T(\mathbf{k}) = \frac{4\pi\,i^l}{(2\pi)^{3/2}}\,Y_l^m(\Omega_k)\int_0^\infty f(r)\,j_l(kr)\,r^2\,dr. \tag{20.45}$$

Equation (20.45) shows that a function with spherical harmonic angular dependence
has a transform containing the same spherical harmonic and that the radial dependence
of the transform is essentially a Hankel transform. Compare with Eq. (20.8).

6. As a final example, consider the Fourier transform of a 3-D Gaussian. Again using
spherical polar coordinates and the spherical wave expansion (a procedure generally
applicable for transforms of spherically symmetric functions), we get

$$\left[e^{-ar^2}\right]^T(\mathbf{k}) = \frac{4\pi}{(2\pi)^{3/2}}\int_0^\infty r^2\,e^{-ar^2}\,j_0(kr)\,dr. \tag{20.46}$$

Using methods similar to those of Example 20.2.2, we find

$$\left[e^{-ar^2}\right]^T(\mathbf{k}) = \frac{1}{(2a)^{3/2}}\,e^{-k^2/4a}. \tag{20.47}$$

This result could also be obtained using Cartesian coordinates and using the result of
Example 20.2.2 in each of the three dimensions.


Exercises

20.2.1 (a) Show that $g(-\omega) = g^*(\omega)$ is a necessary and sufficient condition for f(x) to be
real.

(b) Show that $g(-\omega) = -g^*(\omega)$ is a necessary and sufficient condition for f(x) to be
pure imaginary.

Note. The condition of part (a) is used in the development of the dispersion relations of
Section 12.8.
20.2.2 The function

$$f(x) = \begin{cases} 1, & |x| < 1 \\ 0, & |x| > 1 \end{cases}$$

is a symmetrical finite step function.

(a) Find $g_c(\omega)$, Fourier cosine transform of f(x).

(b) Taking the inverse cosine transform, show that

$$f(x) = \frac{2}{\pi}\int_0^\infty \frac{\sin\omega\,\cos\omega x}{\omega}\,d\omega.$$

(c) From part (b) show that

$$\int_0^\infty \frac{\sin\omega\,\cos\omega x}{\omega}\,d\omega = \begin{cases} 0, & |x| > 1, \\ \pi/4, & |x| = 1, \\ \pi/2, & |x| < 1. \end{cases}$$

20.2.3 (a) Show that the Fourier sine and cosine transforms of $e^{-at}$ are

$$g_s(\omega) = \sqrt{\frac{2}{\pi}}\,\frac{\omega}{\omega^2 + a^2}, \qquad g_c(\omega) = \sqrt{\frac{2}{\pi}}\,\frac{a}{\omega^2 + a^2}.$$

Hint. Each of the transforms can be related to the other by integration by parts.

(b) Show that

$$\int_0^\infty \frac{\omega\sin\omega x}{\omega^2 + a^2}\,d\omega = \frac{\pi}{2}\,e^{-ax}, \quad x > 0,$$

$$\int_0^\infty \frac{\cos\omega x}{\omega^2 + a^2}\,d\omega = \frac{\pi}{2a}\,e^{-ax}, \quad x > 0.$$

These results can also be obtained by contour integration (Exercise 11.8.12).

20.2.4 Find the Fourier transform of the triangular pulse (Fig. 20.6),

$$f(x) = \begin{cases} h(1 - a|x|), & |x| < 1/a, \\ 0, & |x| > 1/a. \end{cases}$$

Note. This function provides another delta sequence with h = a and a → ∞.

20.2.5 Consider the sequence

$$\delta_n(x) = \begin{cases} n, & |x| < 1/2n, \\ 0, & |x| > 1/2n. \end{cases}$$




FIGURE 20.6 Triangular pulse.

This is Eq. (1.152). Express $\delta_n(x)$ as a Fourier integral (via the Fourier integral theorem,
inverse transform, etc.). Finally, show that we may write

$$\delta(x) = \lim_{n\to\infty}\delta_n(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ikx}\,dk.$$

20.2.6 Using the sequence

$$\delta_n(x) = \frac{n}{\sqrt{\pi}}\exp(-n^2x^2),$$

show that

$$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ikx}\,dk.$$

Hint. Remember that δ(x) is defined in terms of its behavior as part of an integrand.

20.2.7 The formula

$$\delta(t - x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t - x)}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega t}\,e^{-i\omega x}\,d\omega$$

can be identified as the continuum limit of an eigenfunction expansion. Derive sine and
cosine representations of δ(t − x) that are comparable to the exponential representation
just given.

ANS. $\dfrac{2}{\pi}\displaystyle\int_0^\infty \sin\omega t\,\sin\omega x\,d\omega, \qquad \dfrac{2}{\pi}\displaystyle\int_0^\infty \cos\omega t\,\cos\omega x\,d\omega.$

20.2.8 In a resonant cavity, an electromagnetic oscillation of frequency $\omega_0$ dies out as

$$A(t) = \begin{cases} A_0\,e^{-\omega_0 t/2Q}\,e^{-i\omega_0 t}, & t > 0, \\ 0, & t < 0. \end{cases}$$

The parameter Q is a measure of the ratio of stored energy to energy loss per cycle.
Calculate the frequency distribution of the oscillation, $a^*(\omega)a(\omega)$, where a(ω) is the
Fourier transform of A(t).
Note. The larger Q is, the sharper your resonance line will be.

ANS. $a^*(\omega)a(\omega) = \dfrac{A_0^2}{2\pi}\,\dfrac{1}{(\omega - \omega_0)^2 + (\omega_0/2Q)^2}.$

20.2.9 Prove that

$$\frac{\hbar}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-i\omega t}\,d\omega}{E_0 - i\Gamma/2 - \hbar\omega} = \begin{cases} \exp\left(-\dfrac{\Gamma t}{2\hbar}\right)\exp\left(-\dfrac{iE_0 t}{\hbar}\right), & t > 0, \\ 0, & t < 0. \end{cases}$$

This Fourier integral appears in a variety of problems in quantum mechanics: barrier
penetration, scattering, time-dependent perturbation theory, and so on.

Hint. Try contour integration.

20.2.10 Verify that the following are Fourier integral transforms of one another:

(a) $\begin{cases} \sqrt{\dfrac{2}{\pi}}\,\dfrac{1}{\sqrt{a^2 - x^2}}, & |x| < a, \\ 0, & |x| > a, \end{cases}$ and $J_0(ay)$,

(b) $\begin{cases} 0, & |x| < a, \\ -\sqrt{\dfrac{2}{\pi}}\,\dfrac{1}{\sqrt{x^2 - a^2}}, & |x| > a, \end{cases}$ and $Y_0(a|y|)$,

(c) $\sqrt{\dfrac{\pi}{2}}\,\dfrac{1}{\sqrt{x^2 + a^2}}$ and $K_0(a|y|)$.

(d) Can you suggest why $I_0(ay)$ is not included in this list?

Hint. $J_0$, $Y_0$, and $K_0$ may be transformed most easily by using an exponential
representation, reversing the order of integration, and employing the Dirac delta function
exponential representation, Eq. (20.20). These cases can be treated equally well as Fourier
cosine transforms.

20.2.11 Show that the following are Fourier transforms of each other:

$$i^n J_n(t) \quad\text{and}\quad \begin{cases} \sqrt{\dfrac{2}{\pi}}\,T_n(x)(1 - x^2)^{-1/2}, & |x| < 1, \\ 0, & |x| > 1. \end{cases}$$

$T_n(x)$ is the nth-order Chebyshev polynomial.

Hint. With $T_n(\cos\theta) = \cos n\theta$, the transform of $T_n(x)(1 - x^2)^{-1/2}$ leads to an integral
representation of $J_n(t)$.





20.2.12 Show that the Fourier exponential transform of

$$f(\mu) = \begin{cases} P_n(\mu), & |\mu| \le 1, \\ 0, & |\mu| > 1 \end{cases}$$

is $(2i^n/\sqrt{2\pi})\,j_n(kr)$. Here $P_n(\mu)$ is a Legendre polynomial and $j_n(kr)$ is a spherical
Bessel function.

20.2.13 (a) Show that $f(x) = x^{-1/2}$ is self-reciprocal under both Fourier cosine and sine
transforms; that is,

$$\sqrt{\frac{2}{\pi}}\int_0^\infty x^{-1/2}\cos xt\,dx = t^{-1/2},$$

$$\sqrt{\frac{2}{\pi}}\int_0^\infty x^{-1/2}\sin xt\,dx = t^{-1/2}.$$

(b) Use the preceding results to evaluate the Fresnel integrals

$$\int_0^\infty \cos(y^2)\,dy \quad\text{and}\quad \int_0^\infty \sin(y^2)\,dy.$$

20.2.14 Show that $\left[\dfrac{1}{r^2}\right]^T(\mathbf{k}) = \left(\dfrac{\pi}{2}\right)^{1/2}\dfrac{1}{k}$.

20.2.15 The Fourier transform formulas for a function of two variables are

$$F(u, v) = \frac{1}{2\pi}\iint f(x, y)\,e^{i(ux + vy)}\,dx\,dy,$$

$$f(x, y) = \frac{1}{2\pi}\iint F(u, v)\,e^{-i(ux + vy)}\,du\,dv,$$

where the integrations are over the entire xy or uv plane. For $f(x, y) = f([x^2 + y^2]^{1/2}) = f(r)$,
show that the zero-order Hankel transforms

$$F(\rho) = \int_0^\infty r f(r)\,J_0(\rho r)\,dr,$$

$$f(r) = \int_0^\infty \rho F(\rho)\,J_0(\rho r)\,d\rho,$$

are a special case of the Fourier transforms.

Note. This technique may be generalized to derive the Hankel transforms of order
ν = 0, 1/2, 1, 3/2, .... See the two texts by Sneddon (Additional Readings). It might also be noted
that the Hankel transforms of half-integral orders ν = ±1/2 reduce to Fourier sine and
cosine transforms.

20.2.16 Show that the 3-D Fourier exponential transform of a radially symmetric function may
be rewritten as a Fourier sine transform:

$$\frac{1}{(2\pi)^{3/2}}\int f(r)\,e^{i\mathbf{k}\cdot\mathbf{r}}\,d^3x = \frac{1}{k}\sqrt{\frac{2}{\pi}}\int_0^\infty \left[r f(r)\right]\sin kr\,dr.$$


20.3 PROPERTIES OF FOURIER TRANSFORMS


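Fourier transforms have a number of useful properties, many of which follow directly from
the transform definition. Using the 3-D transform as an illustration, and letting g(k) be the
Fourier transform of f(r):

$$\left[f(\mathbf{r} - \mathbf{R})\right]^T(\mathbf{k}) = e^{i\mathbf{k}\cdot\mathbf{R}}\,g(\mathbf{k}), \quad\text{(translation)}, \tag{20.48}$$

$$\left[f(\alpha\mathbf{r})\right]^T(\mathbf{k}) = \frac{1}{\alpha^3}\,g(\alpha^{-1}\mathbf{k}), \quad\text{(change of scale)}, \tag{20.49}$$

$$\left[f(-\mathbf{r})\right]^T(\mathbf{k}) = g(-\mathbf{k}), \quad\text{(sign change)}, \tag{20.50}$$

$$\left[f^*(-\mathbf{r})\right]^T(\mathbf{k}) = g^*(\mathbf{k}), \quad\text{(complex conjugation)}, \tag{20.51}$$

$$\left[\nabla f(\mathbf{r})\right]^T(\mathbf{k}) = -i\mathbf{k}\,g(\mathbf{k}), \quad\text{(gradient)}, \tag{20.52}$$

$$\left[\nabla^2 f(\mathbf{r})\right]^T(\mathbf{k}) = -k^2 g(\mathbf{k}), \quad\text{(Laplacian)}. \tag{20.53}$$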


The first four of the above formulas can be obtained by carrying out appropriate operations
on the defining equation of the transform; details are left to the exercises. Equations (20.52)
and (20.53) are easily established from the inverse transform formula. For example, from
Eq. (20.37),

$$\nabla f(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int g(\mathbf{k})\left[\nabla e^{-i\mathbf{k}\cdot\mathbf{r}}\right]d^3k
= \frac{1}{(2\pi)^{3/2}}\int g(\mathbf{k})\left[-i\mathbf{k}\,e^{-i\mathbf{k}\cdot\mathbf{r}}\right]d^3k
= \frac{1}{(2\pi)^{3/2}}\int\left[-i\mathbf{k}\,g(\mathbf{k})\right]e^{-i\mathbf{k}\cdot\mathbf{r}}\,d^3k, \tag{20.54}$$

showing that −ik g(k) is indeed the Fourier transform of ∇f(r). It should be noted that
this demonstration requires the existence of the integrals involved.

The translation formula is of considerable practical value, as it enables a function that
is most conveniently described relative to an origin at R to have a transform whose
natural representation is about the origin in the k space, albeit with a complex phase factor,
exp(ik·R). This feature will become important, for example, in problems involving atoms
centered at different spatial points, because the transforms of atomic orbitals on such atoms
can all be written as centered at a single point in the transform space. Thus, the translation
formula can convert a spatially complex problem into a single-center problem (though now
with oscillatory character due to the phase factors).

The formulas for the gradient and Laplacian, as well as their one-dimensional (1-D)
variants,

$$\left[f'(t)\right]^T(\omega) = -i\omega\,g(\omega), \quad\text{(first derivative)}, \tag{20.55}$$

$$\left[\frac{d^n f(t)}{dt^n}\right]^T(\omega) = (-i\omega)^n g(\omega), \tag{20.56}$$

make the application of these differential operators have simple forms in the transform
space. As we see from Eq. (20.55), the operation of differentiation corresponds in the
transform space to multiplication by −iω.
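The derivative rule of Eq. (20.55) can be checked on the Gaussian of Example 20.2.2. In the sketch below (an illustration only, with assumed names and parameter values), the derivative $f'(t) = -2at\,e^{-at^2}$ is transformed numerically and compared with −iω times the closed form of Eq. (20.19).

```python
import cmath
import math


def transform(f, omega, t_max=12.0, n=4_000):
    """Trapezoidal estimate of (2*pi)**-0.5 * integral f(t) exp(i omega t) dt."""
    h = 2.0 * t_max / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        t = -t_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(t) * cmath.exp(1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)


A_PARAM = 0.8  # Gaussian parameter a; illustrative value


def f_prime(t):
    """Derivative of f(t) = exp(-a t^2)."""
    return -2.0 * A_PARAM * t * math.exp(-A_PARAM * t * t)


def g(omega):
    """Transform of f(t), Eq. (20.19)."""
    return math.exp(-omega * omega / (4.0 * A_PARAM)) / math.sqrt(2.0 * A_PARAM)


omega = 1.3
lhs = transform(f_prime, omega)   # transform of the derivative
rhs = -1j * omega * g(omega)      # Eq. (20.55)
print(lhs, rhs)
```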


Example 20.3.1 WAVE EQUATION

Fourier transform techniques may be used to advantage in handling partial differential
equations (PDEs). To illustrate the technique, let us derive a familiar expression from
elementary physics. An infinitely long string is vibrating freely. The amplitude y of the
(small) vibrations satisfies the wave equation

$$\frac{\partial^2 y}{\partial x^2} = \frac{1}{v^2}\,\frac{\partial^2 y}{\partial t^2}, \tag{20.57}$$

where v is the phase velocity of the wave propagation. We take as initial conditions

$$y(x, 0) = f(x), \qquad \left.\frac{\partial y(x, t)}{\partial t}\right|_{t=0} = 0, \tag{20.58}$$

where f is assumed localized, meaning that $\lim_{x\to\pm\infty} f(x) = 0$.
Our method for solving the PDE of Eq. (20.57) will be to take the Fourier transforms (in
x) of its two members, using α as the transform variable. This is equivalent to multiplying
Eq. (20.57) by $e^{i\alpha x}$ and integrating over x. Before simplifying, we have





$$\int_{-\infty}^{\infty}\frac{\partial^2 y(x, t)}{\partial x^2}\,e^{i\alpha x}\,dx = \frac{1}{v^2}\int_{-\infty}^{\infty}\frac{\partial^2 y(x, t)}{\partial t^2}\,e^{i\alpha x}\,dx. \tag{20.59}$$

If we recognize

$$Y(\alpha, t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y(x, t)\,e^{i\alpha x}\,dx \tag{20.60}$$

as the transform (from our initial variable x to our transform variable α) of the solution
y(x, t) of our PDE, we can rewrite Eq. (20.59) as

$$(-i\alpha)^2\,Y(\alpha, t) = \frac{1}{v^2}\,\frac{\partial^2 Y(\alpha, t)}{\partial t^2}. \tag{20.61}$$

Here we have used Eq. (20.56) for the transform of $\partial^2 y/\partial x^2$ and moved the operator
$\partial^2/\partial t^2$, which is irrelevant to the transform operator, outside the integral, leaving behind
just Y(α, t).

Our original problem has now been converted into Eq. (20.61), but this new equation
has the important simplifying feature that the only derivative appearing in it is that with
respect to t; we have therefore succeeded in replacing our original PDE (in x and t) with
an ordinary differential equation (ODE) (in t only). The dependence of our problem on α
(the variable to which x was converted) is only algebraic.

This transformation, from a PDE to an ODE, is a significant achievement. We are now
ready to solve Eq. (20.61), subject to the initial conditions, which we need to express in
terms of Y. Taking transforms of the quantities in Eq. (20.58), we have

$$Y(\alpha, 0) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{i\alpha x}\,dx = F(\alpha), \qquad \left.\frac{\partial Y(\alpha, t)}{\partial t}\right|_{t=0} = 0. \tag{20.62}$$

It is important to recognize that F(α) is (in principle) known; it is the Fourier transform of
the known initial amplitude f(x).
Solving Eq. (20.61) subject to the initial conditions on Y given in Eq. (20.62), we obtain

$$Y(\alpha, t) = F(\alpha)\,\frac{e^{i\alpha vt} + e^{-i\alpha vt}}{2}. \tag{20.63}$$

We could have written the t dependence as cos(αvt), but the exponential form is better
suited to what we will do next.

Since we really want our solution in terms of x rather than α, our final step will be to
apply inverse Fourier transforms to both sides of Eq. (20.63):

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} Y(\alpha, t)\,e^{-i\alpha x}\,d\alpha = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} F(\alpha)\,\frac{e^{i\alpha vt - i\alpha x} + e^{-i\alpha vt - i\alpha x}}{2}\,d\alpha. \tag{20.64}$$

The left-hand side of Eq. (20.64) is clearly y(x, t); each term on the right-hand side is an
inverse transform of F (and is therefore f), but the first exponential, if written $e^{-i\alpha(x - vt)}$,
can be seen to lead to an inverse transform of argument x − vt, while the second
exponential leads to an inverse transform of argument x + vt. Thus, our final simplification of
Eq. (20.64) takes the form

$$y(x, t) = \frac{1}{2}\left[f(x - vt) + f(x + vt)\right]. \tag{20.65}$$

Our solution thus consists of a superposition in which half the amplitude of the original
wave form is moving toward +x (at velocity v) while the other half of the original wave
form is moving (also at velocity v) in the −x direction.
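The result of Eq. (20.65) can be verified directly. The sketch below, an illustration with assumed choices (a Gaussian initial profile, v = 2, and central finite differences with step `h`), checks that the superposition satisfies the wave equation (20.57) and the initial conditions (20.58).

```python
import math

V = 2.0  # phase velocity; illustrative value


def f(x):
    """Localized initial profile (a Gaussian, chosen only for illustration)."""
    return math.exp(-x * x)


def y(x, t):
    """Solution of Eq. (20.65): two half-amplitude copies moving in opposite directions."""
    return 0.5 * (f(x - V * t) + f(x + V * t))


def wave_residual(x, t, h=1e-3):
    """Finite-difference estimate of y_xx - y_tt / v^2, which should be near zero."""
    y_xx = (y(x + h, t) - 2.0 * y(x, t) + y(x - h, t)) / h**2
    y_tt = (y(x, t + h) - 2.0 * y(x, t) + y(x, t - h)) / h**2
    return y_xx - y_tt / V**2


def initial_velocity(x, h=1e-3):
    """Central-difference estimate of dy/dt at t = 0."""
    return (y(x, h) - y(x, -h)) / (2.0 * h)


print(wave_residual(0.3, 0.4), y(0.3, 0.0) - f(0.3), initial_velocity(0.3))
```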
Example 20.3.2 HEAT FLOW PDE

To illustrate another transformation of a PDE into an ODE, let us Fourier transform the
1-D heat-flow PDE,

$$\frac{\partial\psi}{\partial t} = a^2\,\frac{\partial^2\psi}{\partial x^2},$$

where the solution ψ(x, t) is the temperature at position x and time t.

We transform the x dependence, with the transform variable denoted y, writing the
transform of ψ(x, t) as Ψ(y, t), and identifying the transform of $\partial^2\psi(x, t)/\partial x^2$ as
$-y^2\Psi(y, t)$. Our heat flow equation then takes the form

$$\frac{\partial\Psi(y, t)}{\partial t} = -a^2y^2\,\Psi(y, t),$$

with general solution

$$\ln\Psi(y, t) = -a^2y^2t + \ln C(y), \quad\text{or}\quad \Psi = C(y)\,e^{-a^2y^2t}.$$

The physical significance of C(y) is that it is the initial spatial distribution of Ψ or, in other
words, the Fourier transform of the initial temperature profile ψ(x, 0). Thus, if we assume
the initial temperature distribution is known, then so also is C(y), and our PDE solution,
the inverse transform of Ψ, assumes the form

$$\psi(x, t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} C(y)\,e^{-a^2y^2t}\,e^{-iyx}\,dy. \tag{20.66}$$

—c 


Further progress depends on the specific form of C(y). If we assume the initial
temperature to be a delta-function spike at x = 0, corresponding to an instantaneous pulse of
thermal energy at x = t = 0, we then have as its Fourier transform C(y) = constant, see
Eq. (20.14). We can now evaluate the integral in Eq. (20.66) to obtain an explicit form for
ψ(x, t). With C constant, the functional form of Eq. (20.66) is (apart from the sign of i)
just that encountered in Example 20.2.2 for the Fourier transform of a Gaussian, and we
can evaluate the integral to obtain

$$\psi(x, t) = \frac{C}{a\sqrt{2t}}\exp\left(-\frac{x^2}{4a^2t}\right).$$

This form for ψ was obtained in Section 9.7, but it arose there as a clever guess that was
ultimately justified because it led to a solution of the diffusion PDE.
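The spreading Gaussian obtained above can be checked against the heat-flow PDE. The following sketch is an illustration under assumed values (a = 0.8, C = 1, finite-difference steps `hx` and `ht`): it estimates $\partial\psi/\partial t - a^2\,\partial^2\psi/\partial x^2$ by central differences and also integrates ψ over x, which should stay fixed at $C\sqrt{2\pi}$ for every t > 0.

```python
import math

A = 0.8  # diffusion parameter a; illustrative value
C = 1.0  # constant transform C(y) of the delta-function initial condition


def psi(x, t):
    """psi(x, t) = C / (a sqrt(2t)) * exp(-x^2 / (4 a^2 t)), valid for t > 0."""
    return C / (A * math.sqrt(2.0 * t)) * math.exp(-x * x / (4.0 * A * A * t))


def heat_residual(x, t, hx=1e-3, ht=1e-4):
    """Finite-difference estimate of psi_t - a^2 psi_xx, which should be near zero."""
    psi_t = (psi(x, t + ht) - psi(x, t - ht)) / (2.0 * ht)
    psi_xx = (psi(x + hx, t) - 2.0 * psi(x, t) + psi(x - hx, t)) / hx**2
    return psi_t - A * A * psi_xx


def total_heat(t, x_max=30.0, n=6_000):
    """Trapezoidal estimate of the integral of psi over x at fixed t."""
    h = 2.0 * x_max / n
    total = 0.5 * (psi(-x_max, t) + psi(x_max, t))
    for k in range(1, n):
        total += psi(-x_max + k * h, t)
    return total * h


print(heat_residual(0.3, 0.5))
print(total_heat(0.5), total_heat(2.0), C * math.sqrt(2.0 * math.pi))
```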





Example 20.3.3 COULOMB GREEN’S FUNCTION

The Green’s function associated with the Poisson equation satisfies the PDE

$$\nabla^2 G(\mathbf{r}, \mathbf{r}') = \delta(\mathbf{r} - \mathbf{r}'). \tag{20.67}$$

We take the Fourier transform of both sides of this equation with respect to r,
designating g(k, r′) as the transform of G. Note that r′ is unaffected by the transformation.
Using Eq. (20.53), the left-hand side of Eq. (20.67) becomes $-k^2 g(\mathbf{k}, \mathbf{r}')$, while the
right-hand side, in which the delta function has been translated an amount r′, has according to
Eq. (20.48) the transform $e^{i\mathbf{k}\cdot\mathbf{r}'}\,\delta^T(\mathbf{k})$. Thus, Eq. (20.67) transforms into

$$-k^2 g(\mathbf{k}, \mathbf{r}') = \frac{1}{(2\pi)^{3/2}}\,e^{i\mathbf{k}\cdot\mathbf{r}'},$$


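where the transform of the delta function has been evaluated as the 3-D equivalent of
Eq. (20.14). We may now solve for g:

$$g(\mathbf{k}, \mathbf{r}') = -\frac{1}{(2\pi)^{3/2}}\,\frac{e^{i\mathbf{k}\cdot\mathbf{r}'}}{k^2},$$

and recover G by taking the inverse transform,

$$G(\mathbf{r}, \mathbf{r}') = -\frac{1}{(2\pi)^3}\int\frac{e^{i\mathbf{k}\cdot\mathbf{r}'}}{k^2}\,e^{-i\mathbf{k}\cdot\mathbf{r}}\,d^3k = -\frac{1}{(2\pi)^3}\int\frac{1}{k^2}\,e^{-i\mathbf{k}\cdot(\mathbf{r} - \mathbf{r}')}\,d^3k.$$

We see that the evaluation is proportional to that of the inverse transform of $1/k^2$, but for
argument r − r′. Using Eq. (20.43) (which applies also for the inverse transform because
it is real), we reach

$$G(\mathbf{r}, \mathbf{r}') = -\frac{1}{(2\pi)^{3/2}}\left(\frac{\pi}{2}\right)^{1/2}\frac{1}{|\mathbf{r} - \mathbf{r}'|} = -\frac{1}{4\pi}\,\frac{1}{|\mathbf{r} - \mathbf{r}'|},$$

a result we have previously obtained by other methods (cf. Section 10.2). Note that we did
not assume G to be a function of r − r′; we found it to have that form.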


Successes and Limitations 
Some of the above examples illustrate an important role played by the Fourier transform: 


• Use of the Fourier transform can convert a PDE into an ODE, thereby reducing the
“degree of transcendence” of the problem.


All the examples also illustrate the procedure sketched schematically in Fig. 20.1: 


• Fourier transformation can often convert a difficult problem into one which we are
able to solve. A useful form for our solution can then be obtained by transforming it
back to physical space.


Despite these successes, it is worth noting that not all problems posed as differential 
equations are amenable to Fourier-transform solution methods. Some of the limitations 
arise from the implicit requirement that the necessary transforms and their inverses exist. 
We can also expect Fourier methods to work only when the solution is unique, as the 
process of taking a transform and then solving an algebraic equation produces a single 
result, and not a set of two or more linearly independent solutions. 

Usually the boundary conditions are the proximate reason that a differential
equation solution is unique, and the requirement that an (exponential) Fourier transform exist
imposes Dirichlet boundary conditions at infinity. For 1-D systems on the semi-infinite
range 0 ≤ x < ∞, use of the Fourier sine transform imposes a Dirichlet condition at the
finite boundary x = 0, while use of the cosine transform corresponds to a Neumann
boundary condition there.

Additional opportunities for solving differential equations by transform methods are 
provided by use of the Laplace transform, for which it is more natural to introduce bound- 
ary data. See the later sections of this chapter. 


Exercises 
20.3.1 Write the 1-D equivalents of the equations for translation, scale change, sign change, 
and complex conjugation that were given for 3-D transforms in Eqs. (20.48) to (20.51). 
20.3.2 (a) Show that by replacement of r by r — R in the formula for the Fourier transform 
of f(r), one can derive the translation formula, Eq. (20.48). 
(b) Using methods similar to those for part (a), establish the formulas for scale change, 
sign change, and complex conjugation, Eqs. (20.49) to (20.51). 
20.3.3 Derive Eq. (20.53), the formula for the Fourier transform of $\nabla^2 f(\mathbf{r})$.
20.3.4 Verify Eqs. (20.55) and (20.56), the formulas for the derivatives of 1-D Fourier
transforms.
20.3.5 Derive the inverse of Eq. (20.56), namely that

$$\left[t^n f(t)\right]^T(\omega) = i^{-n}\,\frac{d^n}{d\omega^n}\,g(\omega).$$

20.3.6 The 1-D neutron diffusion equation with a (plane) source is

$$-D\,\frac{d^2\varphi(x)}{dx^2} + K^2 D\,\varphi(x) = Q\,\delta(x),$$

where φ(x) is the neutron flux, Q δ(x) is the (plane) source at x = 0, and D and K² are
constants. Apply a Fourier transform. Solve the equation in transform space. Transform
your solution back into x-space.

ANS. $\varphi(x) = \dfrac{Q}{2KD}\,e^{-|Kx|}.$


20.4 FOURIER CONVOLUTION THEOREM


An important relationship satisfied by Fourier transforms is that known as the
convolution theorem. As we shall soon see, this theorem is useful in the solution of differential
equations, in establishing the normalization of momentum wave functions, in the
evaluation of integrals arising in many branches of physics, and in a variety of signal-processing
applications.

We define the convolution of two functions f(x) and g(x), understood here to be over
the interval (−∞, ∞), as the following operation designated f ∗ g:

$$(f * g)(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(y)\,f(x - y)\,dy. \tag{20.68}$$

The corresponding definition in three dimensions is

$$(f * g)(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int g(\mathbf{r}')\,f(\mathbf{r} - \mathbf{r}')\,d^3r', \tag{20.69}$$

where the integral is over the full 3-D space.

This operation is sometimes referred to as Faltung, the German term for “folding.”
To better understand the origin of this name, look at Fig. 20.7, where we have plotted
$f(y) = e^{-y}$ and $f(x - y) = e^{-(x - y)}$. Clearly, f(y) and f(x − y) are related by reflection
relative to the vertical line y = x/2; that is, we could generate f(x − y) by folding over
f(y) on the line y = x/2.

Our interest here is not primarily in the nomenclature, but rather to understand what hap- 
pens if we take the Fourier transform of a convolution. Letting F(t) and G(t), respectively, 
be the Fourier transforms of f and g, we find 


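$$\begin{aligned}
(f * g)^T(t) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, e^{itx} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dy\, g(y)\, f(x - y) \right] \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dy\, g(y)\, e^{ity} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f(x - y)\, e^{it(x-y)} \right] \\
&= \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dy\, g(y)\, e^{ity} \right] \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dz\, f(z)\, e^{itz} \right] \\
&= G(t)\,F(t).
\end{aligned} \tag{20.70}$$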











FIGURE 20.7. Factors in a Faltung.

In the second line of the above equation set we simply divided $e^{itx}$ into the two factors $e^{ity}$
and $e^{it(x-y)}$; the third line was reached by changing the integration variable of the second
integral from $x$ to $z = x - y$. After this change, $y$ only appears in the first set of square
brackets and $z$ only appears in the second bracket set. We are then able to continue to the
fourth line where we identify the integrals as Fourier transforms.
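The theorem can be checked numerically. The following is a minimal sketch, not part of the original text: the test functions `f` and `g` are arbitrarily chosen Gaussians, the integrals are truncated to the interval (-10, 10), and the helper names `midpoints`, `transform`, and `convolution` are hypothetical. It uses the sign and normalization conventions of Eqs. (20.68) and (20.70).

```python
import cmath
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def midpoints(lo, hi, n):
    """Nodes and step of an n-panel midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)], h

def transform(func, t, lo=-10.0, hi=10.0, n=500):
    """(2 pi)^(-1/2) * integral of func(x) exp(i t x) dx, truncated to [lo, hi]."""
    xs, h = midpoints(lo, hi, n)
    return sum(func(x) * cmath.exp(1j * t * x) for x in xs) * h / SQRT_2PI

def convolution(f, g, x, lo=-10.0, hi=10.0, n=500):
    """(f * g)(x) as in Eq. (20.68), including the 1/sqrt(2 pi) prefactor."""
    ys, h = midpoints(lo, hi, n)
    return sum(g(y) * f(x - y) for y in ys) * h / SQRT_2PI

def f(x):
    # assumed test function: Gaussian centered at the origin
    return math.exp(-x * x)

def g(x):
    # assumed test function: shifted Gaussian, so that G(t) is complex
    return math.exp(-0.5 * (x - 1.0) ** 2)

t = 1.3                                                # arbitrary transform variable
lhs = transform(lambda x: convolution(f, g, x), t)     # (f * g)^T(t)
rhs = transform(f, t) * transform(g, t)                # F(t) G(t)
print(abs(lhs - rhs))
```

The printed difference should be far below the size of either side, because the midpoint rule converges very rapidly for smooth, rapidly decaying integrands such as these.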

We often encounter integrals that have the form of a convolution f * g. The convolution 
theorem then enables the construction of the Fourier transform of the integral, and the 
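integral itself will then be given by taking the inverse transform of $(f * g)^T$. This process
corresponds to

$$\begin{aligned}
\int_{-\infty}^{\infty} g(y)\, f(x - y)\, dy &= \sqrt{2\pi}\,(f * g)(x) = \sqrt{2\pi} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (f * g)^T(t)\, e^{-itx}\, dt \right] \\
&= \int_{-\infty}^{\infty} G(t)\,F(t)\, e^{-itx}\, dt.
\end{aligned} \tag{20.71}$$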


Once again we see an appealing feature inherent to Fourier analysis. While the two func- 
tions in our original integral, g(y) and f(x — y), had different arguments, their transforms, 
G(t) and F(t), have the same argument. We still have an integral to evaluate after using 
the convolution theorem, but (as just observed) the integrand consists of a product of quan- 
tities both of which are evaluated at the same point. The cost of the transformation is the 
presence of a complex exponential, which imparts oscillatory character to the integral. We 
have thus traded geometric complexity for oscillational complexity. Often this will be an 
advantageous trade-off. 
For the record, here is the 3-D equivalent of Eq. (20.71): 


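$$\int g(\mathbf{r}')\, f(\mathbf{r} - \mathbf{r}')\, d^3r' = \int F(\mathbf{k})\,G(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3k. \tag{20.72}$$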


Parseval Relation 


If we specialize Eq. (20.71) to $x = 0$, we get the relatively simple result

$$\int_{-\infty}^{\infty} f(-y)\,g(y)\,dy = \int_{-\infty}^{\infty} F(t)\,G(t)\,dt. \tag{20.73}$$

This equation becomes more easily interpreted if we change $f(y)$ to $f^*(-y)$. Then we
must replace $f(-y)$ in Eq. (20.73) by $f^*(y)$, while $F(t)$ becomes $[f^*(-y)]^T$, which,
invoking Eq. (20.51), can be written $F^*(t)$. With these changes, we have

$$\int_{-\infty}^{\infty} f^*(y)\,g(y)\,dy = \int_{-\infty}^{\infty} F^*(t)\,G(t)\,dt. \tag{20.74}$$


This equation is known as the Parseval relation; some authors prefer to call it Rayleigh’s 
theorem. 
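As a numerical illustration of Eq. (20.74), the sketch below compares the two integrals for a pair of assumed test functions (a Gaussian with a complex phase and a shifted Gaussian). The functions, the truncation intervals, and the helper names are choices made here for illustration only; the transform convention is that of the text.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def midpoints(lo, hi, n):
    """Nodes and step of an n-panel midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)], h

def transform(func, t, lo=-10.0, hi=10.0, n=500):
    """(2 pi)^(-1/2) * integral of func(x) exp(i t x) dx, truncated to [lo, hi]."""
    xs, h = midpoints(lo, hi, n)
    return sum(func(x) * cmath.exp(1j * t * x) for x in xs) * h / SQRT_2PI

def f(x):
    # complex-valued, so that the complex conjugation in Eq. (20.74) matters
    return math.exp(-x * x) * cmath.exp(2j * x)

def g(x):
    return math.exp(-0.5 * (x - 1.0) ** 2)

ys, hy = midpoints(-10.0, 10.0, 500)     # grid for the y integral
ts, ht = midpoints(-12.0, 12.0, 300)     # grid for the t integral

lhs = sum(f(y).conjugate() * g(y) for y in ys) * hy
rhs = sum(transform(f, t).conjugate() * transform(g, t) for t in ts) * ht
print(lhs, rhs)
```

Both sides are complex here, and they should agree in real and imaginary parts alike.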
The integrals in Eq. (20.74) are of the form of scalar products, and will exist if $f$ and $g$
(and therefore also $F$ and $G$) are quadratically integrable (i.e., members of an $L^2$ space).
Letting $\mathcal{F}$ denote the Fourier transform operator, we can rewrite Eq. (20.74) in the compact
form

$$\langle f | g \rangle = \langle \mathcal{F} f | \mathcal{F} g \rangle. \tag{20.75}$$

If we now move the $\mathcal{F}$ out of the left half-bracket, writing instead its adjoint in the right
half-bracket, we reach

$$\langle f | g \rangle = \langle f | \mathcal{F}^\dagger \mathcal{F} g \rangle. \tag{20.76}$$

Since this equation must hold for all $f$ and $g$ in our Hilbert space, it is necessary that $\mathcal{F}^\dagger \mathcal{F}$
reduce to the identity operator, meaning that

$$\mathcal{F}^{-1} = \mathcal{F}^\dagger. \tag{20.77}$$

Our conclusion is that the Fourier transform operator is unitary.
If, next, we consider the special case $g = f$, Eq. (20.75) takes the form

$$\langle f | f \rangle = \langle F | F \rangle, \tag{20.78}$$

showing that $f$ and its transform, $F$, have the same norm, a result that is hardly surprising
since we already know that transforming $f$ twice brings us back to at worst $f$ multiplied
by a complex phase factor.

An interesting consequence of the unitarity property is illustrated by the formulas gov- 
erning Fraunhofer diffraction optics. The amplitude of the diffraction pattern appears as 
the Fourier transform of the function describing the aperture (compare Exercise 20.4.3). 
With intensity proportional to the square of the amplitude, the Parseval relation implies
that the energy passing through the aperture (the integral of $|f|^2$) is equal to that in the
diffraction pattern, whose total energy is the integral of $|F|^2$. In this problem the Parseval
relation corresponds to energy conservation. 

We close this topic with two observations. First, note how the clarity and simplicity of 
our discussion of the Parseval relation was greatly enhanced by introducing appropriate 
notation. Much of our insight and intuition regarding mathematical concepts flows directly 
from the use of good notations for their description. Secondly, we call attention to the fact 
that Parseval’s relation can be developed independently of the inverse Fourier transform 
and then used rigorously to derive the inverse transform. Details can be found in the text 
by Morse and Feshbach (Additional Readings). 

Here are some examples illustrating use of the convolution theorem. 


Example 20.4.1 POTENTIAL OF CHARGE DISTRIBUTION 


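We require the potential at all points $\mathbf{r}$ produced by a charge distribution $\rho(\mathbf{r}')$. From
Coulomb’s law, or equivalently from the Green’s function for Poisson’s equation, we have

$$\psi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0} \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d^3r'. \tag{20.79}$$

The integral for $\psi$ is of the convolution form, and its presence in this problem suggests
that convolutions will arise in a wide variety of problems in which there is a distributed
source of almost any kind and an effect therefrom that depends on relative position.

Taking $f(\mathbf{r}) = 1/r$, so that $f(\mathbf{r} - \mathbf{r}') = 1/|\mathbf{r} - \mathbf{r}'|$, and $g(\mathbf{r}) = \rho(\mathbf{r})$, application of the
convolution formula Eq. (20.72) yields

$$\psi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0} \int f^T(\mathbf{k})\, g^T(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3k.$$

Since

$$f^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\,\frac{4\pi}{k^2} \quad \text{and} \quad g^T(\mathbf{k}) = \rho^T(\mathbf{k}),$$

we have

$$\psi(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}\,\varepsilon_0} \int \frac{\rho^T(\mathbf{k})}{k^2}\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3k. \tag{20.80}$$

Depending on the functional form of $\rho$, Eq. (20.80) may or may not be easier to evaluate
than the original equation for $\psi$, Eq. (20.79).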


Example 20.4.2 TWO-CENTER OVERLAP INTEGRAL


In quantum mechanics problems involving molecules, one often encounters the so-called
overlap integral, which is the scalar product of two atomic orbitals, one, $\varphi_a$, centered at
a point $\mathbf{A}$, and another, $\varphi_b$, centered at a different point $\mathbf{B}$. This overlap integral, denoted
$S_{ab}$, can be written

$$S_{ab} = \int \varphi_a^*(\mathbf{r} - \mathbf{A})\, \varphi_b(\mathbf{r} - \mathbf{B})\, d^3r. \tag{20.81}$$

The integral is over the full 3-D space. One way to evaluate $S_{ab}$ starts by changing to
coordinates in which the origin is at $\mathbf{A}$; this amounts to the substitution $\mathbf{r}' = \mathbf{r} - \mathbf{A}$, in
terms of which $\mathbf{r} - \mathbf{B} = \mathbf{r}' - (\mathbf{B} - \mathbf{A})$, so

$$S_{ab} = \int \varphi_a^*(\mathbf{r}')\, \varphi_b(\mathbf{r}' - \mathbf{R})\, d^3r',$$

where $\mathbf{R} = \mathbf{B} - \mathbf{A}$. We note the physically expected feature that the value of $S_{ab}$ does not
depend on $\mathbf{A}$ and $\mathbf{B}$ separately but only on the vector $\mathbf{R}$ describing their relative position.
This integral for $S_{ab}$ is almost in the standard form for a convolution (it differs therefrom
by having $\mathbf{r}' - \mathbf{R}$ instead of $\mathbf{R} - \mathbf{r}'$). This discrepancy can be handled by invoking
Eq. (20.50); the net effect is to change the sign of the transform variable $\mathbf{k}$ when we evaluate
$\varphi_b^T$.
Again using Eq. (20.72), we write

$$S_{ab} = \int \left[\varphi_a^*\right]^T(\mathbf{k})\, \varphi_b^T(-\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{R}}\, d^3k.$$
We continue with the specific case that $\varphi_a$ and $\varphi_b$ are Slater-type orbitals (STOs), both
with the same screening parameter $\zeta$. These STOs, and their Fourier transforms (which
can be obtained by differentiating Eq. (20.41) with respect to its parameter $\alpha$), are

$$\varphi = e^{-\zeta r}, \qquad \varphi^T(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\,\frac{8\pi\zeta}{(k^2 + \zeta^2)^2}.$$

Inserting the formula for $\varphi^T$ into the integral for $S_{ab}$, we get

$$S_{ab} = \frac{(8\pi\zeta)^2}{(2\pi)^3} \int \frac{e^{-i\mathbf{k}\cdot\mathbf{R}}}{(k^2 + \zeta^2)^4}\, d^3k.$$

At this point we already see an advantage of the convolution-based procedure. This integral
(whether or not we can easily evaluate it) has assumed a single-center character, with the
interorbital spacing relegated to the complex exponential factor.

To complete the evaluation, we now insert the spherical wave expansion for
$\exp(-i\mathbf{k}\cdot\mathbf{R})$, Eq. (16.61), and we note the further simplification that the only term
surviving the integration over the angular coordinates of $\mathbf{k}$ is the $l = 0$ term of the expansion.
Keeping in mind that $Y_0^0 = 1/\sqrt{4\pi}$, that term is seen to be just $j_0(kR)$, so our formula for
$S_{ab}$ becomes

$$S_{ab} = \frac{(8\pi\zeta)^2}{(2\pi)^3} \int_0^\infty \frac{j_0(kR)}{(k^2 + \zeta^2)^4}\, 4\pi k^2\, dk.$$
We now have a known 1-D integral, which in fact we encountered in Exercise 14.7.10:

$$k_n(x) = \frac{2^{n+2}\,(n+1)!}{\pi\, x^{n+1}} \int_0^\infty \frac{k^2\, j_0(kx)}{(k^2 + 1)^{n+2}}\, dk.$$

Changing $x$ in this formula to $\zeta R$ and replacing $k$ by $k/\zeta$, we reach

$$S_{ab} = \frac{\pi R^3}{3}\, k_2(\zeta R) = \frac{\pi e^{-\zeta R}}{3\zeta^3} \left( \zeta^2 R^2 + 3\zeta R + 3 \right). \tag{20.82}$$

Note that when we insert the explicit form for $k_2$, we obtain a relatively simple final result.
There are other ways to obtain this formula (one of which is to use prolate ellipsoidal
coordinates with $\mathbf{A}$ and $\mathbf{B}$ as foci), but the method we have chosen here provides
a good illustration of the issues and formulas that arise when the convolution method is
applicable.
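A quick numerical cross-check of this example is possible. The sketch below assumes unnormalized STOs $e^{-\zeta r}$ on both centers and compares a direct midpoint-rule evaluation of the single-center radial integral with the closed form of Eq. (20.82). The function names, the cutoff `k_max`, and the sample values of $\zeta$ and $R$ are choices made here for illustration only.

```python
import math

def j0(x):
    """Spherical Bessel function j_0(x) = sin(x)/x, with j_0(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def overlap_k_space(zeta, R, k_max=60.0, n=60000):
    """S_ab from (8 pi zeta)^2/(2 pi)^3 * 4 pi * int_0^inf j0(kR) k^2/(k^2+zeta^2)^4 dk.

    The upper limit is truncated at k_max; the integrand falls off like k^-6 or faster.
    """
    h = k_max / n
    total = 0.0
    for i in range(n):
        k = (i + 0.5) * h
        total += j0(k * R) * k * k / (k * k + zeta * zeta) ** 4
    prefactor = (8.0 * math.pi * zeta) ** 2 / (2.0 * math.pi) ** 3 * 4.0 * math.pi
    return prefactor * total * h

def overlap_closed_form(zeta, R):
    """Right-hand side of Eq. (20.82)."""
    x = zeta * R
    return math.pi * math.exp(-x) * (x * x + 3.0 * x + 3.0) / (3.0 * zeta ** 3)

for zeta, R in ((1.0, 2.0), (1.5, 0.8), (1.0, 0.0)):
    print(zeta, R, overlap_k_space(zeta, R), overlap_closed_form(zeta, R))
```

At $R = 0$ both expressions reduce to $\pi/\zeta^3$, the integral of $e^{-2\zeta r}$ over all space, which provides an independent check on the prefactors.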


Multiple Convolutions 


Some important problems take the form of multiple convolutions, which we illustrate in 
one dimension by the convolution of a function h first with a function g followed by the 
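convolution of that result with $f$, i.e., $f * (g * h)$. Thus,

$$\begin{aligned}
\left[f * (g * h)\right](x) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dy\, f(y)\,(g * h)(x - y) \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dt\, f(y)\, g(t)\, h(x - y - t),
\end{aligned}$$

which after making the substitution $t = z - y$ (and therefore $x - y - t = x - z$) becomes

$$\left[f * (g * h)\right](x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\, f(y)\, g(z - y)\, h(x - z). \tag{20.83}$$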


Letting F, G, and H be the Fourier transforms of f, g, and h, this case of the convolution 
theorem is 


$$\left[f * g * h\right]^T(t) = F(t)\,G(t)\,H(t). \tag{20.84}$$


We have now omitted the parentheses surrounding g * h since we would have gotten the 
same result if we convoluted f, g, and h in any order. Then, taking the inverse transform, 


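we have

$$\int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\, f(y)\, g(z - y)\, h(x - z) = (2\pi)^{1/2} \int_{-\infty}^{\infty} F(t)\,G(t)\,H(t)\, e^{-itx}\, dt. \tag{20.85}$$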


In three dimensions, the corresponding formulas are 
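$$\left[f * (g * h)\right](\mathbf{r}) = \frac{1}{(2\pi)^3} \int d^3r' \int d^3r''\, f(\mathbf{r}')\, g(\mathbf{r}'' - \mathbf{r}')\, h(\mathbf{r} - \mathbf{r}''), \tag{20.86}$$

$$\left[f * (g * h)\right]^T(\mathbf{k}) = F(\mathbf{k})\,G(\mathbf{k})\,H(\mathbf{k}), \tag{20.87}$$

$$\int d^3r' \int d^3r''\, f(\mathbf{r}')\, g(\mathbf{r}'' - \mathbf{r}')\, h(\mathbf{r} - \mathbf{r}'') = (2\pi)^{3/2} \int F(\mathbf{k})\,G(\mathbf{k})\,H(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3k. \tag{20.88}$$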


Example 20.4.3 INTERACTION OF TWO CHARGE DISTRIBUTIONS 


The electrostatic interaction of two charge distributions $\rho_1(\mathbf{r})$ and $\rho_2(\mathbf{r})$ is given by the
integral

$$V = \int d^3r' \int d^3r''\, \frac{\rho_1(\mathbf{r}')\, \rho_2(\mathbf{r}'')}{|\mathbf{r}' - \mathbf{r}''|}, \tag{20.89}$$

which is a double convolution, as in Eq. (20.88), but with the free argument $\mathbf{r}$ set to zero
and with a sign discrepancy in the argument of $h$ (which is $\rho_2$ of the present example).
Taking the above into account and applying Eq. (20.88), we have

$$\begin{aligned}
V &= (2\pi)^{3/2} \int d^3k\, \rho_1^T(\mathbf{k}) \left(\frac{1}{r}\right)^{T}\!(\mathbf{k})\, \rho_2^T(-\mathbf{k}) \\
&= 4\pi \int \frac{d^3k}{k^2}\, \rho_1^T(\mathbf{k})\, \rho_2^T(-\mathbf{k}),
\end{aligned} \tag{20.90}$$

where we have inserted the value of $(1/r)^T$ from Eq. (20.42). This expression has the
obvious advantage that it is a 3-D integral in place of the original six-fold integration in
Eq. (20.89). The price we have to pay for this simplification is the cost of taking the Fourier
transforms of $\rho_1$ and $\rho_2$.


Transform of a Product 


The similarity between the formulas for the direct and inverse Fourier transforms suggests
that we may be able to identify the Fourier transform of a product as a convolution.
Accordingly, we rewrite Eq. (20.71) with $x$ replaced by $-x$ and also change the variable
of integration in that equation from $y$ to $-y$. We then have (multiplying the equation by
$1/\sqrt{2\pi}$)

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(-y)\, f(y - x)\, dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} G(t)\,F(t)\, e^{itx}\, dt = \left[G(t)\,F(t)\right]^T(x). \tag{20.91}$$

If we now make the further identifications

$$\left[G(t)\right]^T(y) = g(-y) \quad \text{and} \quad \left[F(t)\right]^T(x - y) = f(y - x),$$

we have

$$(F^T * G^T)(x) = \left[G(t)\,F(t)\right]^T(x). \tag{20.92}$$

Rewriting Eq. (20.92) with the functions renamed $f$ and $g$ and their respective transforms
denoted $F$ and $G$, we have our desired final result:

$$\left[f(x)\,g(x)\right]^T(t) = (F * G)(t). \tag{20.93}$$


Equation (20.93) will be useful only if f and g individually have Fourier transforms. 
It is possible that this condition is not satisfied despite the fact that fg possesses a trans- 
form. We therefore proceed to consider the case that f not have a transform, but instead 
possesses a Maclaurin expansion, and therefore can be represented by a series in positive 
integer powers of x. Then, starting from the relation 


$$\left[x^n g(x)\right]^T(t) = i^{-n}\,\frac{d^n}{dt^n}\,G(t),$$

the topic of Exercise 20.3.5, we can write

$$\left[f(x)\,g(x)\right]^T(t) = f\!\left(-i\,\frac{d}{dt}\right) G(t), \tag{20.94}$$







where the expression —i(d/dt) is the argument of f (and not a multiplicative factor). 
Unless f is quite simple, this expression may be of limited practical value. 


Momentum Space 


Hamilton’s equations of classical mechanics formalize a symmetry between position variables
$q$ and the corresponding (conjugate) momentum variables $p$. This same correspondence
carries over into quantum mechanics, where (in one dimension, in units with
$\hbar = 1$), the fundamental relationship is the commutator $[x, p] = i$. The time-independent
Schrödinger equation (for a particle of mass $m$) is

$$H\psi = \left[\frac{p^2}{2m} + V(x)\right]\psi = E\psi,$$

and it is usually made more explicit by taking $p = -i(d/dx)$, in which case the wave
function $\psi$ is a function of $x$: $\psi = \psi(x)$. In principle we could have chosen $p$ as the
fundamental variable, in which case the proper value of the commutator is recovered if
we take $x = +i(d/dp)$, and $\psi$ (which we will now give the name $\varphi$) will be a function
of $p$: $\varphi = \varphi(p)$. These two representations of the Schrödinger equation in one dimension
correspond, respectively, to the two ODEs:

$$-\frac{1}{2m}\,\frac{d^2\psi(x)}{dx^2} + V(x)\,\psi(x) = E\,\psi(x), \tag{20.95}$$

$$\frac{p^2}{2m}\,\varphi(p) + V\!\left(i\,\frac{d}{dp}\right)\varphi(p) = E\,\varphi(p). \tag{20.96}$$


Note that in the second of these two equations, the argument of V is a differential operator, 
and unless the form of V is relatively simple, the momentum-space ODE will be quite 
complicated and correspondingly difficult to solve. 

In the coordinate representation $(x, -i\,d/dx)$, a wave function $\exp(ikx)$ is an eigenfunction
of momentum with eigenvalue $k$:

$$p\,e^{ikx} = -i\,\frac{d}{dx}\,e^{ikx} = -i(ik)\,e^{ikx} = k\,e^{ikx},$$

and this fact suggests that momentum wave functions will be Fourier transforms of their
coordinate counterparts. We therefore seek to verify the consistency of Eqs. (20.95) and
(20.96) by Fourier transforming the first of these two equations, letting $g(t)$ represent the
transform of $\psi$ and using Eq. (20.56) to take the transform of the second derivative.³ In
the case that $V$ has a Maclaurin expansion, we then use Eq. (20.94), obtaining

$$\frac{t^2}{2m}\,g(t) + V\!\left(-i\,\frac{d}{dt}\right) g(t) = E\,g(t).$$

This equation can be brought into agreement with Eq. (20.96) if we take its complex conjugate
(assuming $V$ to be real), so we can make the identification $\varphi(p) \longleftrightarrow g^*(t)$.


³Here $t$ is the transform variable; in the present context it has nothing to do with time.

On the other hand, if $V$ has a transform we can use the convolution formula, Eq. (20.93),
thereby converting Eq. (20.95) into an integral equation:

$$\frac{p^2}{2m}\,\varphi(p) + \frac{1}{\sqrt{2\pi}} \int V^T(p - p')\,\varphi(p')\,dp' = E\,\varphi(p). \tag{20.97}$$


Example 20.4.4 MOMENTUM-SPACE SCHRÖDINGER EQUATION


The time-independent Schrödinger equation for the hydrogen atom has (in hartree atomic
units $\hbar = m = e = 1$) the coordinate representation

$$-\frac{1}{2}\nabla^2\psi(\mathbf{r}) - \frac{1}{r}\,\psi(\mathbf{r}) = E\,\psi(\mathbf{r}).$$

Taking the Fourier transform of this equation, we get for the momentum-space wave
function $\varphi(\mathbf{k})$

$$\frac{k^2}{2}\,\varphi(\mathbf{k}) - \frac{4\pi}{(2\pi)^3} \int \frac{\varphi(\mathbf{k}')}{|\mathbf{k} - \mathbf{k}'|^2}\, d^3k' = E\,\varphi(\mathbf{k}). \tag{20.98}$$

In reaching Eq. (20.98), we have used the 3-D version of Eq. (20.97), inserting for the
transform of $V$ the result from Eq. (20.42).

In principle one can solve Eq. (20.98) for $\varphi(\mathbf{k})$ and the corresponding eigenvalues $E$,
and the results should be equivalent to the original equation. That is a more difficult task
than we will undertake now, but it is straightforward to verify that the Fourier transform of
the known solution for the hydrogen ground state is a solution to Eq. (20.98).

From Eq. (20.44), the hydrogen 1s wave function $e^{-r}$ is seen to have Fourier transform

$$\varphi(\mathbf{k}) = \frac{C}{(k^2 + 1)^2},$$

where $C$ is independent of $k$ and has a value that is irrelevant here. Inserting this result into
Eq. (20.98), we find

$$\frac{1}{2}\,\frac{C k^2}{(k^2 + 1)^2} - \frac{C}{2\pi^2} \int \frac{d^3k'}{|\mathbf{k} - \mathbf{k}'|^2\,(k'^2 + 1)^2} = \frac{E\,C}{(k^2 + 1)^2}. \tag{20.99}$$

Writing $|\mathbf{k} - \mathbf{k}'|^2 = k^2 - 2kk'\cos\theta + k'^2$, where $\theta$ is the angle between $\mathbf{k}$ and $\mathbf{k}'$, the integral, though a bit tedious, is found to be
elementary. Inserting its value, Eq. (20.99) becomes (canceling the common factor $C$),

$$\frac{k^2}{2(k^2 + 1)^2} - \frac{1}{2(k^2 + 1)} = \frac{E}{(k^2 + 1)^2}.$$

This equation is satisfied if $E = -1/2$, the correct energy (in hartree atomic units) for the
hydrogen 1s state.
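The last step can be made explicit. Assuming that the integral in Eq. (20.99) has the value $\pi^2/(k^2+1)$ (the value implied by the reduced equation above; it is quoted here, not derived), the left-hand side combines over a common denominator as follows:

```latex
\frac{k^2}{2(k^2+1)^2} - \frac{1}{2\pi^2}\cdot\frac{\pi^2}{k^2+1}
  = \frac{k^2 - (k^2+1)}{2(k^2+1)^2}
  = -\frac{1}{2}\,\frac{1}{(k^2+1)^2},
\qquad\text{so that}\qquad E = -\frac{1}{2}.
```

The $k$ dependence cancels identically, which is what makes $C/(k^2+1)^2$ an eigenfunction, not merely an approximate solution.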





Exercises

20.4.1 Work out the convolution equation corresponding to Eq. (20.71) for
(a) Fourier sine transforms

$$\frac{1}{2} \int_0^\infty g(y) \left[ f(y + x) + f(y - x) \right] dy = \int_0^\infty F_s(s)\,G_s(s) \cos sx\, ds,$$

where $f$ and $g$ are odd functions.
(b) Fourier cosine transforms

$$\frac{1}{2} \int_0^\infty g(y) \left[ f(y + x) + f(x - y) \right] dy = \int_0^\infty F_c(s)\,G_c(s) \cos sx\, ds,$$

where $f$ and $g$ are even functions.

20.4.2 Show that for both Fourier sine and Fourier cosine transforms Parseval’s relation has
the form

$$\int_0^\infty F(t)\,G(t)\,dt = \int_0^\infty f(y)\,g(y)\,dy.$$

20.4.3 (a) A rectangular pulse is described by

$$f(x) = \begin{cases} 1, & |x| < a, \\ 0, & |x| > a. \end{cases}$$

Show that the Fourier exponential transform is

$$F(t) = \sqrt{\frac{2}{\pi}}\,\frac{\sin at}{t}.$$

This is the single-slit diffraction problem of physical optics. The slit is described
by $f(x)$. The diffraction pattern amplitude is given by the Fourier transform $F(t)$.

(b) Use the Parseval relation to evaluate

$$\int_{-\infty}^{\infty} \frac{\sin^2 t}{t^2}\, dt.$$

This integral may also be evaluated by using the calculus of residues
(Exercise 11.8.9).

ANS. (b) $\pi$.
20.4.4 Solve Poisson’s equation, $\nabla^2\psi(\mathbf{r}) = -\rho(\mathbf{r})/\varepsilon_0$, by the following sequence of operations:

(a) Take the Fourier transform of both sides of this equation. Solve for the Fourier
transform of $\psi(\mathbf{r})$.

(b) Carry out the Fourier inverse transform.

20.4.5 (a) Given $f(x) = 1 - |x/2|$ for $-2 \le x \le 2$, with $f(x) = 0$ elsewhere, show that the
Fourier transform of $f(x)$ is

$$F(t) = \sqrt{\frac{2}{\pi}} \left( \frac{\sin t}{t} \right)^2.$$

(b) Using the Parseval relation, evaluate

$$\int_{-\infty}^{\infty} \left( \frac{\sin t}{t} \right)^4 dt.$$

ANS. (b) $\dfrac{2\pi}{3}$.

20.4.6 With $F(t)$ and $G(t)$ the Fourier transforms of $f(x)$ and $g(x)$, respectively, show that

$$\int_{-\infty}^{\infty} \left| f(x) - g(x) \right|^2 dx = \int_{-\infty}^{\infty} \left| F(t) - G(t) \right|^2 dt.$$

If $g(x)$ is an approximation to $f(x)$, the preceding relation indicates that the mean
square deviation in $t$-space is equal to the mean square deviation in $x$-space.

20.4.7 Use the Parseval relation to evaluate

$$\text{(a)} \int_{-\infty}^{\infty} \frac{d\omega}{(\omega^2 + a^2)^2}, \qquad \text{(b)} \int_{-\infty}^{\infty} \frac{\omega^2\, d\omega}{(\omega^2 + a^2)^2}.$$

Hint. Compare Exercise 20.2.3.

ANS. (a) $\dfrac{\pi}{2a^3}$, (b) $\dfrac{\pi}{2a}$.

20.4.8 The nuclear form factor $F(\mathbf{k})$ and the charge distribution $\rho(\mathbf{r})$ are 3-D Fourier transforms
of each other:

$$F(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int \rho(\mathbf{r})\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3r.$$

If the measured form factor is

$$F(\mathbf{k}) = (2\pi)^{-3/2} \left( 1 + \frac{k^2}{a^2} \right)^{-1},$$

find the corresponding charge distribution.

ANS. $\rho(\mathbf{r}) = \dfrac{a^2}{4\pi}\,\dfrac{e^{-ar}}{r}$.
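20.4.9 Using convolution methods, find an integral whose value is the electrostatic interaction
energy between a charge distribution $\rho(\mathbf{r} - \mathbf{A})$ and a unit point charge at $\mathbf{C}$.

20.4.10 With $\psi(\mathbf{r})$ a wave function in ordinary space and $\varphi(\mathbf{p})$ the corresponding momentum
function, show that

$$\text{(a)}\quad \frac{1}{(2\pi\hbar)^{3/2}} \int \mathbf{r}\,\psi(\mathbf{r})\, e^{-i\mathbf{r}\cdot\mathbf{p}/\hbar}\, d^3r = i\hbar\,\nabla_p\,\varphi(\mathbf{p}),$$

$$\text{(b)}\quad \frac{1}{(2\pi\hbar)^{3/2}} \int r^2\,\psi(\mathbf{r})\, e^{-i\mathbf{r}\cdot\mathbf{p}/\hbar}\, d^3r = (i\hbar\,\nabla_p)^2\,\varphi(\mathbf{p}).$$

Note. $\nabla_p$ is the gradient in momentum space:

$$\hat{\mathbf{e}}_x \frac{\partial}{\partial p_x} + \hat{\mathbf{e}}_y \frac{\partial}{\partial p_y} + \hat{\mathbf{e}}_z \frac{\partial}{\partial p_z}.$$

These results may be extended to any positive integer power of $r$ and therefore to any
(analytic) function that may be expanded as a Maclaurin series in $r$.

20.4.11 The ordinary space wave function $\psi(\mathbf{r}, t)$ satisfies the time-dependent Schrödinger
equation,

$$i\hbar\,\frac{\partial\psi(\mathbf{r}, t)}{\partial t} = -\frac{\hbar^2}{2m}\,\nabla^2\psi + V(\mathbf{r})\,\psi.$$

Show that the corresponding time-dependent momentum wave function satisfies the
analogous equation

$$i\hbar\,\frac{\partial\varphi(\mathbf{p}, t)}{\partial t} = \frac{p^2}{2m}\,\varphi + V(i\hbar\,\nabla_p)\,\varphi.$$

Note. Assume that $V(\mathbf{r})$ may be expressed by a Maclaurin series and use Exercise 20.4.10.
$V(i\hbar\,\nabla_p)$ is the same function of the variable $i\hbar\,\nabla_p$ that $V(\mathbf{r})$ is of the
variable $\mathbf{r}$.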
20.5 SIGNAL-PROCESSING APPLICATIONS 


A time-dependent electrical pulse $f(t)$ may be regarded as a superposition of waves of
many frequencies. For angular frequency $\omega$, we have a contribution

$$F(\omega)\,e^{i\omega t}.$$

Then the complete pulse may be written as

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\,e^{i\omega t}\,d\omega. \tag{20.100}$$

Because the angular frequency $\omega$ is related to the linear frequency $\nu$ by

$$\nu = \frac{\omega}{2\pi},$$

most physicists associate the entire $1/2\pi$ factor with this integral, so this formula differs
by a factor $(2\pi)^{1/2}$ from the definition we have adopted for the Fourier transform.

But if $\omega$ is a frequency, what about the negative frequencies? The negative $\omega$ may be
looked on as a mathematical device to avoid dealing with two functions ($\cos\omega t$ and $\sin\omega t$)
separately.

Because Eq. (20.100) has the form of a Fourier transform, we may solve for $F(\omega)$ by
taking the inverse transform. Keeping in mind the scale at which we wrote Eq. (20.100),
we get

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt. \tag{20.101}$$


Equation (20.101) represents a resolution of the pulse f(t) into its angular frequency 
components. Equation (20.100) is a synthesis of the pulse from its components. 

Now consider some device, such as a servomechanism or a stereo amplifier, with an
input $f(t)$ and an output $g(t)$. For an input of a single frequency $\omega$, with input $f_\omega(t) =
F(\omega)\,e^{i\omega t}$, the device will alter the amplitude and may also change the phase. For the
situations we discuss here, we assume a linear response, which means that we are assuming
that $g_\omega$ (the output corresponding to $f_\omega$) will be a signal at the same frequency as $f_\omega$, will
scale linearly with $f_\omega$, and be independent of the simultaneous presence of signals at other
frequencies. However, the responses of interesting devices will depend on the frequency.
Hence, our assumption is that $g_\omega$ and $f_\omega$ are related by an equation of the form

$$g_\omega(t) = \varphi(\omega)\,f_\omega(t). \tag{20.102}$$

This amplitude- and phase-modifying function, $\varphi(\omega)$, is called a transfer function. When
making schematic diagrams of electronic circuits, it is customary to designate a device
characterized by a transfer function by a suitably labeled box with input and output conductors,
as shown in Fig. 20.8.

Because we have assumed the operation corresponding to the transfer function to be 
linear, the total output from a pulse containing many frequencies may be obtained by inte- 
grating over the entire input, as modified by the transfer function, 


$$g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(\omega)\,F(\omega)\,e^{i\omega t}\,d\omega. \tag{20.103}$$

The transfer function is characteristic of the device to which it applies. Once it is
known (either by calculation or measurement), the output $g(t)$ can be calculated for any
input $f(t)$.





FIGURE 20.8 Schematic for device described by transfer function.

Equation (20.103) can be brought to a convenient form if we recognize that it is simply
the formula for the Fourier transform of the product $\varphi(\omega)F(\omega)$. We already know that
$F(\omega)$ has transform $f(t)$. Letting $\Phi(t)$ be (at the scaling of this section) the transform
of $\varphi(\omega)$, we may then use Eq. (20.93) to rewrite Eq. (20.103) as the convolution of the
transforms $f$ and $\Phi$:

$$g(t) = \int_{-\infty}^{\infty} f(t')\,\Phi(t - t')\,dt'. \tag{20.104}$$

Interpreting Eq. (20.104), we have an input (a “cause”), namely $f(t')$, modified by
$\Phi(t - t')$, producing an output (an “effect”), namely $g(t)$. Adopting the concept of causality
(that the cause precedes the effect), we must obtain contributions to $g(t)$ only from
times $t'$ such that $t' < t$. We do this by requiring

$$\Phi(t - t') = 0, \qquad t' > t. \tag{20.105}$$

Then Eq. (20.104) becomes

$$g(t) = \int_{-\infty}^{t} f(t')\,\Phi(t - t')\,dt'. \tag{20.106}$$

Since Eq. (20.106) must yield real output $g(t)$ for arbitrary real input $f(t)$, we see that in
addition to the requirement in Eq. (20.105), we also know that $\Phi(t)$ must be real.
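To make Eq. (20.106) concrete, the sketch below evaluates the causal convolution for an assumed response function, a decaying exponential $\Phi(t) = e^{-t/\tau}/\tau$ for $t \ge 0$ (zero otherwise), driven by a unit step switched on at $t = 0$. Neither this kernel nor the step input comes from the text; they are chosen here because the exact output, $1 - e^{-t/\tau}$, is easy to check. All names and the value of `TAU` are hypothetical.

```python
import math

TAU = 0.5  # hypothetical time constant, arbitrary units

def kernel(t):
    """Assumed response function Phi(t): real, and zero for t < 0 as Eq. (20.105) requires."""
    return math.exp(-t / TAU) / TAU if t >= 0.0 else 0.0

def step(t):
    """Input f(t): a unit step switched on at t = 0."""
    return 1.0 if t >= 0.0 else 0.0

def output(t, f=step, phi=kernel, t_start=0.0, n=4000):
    """Midpoint-rule estimate of g(t) = integral from t_start to t of f(t') Phi(t - t') dt'.

    t_start stands in for the lower limit -infinity; the input is assumed to vanish before it.
    """
    if t <= t_start:
        return 0.0
    h = (t - t_start) / n
    total = 0.0
    for i in range(n):
        tp = t_start + (i + 0.5) * h      # integration variable t' <= t
        total += f(tp) * phi(t - tp)
    return total * h

for t in (0.25, 1.0, 3.0):
    print(t, output(t), 1.0 - math.exp(-t / TAU))
```

Only input values at times earlier than $t$ enter the sum, which is the content of the causality requirement.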

The adoption of Eq. (20.106) and the reality of ® have profound consequences here and 
equivalently in dispersion theory (Section 12.8). 


Example 20.5.1 TRANSFER FUNCTION: HIGH-PASS FILTER


A high-pass filter permits almost complete transmission of high-frequency electrical signals
but strongly attenuates those at lower frequencies. A very simple high-pass filter is
shown in Fig. 20.9. Its transfer function describes the steady-state behavior of the filter in
the absence of loading (meaning that the output terminals are not connected to anything),
so we can assume that, for a signal at frequency $\omega$, the input, output, and current are the
real parts of the respective quantities $V_{\rm in}e^{i\omega t}$, $V_{\rm out}e^{i\omega t}$, $I e^{i\omega t}$. Possible phase differences in
these quantities are allowed for by permitting $V_{\rm in}$, $V_{\rm out}$, and $I$ to be complex.


FIGURE 20.9 Simple high-pass filter.
Following the usual procedure for electrical circuit analysis, we solve Kirchhoff’s equation
(the condition that the net change in potential around any loop of the circuit vanishes):

$$V_{\rm in}\,e^{i\omega t} = \frac{1}{C} \int^{t} I\,e^{i\omega t}\,dt + R\,I\,e^{i\omega t}. \tag{20.107}$$

Differentiating with respect to $t$ (to eliminate the integral), we have

$$V_{\rm in}\,\frac{d}{dt}\,e^{i\omega t} = \frac{I}{C}\,e^{i\omega t} + R\,I\,\frac{d}{dt}\,e^{i\omega t},$$

which, evaluating the derivatives, reduces to

$$i\omega V_{\rm in} = \frac{I}{C} + i\omega R\,I, \quad \text{with solution} \quad I = \frac{i\omega C\,V_{\rm in}}{1 + i\omega RC}. \tag{20.108}$$

Since $V_{\rm out} = IR$, we easily find the transfer function

$$\varphi(\omega) = \frac{V_{\rm out}}{V_{\rm in}} = \frac{i\omega RC}{1 + i\omega RC}. \tag{20.109}$$

To confirm the behavior of the filter, note that in the limit of large $\omega$, $\varphi(\omega) \to 1$, while at
small $\omega$, $\varphi(\omega) \approx i\omega RC$, which vanishes in the limit of small $\omega$. The transition between
these two limiting behaviors is a function of the product $RC$.
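The limiting behavior of Eq. (20.109) is easy to tabulate. In the sketch below the component values (1 kOhm and 1 uF, giving $RC = 1$ ms) and the function name `high_pass` are hypothetical, chosen only for illustration.

```python
import cmath
import math

def high_pass(omega, R, C):
    """Transfer function of Eq. (20.109): i w R C / (1 + i w R C)."""
    x = 1j * omega * R * C
    return x / (1.0 + x)

R, C = 1.0e3, 1.0e-6   # hypothetical values: 1 kOhm and 1 uF, so RC = 1 ms

# Tabulate magnitude and phase (in degrees) below, at, and above omega = 1/(RC).
for omega in (1.0e1, 1.0e3, 1.0e5):
    phi = high_pass(omega, R, C)
    print(omega, abs(phi), math.degrees(cmath.phase(phi)))
```

At $\omega = 1/RC$ the magnitude is $1/\sqrt{2}$ and the phase shift is 45 degrees; well below that frequency the output is strongly attenuated, and well above it the signal passes almost unchanged.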


Limitations on Transfer Functions 
Let us write the transfer function $\varphi(\omega)$ as the inverse Fourier transform of $\Phi(t)$ (still using
the scaling of this section), keeping in mind that $\Phi(t)$ vanishes for $t < 0$,

$$\varphi(\omega) = \int_0^\infty \Phi(t)\,e^{-i\omega t}\,dt. \tag{20.110}$$

Now, separating $\varphi$ into its real and imaginary parts: $\varphi(\omega) = u(\omega) + iv(\omega)$, and making the
same separation for the right-hand side of Eq. (20.110), we have

$$\begin{aligned}
u(\omega) &= \int_0^\infty \Phi(t) \cos\omega t\, dt, \\
v(\omega) &= -\int_0^\infty \Phi(t) \sin\omega t\, dt.
\end{aligned} \tag{20.111}$$

These formulas tell us that $u(\omega)$ is even, and that $v(\omega)$ is odd.
Since Eqs. (20.111) are cosine and sine transforms, they can be inverted to give two
alternative formulas for $\Phi(t)$ in the range of applicability of these transforms, namely for
$t > 0$. Continuing to use the transform scaling of this section,

$$\begin{aligned}
\Phi(t) &= \frac{2}{\pi} \int_0^\infty u(\omega) \cos\omega t\, d\omega \\
&= -\frac{2}{\pi} \int_0^\infty v(\omega) \sin\omega t\, d\omega,
\end{aligned} \qquad (t > 0). \tag{20.112}$$

The present significance of these results is that

$$\int_0^\infty u(\omega) \cos\omega t\, d\omega = -\int_0^\infty v(\omega) \sin\omega t\, d\omega, \qquad (t > 0). \tag{20.113}$$


The imposition of causality has led to a mutual interdependence of the real and imagi- 
nary parts of the transfer function. The present result is similar to those involving causality 
that were discussed in Section 12.8. 

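We close this subsection by verifying that the conditions on $u$ and $v$ are consistent with
the properties required of $\Phi$. Writing

$$\Phi(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(\omega)\,e^{i\omega t}\,d\omega,$$

then inserting $e^{i\omega t} = \cos\omega t + i \sin\omega t$ and $\varphi = u + iv$, we have

$$\begin{aligned}
\Phi(t) = {} & \frac{1}{2\pi} \int_{-\infty}^{\infty} \left[ u(\omega) \cos\omega t - v(\omega) \sin\omega t \right] d\omega \\
& + \frac{i}{2\pi} \int_{-\infty}^{\infty} \left[ u(\omega) \sin\omega t + v(\omega) \cos\omega t \right] d\omega.
\end{aligned} \tag{20.114}$$

The imaginary part of Eq. (20.114) vanishes because its integrand is an odd function of $\omega$.
If $t > 0$, we know from Eq. (20.113) that the two terms of the real part of Eq. (20.114) are
equal, and we get the expected nonzero result. But if $t < 0$, the sign of the second term of
the real part is changed and they then add to zero.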


Exercises 
20.5.1 Find the transfer function $\varphi(\omega)$ for the circuit shown in the left panel of Fig. 20.10. Is
this a high-pass, a low-pass, or a more complicated filter?
20.5.2 Find the transfer function $\varphi(\omega)$ for the circuit shown in the right panel of Fig. 20.10.


Hint. The potential difference across an inductor is given by LdI/dt. 
FIGURE 20.11 Circuit for Exercise 20.5.3.

[Diagram labels: (Fig. 20.10), (Fig. 20.9)]

FIGURE 20.12 Representation of the circuit in Fig. 20.11 in terms of successive transfer
functions.


20.5.3 Find the transfer function $\varphi(\omega)$ for the circuit shown in Fig. 20.11. This is a band-pass
filter.


Hint. Assume the currents in the various parts of the circuit to have the values shown in 
the figure. 


20.5.4 The circuit elements for Exercise 20.5.3 correspond to the successive transfer functions
shown in Fig. 20.12. Explain why the transfer function for this exercise is only the
product of the individual transfer functions in the limit $R_2 \gg R_1$.


20.6 DISCRETE FOURIER TRANSFORM 


For many physicists the Fourier transform is automatically the continuous Fourier trans- 
form whose analytical properties we have been discussing in previous sections of this 
chapter. The use of digital computers, however, presents an opportunity to work with 
numerically determined Fourier transforms, which consist of values given at a discrete set 
of points. Integrations are therefore converted into finite summations. Transforms defined 
on discrete point sets have properties worth pursuing, and analysis in that area is the topic 
of this section. 
Orthogonality on Discrete Point Sets


Throughout the earlier chapters of this book we have introduced and made use of the prop- 
erties of orthogonal functions, where orthogonality has been defined as the vanishing of an 
integral whose integrand contains a product of the functions under study. The alternative, 
to be discussed here, is to define orthogonality over a discrete point set as the vanishing of 
a sum of products computed at the individual points. It turns out that sines, cosines, and 
imaginary exponentials have the remarkable property that they are also orthogonal over a 
series of discrete, equally spaced points on an orthogonality interval. 

To analyze this situation, we take a set of $N$ equally spaced points $x_k$ on the interval
$(0, 2\pi)$:

$$x_k = \frac{2\pi k}{N}, \qquad k = 0, 1, 2, \ldots, N-1, \tag{20.115}$$

and we consider functions $\varphi_p(x)$, defined only on the points $x_k$ and for integer $p$, as

$$\varphi_p(x) = e^{ipx}. \tag{20.116}$$

In line with our introductory discussion, we define the scalar products of these functions as

$$\langle \varphi_p | \varphi_q \rangle = \sum_{k=0}^{N-1} \varphi_p^*(x_k)\, \varphi_q(x_k). \tag{20.117}$$

Inserting Eq. (20.115) for the $x_k$, the scalar product takes the form

$$\langle \varphi_p | \varphi_q \rangle = \sum_{k=0}^{N-1} e^{2\pi i (q-p)k/N} = \sum_{k=0}^{N-1} r^k, \tag{20.118}$$

where $r = e^{2\pi i (q-p)/N}$. This is a finite geometric series; if $r = 1$ its sum has the value $N$;
otherwise the sum evaluates to $(1 - r^N)/(1 - r)$. But $r^N = e^{2\pi i (q-p)}$, and because $p$ and
$q$ were restricted to integer values, we have $r^N = 1$, so the sum vanishes whenever $r \neq 1$. To complete our
understanding of the situation, we need to determine the conditions under which $r = 1$. We
clearly have $r = 1$ when $q = p$. Note that we also have $r = 1$ when $q - p$ is any integer
multiple of $N$. Thus, a formal statement relative to this scalar product is

$$\langle \varphi_p | \varphi_q \rangle = N \sum_{n=-\infty}^{\infty} \delta_{q-p,\,nN}. \tag{20.119}$$

Note that at most only one of the infinite sum of Kronecker deltas will be nonzero, and all
will be zero unless $q - p$ is a multiple of $N$ (one of which is $q - p = 0$).

Equation (20.119) is more complicated than necessary. Because the functions $\varphi_p$ are
defined by their values at $N$ points, only $N$ of them are linearly independent. In fact,

$$\varphi_{p+N}(x_k) = e^{2\pi i (p+N)k/N} = e^{2\pi i pk/N} = \varphi_p(x_k).$$

We can therefore restrict $p$ and $q$ in Eq. (20.119) to the range $(0, N-1)$, and our orthogonality
relation then becomes

$$\langle \varphi_p | \varphi_q \rangle = N\,\delta_{pq}, \qquad 0 \le p, q \le N-1. \tag{20.120}$$


Obviously, if function values on a discrete point set are to be used to represent a contin- 
uous function, the amount of detail that is retained in the analysis will depend on the size 
of the point set. We will come back to this issue in a later part of the present section. 
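The discrete orthogonality relation is straightforward to confirm by direct summation. The sketch below is an illustration only: the point count `N = 8` is arbitrary and the function names are hypothetical. It builds the full table of scalar products of Eq. (20.117) and also exhibits the periodicity in $p$ that underlies Eq. (20.119).

```python
import cmath
import math

def phi(p, k, N):
    """phi_p(x_k) = exp(i p x_k) on the grid x_k = 2 pi k / N."""
    return cmath.exp(2j * math.pi * p * k / N)

def scalar_product(p, q, N):
    """Discrete scalar product <phi_p | phi_q> of Eq. (20.117)."""
    return sum(phi(p, k, N).conjugate() * phi(q, k, N) for k in range(N))

N = 8   # arbitrary number of grid points
gram = [[scalar_product(p, q, N) for q in range(N)] for p in range(N)]

# Largest deviation of the table from N times the unit matrix, Eq. (20.120).
max_error = max(abs(gram[p][q] - (N if p == q else 0))
                for p in range(N) for q in range(N))
print(max_error)

# q - p equal to a nonzero multiple of N also gives N, as Eq. (20.119) states.
print(abs(scalar_product(1, 1 + N, N)))
```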


Discrete Fourier Transform 


By analogy with the definitions introduced for the conventional Fourier transform, we
define the discrete transform $g_p$ ($p = 0, \ldots, N-1$) of a function $f$ defined only on the
points $x_k$ by the formula

$$g_p = N^{-1/2} \sum_{k=0}^{N-1} e^{2\pi i kp/N} f_k. \tag{20.121}$$

We are now writing $f_k$ as a shorthand for $f(x_k)$, and that substitution has pretty much
decoupled the problem from the original interval of definition $0 \le x < 2\pi$. In essence, we
are now discussing transformations between two $N$-member sets of function values.

The transformation inverse to Eq. (20.121) is

$$f_j = N^{-1/2} \sum_{p=0}^{N-1} e^{-2\pi i jp/N} g_p. \tag{20.122}$$

Eq. (20.122) can be verified by substituting into it the formula for $g_p$, yielding

$$f_j = N^{-1} \sum_{p=0}^{N-1} \sum_{k=0}^{N-1} e^{2\pi i (k-j)p/N} f_k = \sum_{k=0}^{N-1} \delta_{kj}\, f_k,$$

as required.
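
The transform pair of Eqs. (20.121) and (20.122) can be written out directly as a short sketch. The function names `dft` and `idft` and the sample values are illustrative assumptions; the sign and normalization conventions follow the two equations above.

```python
import cmath

def dft(f):
    """Forward transform, Eq. (20.121): g_p = N^(-1/2) sum_k exp(+2 pi i k p / N) f_k."""
    N = len(f)
    return [sum(cmath.exp(2j * cmath.pi * k * p / N) * f[k] for k in range(N)) / N ** 0.5
            for p in range(N)]

def idft(g):
    """Inverse transform, Eq. (20.122): f_j = N^(-1/2) sum_p exp(-2 pi i j p / N) g_p."""
    N = len(g)
    return [sum(cmath.exp(-2j * cmath.pi * j * p / N) * g[p] for p in range(N)) / N ** 0.5
            for j in range(N)]

f = [1.0, 2.0, 0.5, -1.0, 3.0]       # arbitrary sample values, N = 5
g = dft(f)
f_back = idft(g)
round_trip_error = max(abs(a - b) for a, b in zip(f, f_back))
print(f"round-trip error: {round_trip_error:.2e}")
```

Note that many numerical libraries use the opposite sign in the forward exponent and place the whole $1/N$ factor in the inverse, so results may differ from library output by complex conjugation and a factor of $N^{1/2}$.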

These discrete transforms have properties similar to those of their continuous cousins.
For example, the transform of $f_{k-j}$, where $j$ is an integer, corresponding to translation by
$j$ steps in the $f$ array, is

$$[f_{k-j}]^T_p = N^{-1/2} \sum_{k=0}^{N-1} e^{2\pi i kp/N} f_{k-j} = e^{2\pi i jp/N}\, N^{-1/2} \sum_{k=0}^{N-1} e^{2\pi i (k-j)p/N} f_{k-j}.$$

Because of the periodicity of the $f_k$, we note that

$$N^{-1/2} \sum_{k=0}^{N-1} e^{2\pi i (k-j)p/N} f_{k-j} = N^{-1/2} \sum_{k'=-j}^{N-j-1} e^{2\pi i k'p/N} f_{k'} = N^{-1/2} \sum_{k'=0}^{N-1} e^{2\pi i k'p/N} f_{k'},$$

which is the formula for the $p$ coefficient in the transform of $f$. We therefore have the
translation formula

$$[f_{k-j}]^T_p = e^{2\pi i jp/N}\, g_p. \tag{20.123}$$

We examine next the convolution theorem, where the discrete convolution of two point
sets $f$ and $g$ is defined as

$$(f * g)_k = N^{-1/2} \sum_{j=0}^{N-1} f_j\, g_{k-j}. \tag{20.124}$$

Taking the transform of this convolution, we have

$$[f * g]^T_p = N^{-1} \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} e^{2\pi i kp/N} f_j\, g_{k-j} = \left[ N^{-1/2} \sum_{j=0}^{N-1} e^{2\pi i jp/N} f_j \right] \left[ N^{-1/2} \sum_{k=0}^{N-1} e^{2\pi i (k-j)p/N} g_{k-j} \right].$$

As in the continuous case, we have split the complex exponential into two factors. We now
redefine the index of the second summation from $k$ to $l = k - j$, thereby making the two
square brackets completely independent. Each can then be recognized as a transform (for
the second, we need to use the fact that the $g_l$ are periodic). The final result is

$$[f * g]^T_p = F_p\, G_p, \tag{20.125}$$


where F and G are the respective discrete transforms of f and g. This result is completely 
analogous with the convolution theorem for the continuous transform. 
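
Both the translation formula, Eq. (20.123), and the convolution theorem, Eq. (20.125), can be spot-checked numerically. This is a minimal sketch under the conventions of Eqs. (20.121) and (20.124), with indices taken modulo $N$ to express periodicity; the helper names and sample arrays are assumptions made for illustration.

```python
import cmath

def dft(f):
    """g_p = N^(-1/2) sum_k exp(+2 pi i k p / N) f_k, as in Eq. (20.121)."""
    N = len(f)
    return [sum(cmath.exp(2j * cmath.pi * k * p / N) * f[k] for k in range(N)) / N ** 0.5
            for p in range(N)]

def convolve(f, g):
    """Discrete convolution, Eq. (20.124), with periodic (mod N) indexing."""
    N = len(f)
    return [sum(f[j] * g[(k - j) % N] for j in range(N)) / N ** 0.5 for k in range(N)]

f = [1.0, -2.0, 0.5, 4.0, 0.0, 1.5]
g = [0.3, 1.0, -1.0, 2.0, 0.7, -0.2]
N = len(f)
F, G = dft(f), dft(g)

# Convolution theorem: transform of f*g equals the product F_p G_p.
conv_diff = max(abs(a - F[p] * G[p]) for p, a in enumerate(dft(convolve(f, g))))

# Translation formula: transform of f_{k-j} equals exp(2 pi i j p / N) F_p.
j = 2
shifted = [f[(k - j) % N] for k in range(N)]
shift_diff = max(abs(a - cmath.exp(2j * cmath.pi * j * p / N) * F[p])
                 for p, a in enumerate(dft(shifted)))
print(conv_diff, shift_diff)
```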

We close this discussion with the observation that the discrete transform and its inverse 
are linear transformations on coefficient arrays (vectors) of finite dimension $N$. Therefore,
each transform operator can be represented as an $N \times N$ matrix whose rows and columns
correspond to the points k or p. The fact that the transform and its inverse are complex 
conjugates means that the transformation matrices are unitary. Moreover, from the forms of 
the transform and its inverse, we see that all the elements of these matrices are proportional 
to complex exponentials. 


Limitations 


As mentioned earlier, the ability of discrete transforms to reproduce phenomena that are 
actually based on continuous functions will depend on the size of the point set in use. A 
large amount of detail on errors and limitations in the use of the discrete Fourier transform 
is provided by Hamming (see Additional Readings). We illustrate the potential problems 
in the following example.

Example 20.6.1 DISCRETE FOURIER TRANSFORM: ALIASING


Let's consider the simple case $f(x) = \cos 3x$ on the interval $0 \le x < 2\pi$, which we
(ill-advisedly) attempt to treat by the discrete Fourier transform method with $N = 4$. Our
four points are at $x = 0$, $\pi/2$, $\pi$, and $3\pi/2$, and the four corresponding values of $f_k$
are $(1, 0, -1, 0)$. The problem is that these same four values would be produced from
$g(x) = \cos x$, so neither our discrete transform nor any information derived therefrom can
properly reflect any difference in behavior between $f(x)$ and $g(x)$. If all that we are given
are the four values $(1, 0, -1, 0)$, the most straightforward thing to do is take the discrete
transform, yielding $(0, 1, 0, 1)$, which (from the formula for the inverse transform) corresponds to


$$\tfrac{1}{2}(0, 1, 0, 1) \;\longleftrightarrow\; \frac{e^{-ix} + e^{-3ix}}{2}.$$


If evaluated only at the chosen points, this expression is correct, but if used as an approximation
over the continuous range $(0, 2\pi)$ it cannot distinguish between $\cos x$, $\cos 3x$, or
any linear combination of the two with unit overall weight.

Situations in which the behavior at one wavelength or frequency is mistaken for that
at another are called aliasing. The best way to avoid aliasing errors is to use point sets
of sufficient size to accommodate the expected extent of oscillatory character in our
problem. ■
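
The aliasing in Example 20.6.1 can be reproduced in a few lines. This sketch assumes the transform convention of Eq. (20.121); the names `dft`, `f_samples`, and `g_samples` are illustrative.

```python
import cmath
import math

def dft(f):
    """g_p = N^(-1/2) sum_k exp(+2 pi i k p / N) f_k, as in Eq. (20.121)."""
    N = len(f)
    return [sum(cmath.exp(2j * cmath.pi * k * p / N) * f[k] for k in range(N)) / N ** 0.5
            for p in range(N)]

N = 4
xs = [2 * math.pi * k / N for k in range(N)]   # x = 0, pi/2, pi, 3pi/2
f_samples = [math.cos(3 * x) for x in xs]      # samples of cos 3x
g_samples = [math.cos(x) for x in xs]          # samples of cos x

# The two functions agree on the grid, so their transforms are identical.
sample_gap = max(abs(a - b) for a, b in zip(f_samples, g_samples))
transform = dft(f_samples)
# The four coefficients come out as approximately (0, 1, 0, 1).
print(sample_gap, [round(abs(v), 12) for v in transform])
```

Repeating the experiment with a larger $N$ (for example $N = 8$) separates the two functions, which is the remedy recommended in the example.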


Fast Fourier Transform 


The fast Fourier transform (FFT) is a particular way of factoring and rearranging the terms 
in the sums of the discrete Fourier transform. Brought to the attention of the scientific community
by Cooley and Tukey,⁴ its importance lies in the drastic reduction in the number of
numerical operations required. The reduction is possible because the transformation matrix 
contains large numbers of duplicate entries, and the FFT procedure organizes the compu- 
tation in a way permitting identical sets of coefficients to be computed only once. Because 
of the tremendous increase in speed achieved (and reduction in cost), the fast Fourier trans- 
form has been hailed as one of the few really significant advances in numerical analysis in 
the past few decades. 

For $N$ data points, a direct calculation of a discrete Fourier transform would require
about $N^2$ multiplications. For $N$ a power of 2, the fast Fourier transform technique of
Cooley and Tukey cuts the number of multiplications required to $(N/2)\log_2 N$. If $N = 1024$
($2^{10}$), the fast Fourier transform achieves a computational reduction by a factor of
over 200. This is why the fast Fourier transform is called fast and why it has revolutionized
the digital processing of waveforms. Details on its internal operation will be found in the
paper by Cooley and Tukey and in other sources.⁵

⁴J. W. Cooley and J. W. Tukey, Math. Comput. 19: 297 (1965).
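
To make the idea of factoring the sums concrete, here is a minimal recursive radix-2 sketch. It is an illustration of the even/odd splitting idea under the conventions of Eq. (20.121), not a reproduction of the procedure in the Cooley and Tukey paper; all function names are assumptions, and the input length must be a power of 2.

```python
import cmath
import math

def dft_direct(f):
    """Direct O(N^2) evaluation of Eq. (20.121)."""
    N = len(f)
    return [sum(cmath.exp(2j * cmath.pi * k * p / N) * f[k] for k in range(N)) / N ** 0.5
            for p in range(N)]

def _fft_unnormalized(f):
    """Recursive even/odd splitting; len(f) must be a power of 2."""
    N = len(f)
    if N == 1:
        return [complex(f[0])]
    even = _fft_unnormalized(f[0::2])   # transform of f_0, f_2, f_4, ...
    odd = _fft_unnormalized(f[1::2])    # transform of f_1, f_3, f_5, ...
    out = [0j] * N
    for p in range(N // 2):
        twiddle = cmath.exp(2j * cmath.pi * p / N) * odd[p]
        out[p] = even[p] + twiddle             # coefficient p
        out[p + N // 2] = even[p] - twiddle    # coefficient p + N/2 reuses the same products
    return out

def fft(f):
    N = len(f)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of 2")
    return [v / N ** 0.5 for v in _fft_unnormalized(f)]

f = [0.0, 1.0, 4.0, -2.0, 3.5, 0.25, -1.0, 2.0]
max_diff = max(abs(a - b) for a, b in zip(fft(f), dft_direct(f)))

N = 1024
reduction = N ** 2 / ((N / 2) * math.log2(N))   # ratio of the two multiplication counts
print(max_diff, reduction)
```

The ratio computed in the last lines is the "factor of over 200" quoted in the text for $N = 1024$.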


Exercises 


20.6.1 Derive the trigonometric forms of discrete orthogonality corresponding to Eq. (20.120):

$$\sum_{k=0}^{N-1} \cos(2\pi pk/N)\sin(2\pi qk/N) = 0,$$

$$\sum_{k=0}^{N-1} \cos(2\pi pk/N)\cos(2\pi qk/N) = \begin{cases} 0, & p \ne q, \\ N/2, & p = q \ne 0, N/2, \\ N, & p = q = 0, N/2, \end{cases}$$

$$\sum_{k=0}^{N-1} \sin(2\pi pk/N)\sin(2\pi qk/N) = \begin{cases} 0, & p \ne q, \\ N/2, & p = q \ne 0, N/2, \\ 0, & p = q = 0, N/2. \end{cases}$$

Note. If $N$ is odd, $p$ and $q$ will never have the value $N/2$.

Hint. Consider the use of trigonometric identities such as

$$\sin A \cos B = \tfrac{1}{2}\left[\sin(A + B) + \sin(A - B)\right].$$


20.6.2 Show in detail how to go from

$$F_p = \frac{1}{N^{1/2}} \sum_{k=0}^{N-1} f_k\, e^{2\pi i pk/N} \quad \text{to} \quad f_k = \frac{1}{N^{1/2}} \sum_{p=0}^{N-1} F_p\, e^{-2\pi i pk/N}.$$


20.6.3 The $N$-membered point sets $f_k$ and $F_p$ are discrete Fourier transforms of each other.
Derive the following symmetry relations:
(a) If $f_k$ is real, $F_p$ is Hermitian symmetric; that is, $F_p = F^*_{N-p}$.
(b) If $f_k$ is pure imaginary, then $F_p = -F^*_{N-p}$.
Note. The symmetry of part (a) is an illustration of aliasing. If $F_p$ describes an amplitude
at a frequency proportional to $p$, we necessarily predict an equal amplitude at the
frequency proportional to $N - p$.

⁵See, for example, G. D. Bergland, A guided tour of the fast Fourier transform, IEEE Spectrum 6: 41 (1969). A good discussion
can also be found in W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes, 2nd ed., Cambridge:
Cambridge University Press (1996), section 12.3.

20.7 LAPLACE TRANSFORMS


Definition 


The Laplace transform $f(s)$ of a function $F(t)$ is defined by⁶

$$f(s) = \mathcal{L}\{F(t)\} = \int_0^\infty e^{-st} F(t)\,dt. \tag{20.126}$$


A few comments on the existence of the integral are in order. The infinite integral of $F(t)$,

$$\int_0^\infty F(t)\,dt,$$

need not exist. For instance, $F(t)$ may diverge exponentially for large $t$. However, if there
are some constants $s_0$, $M$, and $t_0 \ge 0$ such that for all $t > t_0$

$$\left| e^{-s_0 t} F(t) \right| \le M, \tag{20.127}$$

the Laplace transform will exist for $s > s_0$; $F(t)$ is then said to be of exponential order.
As a counterexample, $F(t) = e^{t^2}$ does not satisfy the condition given by Eq. (20.127) and
is not of exponential order. Thus, $\mathcal{L}\{e^{t^2}\}$ does not exist.


The Laplace transform may also fail to exist because of a sufficiently strong singularity
in the function $F(t)$ as $t \to 0$. For example,

$$\int_0^\infty e^{-st}\, t^n\,dt$$

diverges at the origin for $n \le -1$. The Laplace transform $\mathcal{L}\{t^n\}$ does not exist for $n \le -1$.
Since, for two functions $F(t)$ and $G(t)$ for which the integrals exist,

$$\mathcal{L}\{aF(t) + bG(t)\} = a\mathcal{L}\{F(t)\} + b\mathcal{L}\{G(t)\}, \tag{20.128}$$

the operation denoted by $\mathcal{L}$ is linear.


Elementary Functions 


To introduce the Laplace transform, let us apply the operation to some of the elementary 
functions. In all cases we assume that F(t) = 0 for t < 0. If 


F(t)=1, t>0, 





then

$$\mathcal{L}\{1\} = \int_0^\infty e^{-st}\,dt = \frac{1}{s}, \qquad \text{for } s > 0. \tag{20.129}$$

⁶This is sometimes called a one-sided Laplace transform; the integral from $-\infty$ to $+\infty$ is referred to as a two-sided Laplace
transform. Some authors introduce an additional factor of $s$. This extra $s$ appears to have little advantage and continually gets
in the way; for further comments, see section 14.13 in the text by Jeffreys and Jeffreys (Additional Readings). Generally, we
take $s$ to be real and positive. It is possible to let $s$ become complex, provided $\operatorname{Re}(s) > 0$.

Next, let

$$F(t) = e^{kt}, \qquad t > 0.$$

The Laplace transform becomes

$$\mathcal{L}\{e^{kt}\} = \int_0^\infty e^{-st} e^{kt}\,dt = \frac{1}{s - k}, \qquad \text{for } s > k. \tag{20.130}$$

Using this relation, we obtain the Laplace transform of certain other functions. Since

$$\cosh kt = \tfrac{1}{2}\left(e^{kt} + e^{-kt}\right), \qquad \sinh kt = \tfrac{1}{2}\left(e^{kt} - e^{-kt}\right), \tag{20.131}$$

we have

$$\mathcal{L}\{\cosh kt\} = \frac{1}{2}\left(\frac{1}{s - k} + \frac{1}{s + k}\right) = \frac{s}{s^2 - k^2}, \tag{20.132}$$

$$\mathcal{L}\{\sinh kt\} = \frac{1}{2}\left(\frac{1}{s - k} - \frac{1}{s + k}\right) = \frac{k}{s^2 - k^2}, \tag{20.133}$$

both valid for $s > k$.
From the relations

$$\cos kt = \cosh ikt, \qquad \sin kt = -i \sinh ikt,$$

it is evident that we can obtain transforms of the sine and cosine if $k$ is replaced by $ik$ in
Eqs. (20.132) and (20.133):

$$\mathcal{L}\{\cos kt\} = \frac{s}{s^2 + k^2}, \tag{20.134}$$

$$\mathcal{L}\{\sin kt\} = \frac{k}{s^2 + k^2}, \tag{20.135}$$

both valid for $s > 0$. Another derivation of this last transform is given in Example 20.8.1.
It is a curious fact that $\lim_{s \to 0} \mathcal{L}\{\sin kt\} = 1/k$ despite the fact that $\int_0^\infty \sin kt\,dt$ does not
exist.

Finally, for $F(t) = t^n$, we have

$$\mathcal{L}\{t^n\} = \int_0^\infty e^{-st}\, t^n\,dt,$$

which is just a gamma function. Hence

$$\mathcal{L}\{t^n\} = \frac{\Gamma(n + 1)}{s^{n+1}}, \qquad s > 0,\ n > -1. \tag{20.136}$$

Note that in all these transforms we have the variable $s$ in the denominator, so that it
occurs as a negative power. From the definition of the transform, Eq. (20.126), and the
existence condition, Eq. (20.127), it is clear that if $f(s)$ is a Laplace transform, then
$\lim_{s \to \infty} f(s) = 0$. The significance of this point is that if $f(s)$ behaves asymptotically
for large $s$ as a positive power of $s$, then no inverse transform can exist.
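
The elementary transforms above can be spot-checked by numerical quadrature. The following is a minimal sketch, assuming a truncated integration range and composite Simpson's rule; the helper `laplace_numeric` and the sample values of $s$, $k$, and $n$ are illustrative assumptions, not part of the text.

```python
import math

def laplace_numeric(F, s, t_max=60.0, steps=60000):
    """Approximate the integral of exp(-s*t)*F(t) over 0 <= t <= t_max (Simpson's rule, even steps)."""
    h = t_max / steps
    total = F(0.0) + math.exp(-s * t_max) * F(t_max)
    for i in range(1, steps):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-s * t) * F(t)
    return total * h / 3

s, k, n = 2.0, 1.5, 3   # sample values with s > k > 0
table = {
    "1":       (lambda t: 1.0,              1 / s),                            # Eq. (20.129)
    "exp(kt)": (lambda t: math.exp(k * t),  1 / (s - k)),                      # Eq. (20.130)
    "cosh kt": (lambda t: math.cosh(k * t), s / (s**2 - k**2)),                # Eq. (20.132)
    "sinh kt": (lambda t: math.sinh(k * t), k / (s**2 - k**2)),                # Eq. (20.133)
    "cos kt":  (lambda t: math.cos(k * t),  s / (s**2 + k**2)),                # Eq. (20.134)
    "sin kt":  (lambda t: math.sin(k * t),  k / (s**2 + k**2)),                # Eq. (20.135)
    "t^n":     (lambda t: t**n,             math.gamma(n + 1) / s**(n + 1)),   # Eq. (20.136)
}
errors = {name: abs(laplace_numeric(F, s) - exact) for name, (F, exact) in table.items()}
for name, err in errors.items():
    print(f"{name:8s} error = {err:.1e}")
```

The truncation at `t_max` is harmless here only because every integrand decays exponentially for the chosen $s$; for $s$ close to $k$ a larger range would be needed.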


Heaviside Step Function 


In Exercise 1.11.9 we encountered the Heaviside step function u(t). Because of its utility 
in describing discontinuous signal pulses, its Laplace transform occurs frequently. We 
therefore remind the reader of the definition 


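$$u(t - k) = \begin{cases} 0, & t < k, \\ 1, & t > k. \end{cases} \tag{20.137}$$

Taking the transform, we have

$$\mathcal{L}\{u(t - k)\} = \int_k^\infty e^{-st}\,dt = \frac{1}{s}\, e^{-ks}. \tag{20.138}$$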


Example 20.7.1 TRANSFORM OF SQUARE PULSE


Let's compute the transform of a square pulse $F(t)$ of height $A$ that is on from $t = 0$ to
$t = t_0$; see Fig. 20.13. Using the Heaviside step function, the pulse can be represented as

$$F(t) = A\left[u(t) - u(t - t_0)\right].$$

Its transform is therefore

$$\mathcal{L}\{F(t)\} = \frac{A}{s}\left(1 - e^{-t_0 s}\right). \qquad ■$$


Dirac Delta Function 


For use with differential equations one further transform is helpful, namely that of the 
Dirac delta function. From the properties of the delta function, we have 


$$\mathcal{L}\{\delta(t - t_0)\} = \int_0^\infty e^{-st}\,\delta(t - t_0)\,dt = e^{-st_0}, \qquad \text{for } t_0 > 0. \tag{20.139}$$

















FIGURE 20.13 Square pulse.

For $t_0 = 0$ we must be a bit more careful, as the sequences we have used for defining the
delta function involve contributions symmetrically distributed about $t_0$, and the integration
defining the Laplace transform is restricted to $t \ge 0$. Consistent results when using Laplace
transforms, however, are obtained if we consider delta sequences that are entirely within
the range $t \ge t_0$, which is equivalent to


$$\mathcal{L}\{\delta(t)\} = 1. \tag{20.140}$$


This delta function is frequently called the impulse function because it is so useful in 
describing impulsive forces, that is, forces lasting only a short time. 


Inverse Transform 


As we have already seen in our discussion of the Fourier transform, the taking of an integral 
transform will ordinarily have little value unless we can carry out the inverse transform. 
That is, with 


$$\mathcal{L}\{F(t)\} = f(s),$$

then it is desirable to be able to compute

$$\mathcal{L}^{-1}\{f(s)\} = F(t). \tag{20.141}$$

However, this inverse transform is not entirely unique. Two functions $F_1(t)$ and $F_2(t)$ can
have the same transform, $f(s)$, if their difference, $N(t) = F_1(t) - F_2(t)$, is a null function,
meaning that for all $t_0 > 0$ it satisfies

$$\int_0^{t_0} N(t)\,dt = 0.$$


This result is known as Lerch's theorem, and is not quite equivalent to $F_1 = F_2$, because
it permits $F_1$ and $F_2$ to differ at isolated points. However, in most problems studied by
physicists or engineers this ambiguity is not important and we will not consider it further.
The inverse transform can be determined in various ways.


1. A table of transforms can be built up and used to identify inverse transformations, 
exactly as a table of logarithms can be used to look up antilogarithms. The preceding 
transforms constitute the beginnings of such a table. More complete sets of Laplace 
transforms are in several of the Additional Readings, and a relatively short table of 
transforms appears in the present text as Table 20.1. Many functional forms not in 
Table 20.1 can be reduced to tabular entries using a partial fraction expansion or other 
properties of the Laplace transform presented later in this chapter. Of particular value 
in this regard are the translation and derivative formulas. There is some justification for 
suspecting that these tables are probably of more value in solving textbook exercises 
than in solving real-world problems. 

2. A general technique for $\mathcal{L}^{-1}$ will be developed in Section 20.10 by using the calculus
of residues.

3. Transforms and their inverses can be represented numerically. See the work by Krylov 
and Skoblya in Additional Readings.

Table 20.1 Laplace Transforms$^a$

No.  f(s)                                               F(t)                Limitation           Equation
1.   $1$                                                $\delta(t)$         Singularity at +0    (20.140)
2.   $1/s$                                              $1$                 $s > 0$              (20.129)
3.   $\Gamma(n+1)/s^{n+1}$                              $t^n$               $s > 0,\ n > -1$     (20.136)
4.   $1/(s-k)$                                          $e^{kt}$            $s > k$              (20.130)
5.   $1/(s-k)^2$                                        $t e^{kt}$          $s > k$              (20.176)
6.   $s/(s^2-k^2)$                                      $\cosh kt$          $s > k$              (20.132)
7.   $k/(s^2-k^2)$                                      $\sinh kt$          $s > k$              (20.133)
8.   $s/(s^2+k^2)$                                      $\cos kt$           $s > 0$              (20.134)
9.   $k/(s^2+k^2)$                                      $\sin kt$           $s > 0$              (20.135)
10.  $(s-a)/[(s-a)^2+k^2]$                              $e^{at}\cos kt$     $s > a$              (20.159)
11.  $k/[(s-a)^2+k^2]$                                  $e^{at}\sin kt$     $s > a$              (20.158)
12.  $(s^2-k^2)/(s^2+k^2)^2$                            $t\cos kt$          $s > 0$              (20.177)
13.  $2ks/(s^2+k^2)^2$                                  $t\sin kt$          $s > 0$              (20.178)
14.  $(s^2+a^2)^{-1/2}$                                 $J_0(at)$           $s > 0$              (20.182)
15.  $(s^2-a^2)^{-1/2}$                                 $I_0(at)$           $s > a$              Exercise 20.8.13
16.  $(1/a)\cot^{-1}(s/a)$                              $j_0(at)$           $s > 0$              Exercise 20.8.14
17.  $(1/2a)\ln[(s+a)/(s-a)] = (1/a)\coth^{-1}(s/a)$    $i_0(at)$           $s > a$              Exercise 20.8.14
18.  $(s-a)^n/s^{n+1}$                                  $L_n(at)$           $s > 0$              Exercise 20.8.16
19.  $(1/s)\ln(s+1)$                                    $E_1(x)$            $s > 0$              Exercise 20.8.17
20.  $(\ln s)/s$                                        $-\ln t - \gamma$   $s > 0$              Exercise 20.10.9
$^a$ $\gamma$ is the Euler-Mascheroni constant.

Example 20.7.2 PARTIAL FRACTION EXPANSION


The function $f(s) = k^2/[s(s^2 + k^2)]$ does not appear as a transform listed in Table 20.1, but
we may obtain it from the tabulated transforms by observing that it has the partial fraction
expansion

$$f(s) = \frac{k^2}{s(s^2 + k^2)} = \frac{1}{s} - \frac{s}{s^2 + k^2}.$$

The partial fraction technique was discussed in Section 1.5, and the present example was
the subject of Example 1.5.3.

Each of the two partial fractions corresponds to an entry in Table 20.1, and we can
therefore take the inverse transform of $f(s)$ term by term:

$$\mathcal{L}^{-1}\{f(s)\} = 1 - \cos kt.$$

Remember that the range of the inverse transform is restricted to $t > 0$. ■
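
The partial fraction step and the resulting inverse transform can both be checked mechanically. In this sketch the algebraic identity is tested in exact rational arithmetic, and the inverse is tested by transforming $1 - \cos kt$ numerically; `f_original`, `f_partial`, and `laplace_numeric` are hypothetical helper names, and the sample values are arbitrary.

```python
from fractions import Fraction
import math

def f_original(s, k):
    return k**2 / (s * (s**2 + k**2))

def f_partial(s, k):
    return 1 / s - s / (s**2 + k**2)

# Exact check of the partial fraction identity on a grid of rational (s, k) values.
exact_match = all(
    f_original(Fraction(a, 3), Fraction(b, 2)) == f_partial(Fraction(a, 3), Fraction(b, 2))
    for a in range(1, 8) for b in range(1, 6)
)

def laplace_numeric(F, s, t_max=60.0, steps=60000):
    """Simpson's-rule approximation to the integral of exp(-s*t)*F(t) over 0 <= t <= t_max."""
    h = t_max / steps
    total = F(0.0) + math.exp(-s * t_max) * F(t_max)
    for i in range(1, steps):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-s * t) * F(t)
    return total * h / 3

# Numerical check that 1 - cos(kt) transforms back to f(s).
s, k = 1.5, 2.0
gap = abs(laplace_numeric(lambda t: 1 - math.cos(k * t), s) - f_original(s, k))
print(exact_match, gap)
```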


Example 20.7.3 A STEP FUNCTION


This example shows how Laplace transforms can be used to evaluate a definite integral.
Consider

$$F(t) = \int_0^\infty \frac{\sin tx}{x}\,dx. \tag{20.142}$$

Suppose we take the Laplace transform of this definite (and improper) integral, naming
it $f(s)$:

$$f(s) = \mathcal{L}\left\{\int_0^\infty \frac{\sin tx}{x}\,dx\right\} = \int_0^\infty e^{-st} \int_0^\infty \frac{\sin tx}{x}\,dx\,dt.$$

Now, interchanging the order of integration (which is justified),⁷ we get

$$f(s) = \int_0^\infty \frac{1}{x}\left[\int_0^\infty e^{-st}\sin tx\,dt\right]dx = \int_0^\infty \frac{dx}{s^2 + x^2}, \tag{20.143}$$

since the factor in square brackets is just the Laplace transform of $\sin tx$. The integral on
the right-hand side is elementary, with evaluation

$$f(s) = \int_0^\infty \frac{dx}{s^2 + x^2} = \frac{1}{s}\tan^{-1}\left(\frac{x}{s}\right)\bigg|_0^\infty = \frac{\pi}{2s}. \tag{20.144}$$

Using entry #2 in Table 20.1, we carry out the inverse transformation to obtain

$$F(t) = \frac{\pi}{2}, \qquad t > 0, \tag{20.145}$$

in agreement with an evaluation by the calculus of residues, Eq. (11.107). It has been
assumed that $t > 0$ in $F(t)$. For $F(-t)$ we need note only that $\sin(-tx) = -\sin tx$, giving
$F(-t) = -F(t)$. Finally, if $t = 0$, $F(0)$ is clearly zero. Therefore,

$$\int_0^\infty \frac{\sin tx}{x}\,dx = \frac{\pi}{2}\left[2u(t) - 1\right] = \begin{cases} \dfrac{\pi}{2}, & t > 0, \\ 0, & t = 0, \\ -\dfrac{\pi}{2}, & t < 0. \end{cases} \tag{20.146}$$

Here $u(t)$ is the Heaviside unit step function, Eq. (20.137). Thus, $\int_0^\infty (\sin tx/x)\,dx$, taken as
a function of $t$, describes a step function (Fig. 20.14), with a step of height $\pi$ at $t = 0$.

FIGURE 20.14 $F(t) = \int_0^\infty \frac{\sin tx}{x}\,dx$, a step function.

⁷See Chapter 1 in Jeffreys and Jeffreys (Additional Readings) for a discussion of uniform convergence of integrals.

The technique in the preceding example was to (1) introduce a second integration, 
namely the Laplace transform, (2) reverse the order of integration and integrate once, 
and (3) take the inverse Laplace transform. This is a technique that will apply to many 
problems. 


Exercises

20.7.1 Prove that

$$\lim_{s \to \infty} s f(s) = \lim_{t \to +0} F(t).$$

Hint. Assume that $F(t)$ can be expressed as $F(t) = \sum_{n=0}^{\infty} a_n t^n$.

20.7.2 Show that

$$\frac{1}{\pi} \lim_{s \to 0} \mathcal{L}\{\cos xt\} = \delta(x).$$

20.7.3 Verify that

$$\mathcal{L}\left\{\frac{\cos at - \cos bt}{b^2 - a^2}\right\} = \frac{s}{(s^2 + a^2)(s^2 + b^2)}, \qquad a^2 \ne b^2.$$

20.7.4 Using partial fraction expansions, show that

(a) $\mathcal{L}^{-1}\left\{\dfrac{1}{(s + a)(s + b)}\right\} = \dfrac{e^{-at} - e^{-bt}}{b - a}, \quad a \ne b.$

(b) $\mathcal{L}^{-1}\left\{\dfrac{s}{(s + a)(s + b)}\right\} = \dfrac{a e^{-at} - b e^{-bt}}{a - b}, \quad a \ne b.$

20.7.5 Using partial fraction expansions, show that for $a^2 \ne b^2$,

(a) $\mathcal{L}^{-1}\left\{\dfrac{1}{(s^2 + a^2)(s^2 + b^2)}\right\} = -\dfrac{1}{a^2 - b^2}\left\{\dfrac{\sin at}{a} - \dfrac{\sin bt}{b}\right\},$

(b) $\mathcal{L}^{-1}\left\{\dfrac{s^2}{(s^2 + a^2)(s^2 + b^2)}\right\} = \dfrac{1}{a^2 - b^2}\left\{a \sin at - b \sin bt\right\}.$

20.7.6 Show that

(a) $\displaystyle\int_0^\infty \frac{\cos s}{s^\nu}\,ds = \frac{\pi}{2(\nu - 1)!\cos(\nu\pi/2)}, \quad 0 < \nu < 1,$

(b) $\displaystyle\int_0^\infty \frac{\sin s}{s^\nu}\,ds = \frac{\pi}{2(\nu - 1)!\sin(\nu\pi/2)}, \quad 0 < \nu < 2.$

Why is $\nu$ restricted to $(0, 1)$ for (a), to $(0, 2)$ for (b)? These integrals may be interpreted
as Fourier transforms of $s^{-\nu}$ and as Mellin transforms of $\sin s$ and $\cos s$.

Hint. Replace $s^{-\nu}$ by a Laplace transform integral: $\mathcal{L}\{t^{\nu - 1}\}/\Gamma(\nu)$. Then integrate with
respect to $s$. The resulting integral can be treated as a beta function (Section 13.3).

20.7.7 A function $F(t)$ can be expanded in a Maclaurin series,

$$F(t) = \sum_{n=0}^{\infty} a_n t^n.$$

Then

$$\mathcal{L}\{F(t)\} = \int_0^\infty e^{-st} \sum_{n=0}^{\infty} a_n t^n\,dt = \sum_{n=0}^{\infty} a_n \int_0^\infty e^{-st}\, t^n\,dt.$$

Show that $f(s)$, the Laplace transform of $F(t)$, contains no powers of $s$ greater than
$s^{-1}$. Check your result by calculating $\mathcal{L}\{\delta(t)\}$, and comment on this fiasco.

20.7.8 Show that the Laplace transform of the confluent hypergeometric function $M(a, c; x)$ is

$$\mathcal{L}\{M(a, c; x)\} = \frac{1}{s}\, {}_2F_1\left(a, 1; c; \frac{1}{s}\right).$$


20.8 PROPERTIES OF LAPLACE TRANSFORMS


Transforms of Derivatives 


Perhaps the main application of Laplace transforms is in converting differential equations 
into simpler forms that may be solved more easily. It will be seen, for instance, that coupled 
differential equations with constant coefficients transform to simultaneous linear algebraic 
equations. For the study of differential equations we need formulas for the Laplace trans- 
forms of the derivatives of a function. 

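Let us transform the first derivative of $F(t)$:

$$\mathcal{L}\{F'(t)\} = \int_0^\infty e^{-st}\, \frac{dF(t)}{dt}\,dt.$$

Integrating by parts, we obtain

$$\mathcal{L}\{F'(t)\} = e^{-st} F(t)\Big|_0^\infty + s \int_0^\infty e^{-st} F(t)\,dt = s\mathcal{L}\{F(t)\} - F(0). \tag{20.147}$$

Strictly speaking, $F(0) = F(+0)$,⁸ and $dF/dt$ is required to be at least piecewise
continuous for $0 \le t < \infty$. Naturally, both $F(t)$ and its derivative must be such that the
integrals do not diverge. An extension to higher derivatives gives

$$\mathcal{L}\{F^{(2)}(t)\} = s^2 \mathcal{L}\{F(t)\} - sF(+0) - F'(+0), \tag{20.148}$$

$$\mathcal{L}\{F^{(n)}(t)\} = s^n \mathcal{L}\{F(t)\} - s^{n-1}F(+0) - \cdots - F^{(n-1)}(+0). \tag{20.149}$$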


The Laplace transform, like the Fourier transform, replaces differentiation with multi- 
plication. In the following examples ODEs become algebraic equations. Here is the power 
and the utility of the Laplace transform. But see Example 20.8.7 for what may happen if 
the coefficients are not constant. 

Note how the initial conditions, $F(+0)$, $F'(+0)$, and so on, are incorporated into the
transform. This situation differs from that of the Fourier transform, and arises from
the finite lower limit (t = 0) of the integral defining the transform. This property makes 
the Laplace transform more powerful for obtaining solutions to differential equations sub- 
ject to initial conditions. 
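
The derivative formulas, Eqs. (20.147) and (20.148), can be tested on a function whose derivatives are known in closed form. This is a minimal numerical sketch using $F(t) = \cos kt$; the quadrature helper `laplace_numeric` and the sample values of $k$ and $s$ are assumptions made for illustration.

```python
import math

def laplace_numeric(F, s, t_max=60.0, steps=60000):
    """Simpson's-rule approximation to the integral of exp(-s*t)*F(t) over 0 <= t <= t_max."""
    h = t_max / steps
    total = F(0.0) + math.exp(-s * t_max) * F(t_max)
    for i in range(1, steps):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-s * t) * F(t)
    return total * h / 3

k, s = 2.0, 1.2

def F(t):
    return math.cos(k * t)

def dF(t):
    return -k * math.sin(k * t)

def d2F(t):
    return -k**2 * math.cos(k * t)

f = laplace_numeric(F, s)
# Eq. (20.147): L{F'} = s L{F} - F(0)
first_gap = abs(laplace_numeric(dF, s) - (s * f - F(0.0)))
# Eq. (20.148): L{F''} = s^2 L{F} - s F(0) - F'(0)
second_gap = abs(laplace_numeric(d2F, s) - (s**2 * f - s * F(0.0) - dF(0.0)))
print(first_gap, second_gap)
```

The initial values $F(0)$ and $F'(0)$ enter the right-hand sides explicitly, which is the feature that makes the transform convenient for initial-value problems.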


⁸This notation means that zero is approached from the positive side.

Example 20.8.1 USE OF DERIVATIVE FORMULA


Here is an example showing how the derivative formula has uses even in contexts not
involving the solution to a differential equation. Starting from the identity

$$-k^2 \sin kt = \frac{d^2}{dt^2} \sin kt, \tag{20.150}$$

we apply on both sides of the equation the Laplace transform operation, reaching

$$-k^2 \mathcal{L}\{\sin kt\} = \mathcal{L}\left\{\frac{d^2}{dt^2}\sin kt\right\} = s^2 \mathcal{L}\{\sin kt\} - s\sin(0) - \frac{d}{dt}\sin kt\,\bigg|_{t=0}.$$

Since $\sin(0) = 0$ and $d/dt\,\sin kt\,|_{t=0} = k$, the above equation has solution

$$\mathcal{L}\{\sin kt\} = \frac{k}{s^2 + k^2}.$$

This result confirms Eq. (20.135). ■


Examples involving the solutions of differential equations follow. 


Example 20.8.2 SIMPLE HARMONIC OSCILLATOR


As a physical example, consider a mass $m$ oscillating under the influence of an ideal spring,
spring constant $k$. As usual, friction is neglected. Then Newton's second law becomes

$$m\,\frac{d^2 X(t)}{dt^2} + kX(t) = 0. \tag{20.151}$$

We take as initial conditions

$$X(0) = X_0, \qquad X'(0) = 0.$$

Applying the Laplace transform, we obtain

$$m\mathcal{L}\left\{\frac{d^2 X}{dt^2}\right\} + k\mathcal{L}\{X(t)\} = 0. \tag{20.152}$$

Letting $x(s)$ denote the presently unknown transform $\mathcal{L}\{X(t)\}$ and using Eq. (20.148), we
convert Eq. (20.152) to the form

$$ms^2 x(s) - msX_0 + kx(s) = 0,$$

which has solution

$$x(s) = X_0\, \frac{s}{s^2 + \omega_0^2}, \qquad \text{with } \omega_0^2 \equiv \frac{k}{m}.$$

From Table 20.1 this is seen to be the transform of $\cos \omega_0 t$, which gives the expected result:

$$X(t) = X_0 \cos \omega_0 t. \tag{20.153}$$

■

Example 20.8.3 EARTH'S NUTATION


A somewhat more involved example is the nutation of the Earth's poles (force-free precession).
We treat the Earth as a rigid (oblate) spheroid, with z-axis through its direction
of symmetry. We assume the spheroid to have moments of inertia $I_z$ and $I_x = I_y$, and to
be rotating about its $x$, $y$, and $z$ axes at the respective angular velocities $X(t) = \omega_x(t)$,
$Y(t) = \omega_y(t)$, $\omega_z$ = constant. The Euler equations of motion for $X$ and $Y$ reduce to

$$\frac{dX}{dt} = -aY, \qquad \frac{dY}{dt} = +aX, \tag{20.154}$$

where $a = [(I_z - I_x)/I_z]\,\omega_z$. For the Earth, the initial values of $X$ and $Y$ are not both
zero, so the axis of rotation is not aligned with the symmetry axis (see Fig. 20.15), and
because of this lack of alignment, the axis of rotation precesses about the axis of symmetry.
For the Earth, the deviation between the rotation and symmetry axes is small, only about
15 meters (measured at the Earth's surface at the poles).

FIGURE 20.15 Earth's rotation axis and its components.

Our first step in solving these coupled ODEs is to take their Laplace transforms,
obtaining

$$s x(s) - X(0) = -a\,y(s), \qquad s y(s) - Y(0) = a\,x(s).$$

Combining to eliminate $y(s)$, we have

$$s^2 x(s) - sX(0) + aY(0) = -a^2 x(s),$$

or

$$x(s) = X(0)\,\frac{s}{s^2 + a^2} - Y(0)\,\frac{a}{s^2 + a^2}. \tag{20.155}$$

Recognizing these functions of $s$ as transforms listed in Table 20.1,

$$X(t) = X(0)\cos at - Y(0)\sin at.$$

Similarly,

$$Y(t) = X(0)\sin at + Y(0)\cos at.$$


This is seen to be a rotation of the vector $(X, Y)$ counterclockwise (for $a > 0$) about the
z-axis with angle $\theta = at$ and angular velocity $a$.
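
As an independent check on the transform solution, the coupled equations (20.154) can be integrated directly and compared with the rotation formulas for $X(t)$ and $Y(t)$. The sketch below uses a classical fourth-order Runge-Kutta step; the function name `rk4_step` and the numerical values of $a$, $X(0)$, and $Y(0)$ are illustrative assumptions, not Earth data.

```python
import math

def rk4_step(x, y, a, h):
    """One classical Runge-Kutta step for dX/dt = -a*Y, dY/dt = +a*X."""
    def rhs(x, y):
        return -a * y, a * x
    k1 = rhs(x, y)
    k2 = rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = rhs(x + h * k3[0], y + h * k3[1])
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

a, X0, Y0 = 0.7, 1.0, 0.5          # illustrative values only
steps, h = 5000, 0.001
x, y = X0, Y0
for _ in range(steps):
    x, y = rk4_step(x, y, a, h)
t = steps * h

# Closed-form solution obtained from the Laplace transforms.
x_exact = X0 * math.cos(a * t) - Y0 * math.sin(a * t)
y_exact = X0 * math.sin(a * t) + Y0 * math.cos(a * t)
print(abs(x - x_exact), abs(y - y_exact))
```

The length of the vector $(X, Y)$ stays constant during the integration, consistent with the interpretation of the solution as a pure rotation.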

A direct interpretation may be found by choosing the $x$ and $y$ axes so that $Y(0) = 0$.
Then

$$X(t) = X(0)\cos at, \qquad Y(t) = X(0)\sin at,$$

which are the parametric equations for rotation of $(X, Y)$ in a circular orbit of radius $X(0)$,
with angular velocity $a$ in the counterclockwise sense.

For the Earth, $a$ as defined here corresponds to a period ($2\pi/a$) of some 300 days.
Actually, because of departures from the idealized rigid body assumed in setting up Euler's
equations, the period is about 427 days.⁹

These same equations arise in electromagnetic theory. If in Eq. (20.154) we set

$$X(t) = L_x, \qquad Y(t) = L_y,$$

where $L_x$ and $L_y$ are the x- and y-components of the angular momentum $\mathbf{L}$ of a charged
particle moving in a uniform magnetic field $B_z \hat{\mathbf{e}}_z$, and then assign $a$ the value $a = -g_L B_z$,
where $g_L$ is the gyromagnetic ratio of the particle, then Eq. (20.154) determines its
Larmor precession in the magnetic field. ■


Example 20.8.4 IMPULSIVE FORCE


For an impulsive force acting on a particle of mass $m$, Newton's second law takes the form

$$m\,\frac{d^2 X}{dt^2} = P\,\delta(t),$$

where $P$ is a constant. Transforming, we obtain

$$ms^2 x(s) - msX(0) - mX'(0) = P.$$

For a particle starting from rest, $X'(0) = 0$. We shall also take $X(0) = 0$. Then

$$x(s) = \frac{P}{ms^2},$$

and, taking the inverse transform,

$$X(t) = \frac{P}{m}\,t, \qquad \frac{dX(t)}{dt} = \frac{P}{m}, \ \text{a constant}.$$

The effect of the impulse $P\,\delta(t)$ is to transfer (instantaneously) $P$ units of linear momentum
to the particle.





⁹D. Menzel, ed., Fundamental Formulas of Physics, Englewood Cliffs, NJ: Prentice-Hall (1955), reprinted, Dover (1960),
p. 695.

A similar analysis applies to the ballistic galvanometer. The torque on the galvanometer
is given initially by $k\iota$, in which $\iota$ is a pulse of current and $k$ is a proportionality constant.
Since $\iota$ is of short duration, we set

$$k\iota = kq\,\delta(t),$$

where $q$ is the total charge carried by the current $\iota$. Then, with $I$ the moment of inertia,

$$I\,\frac{d^2\theta}{dt^2} = kq\,\delta(t),$$

and transforming, as before, we find that the effect of the current pulse is a transfer of $kq$
units of angular momentum to the galvanometer. ■


Change of Scale 


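If we replace $t$ by $at$ in the defining formula for the Laplace transform, we readily obtain

$$\mathcal{L}\{F(at)\} = \int_0^\infty e^{-st} F(at)\,dt = \frac{1}{a}\int_0^\infty e^{-(s/a)(at)} F(at)\,d(at) = \frac{1}{a}\, f\!\left(\frac{s}{a}\right). \tag{20.156}$$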


Substitution 


If we replace the parameter $s$ by $s - a$ in the definition of the Laplace transform,
Eq. (20.126), we have

$$f(s - a) = \int_0^\infty e^{-(s-a)t} F(t)\,dt = \int_0^\infty e^{-st} e^{at} F(t)\,dt = \mathcal{L}\{e^{at} F(t)\}. \tag{20.157}$$

Hence the replacement of $s$ with $s - a$ corresponds to multiplying $F(t)$ by $e^{at}$, and
conversely. This result can be used to check some entries in our table of transforms. From
Eq. (20.157) we find immediately that

$$\mathcal{L}\{e^{at}\sin kt\} = \frac{k}{(s - a)^2 + k^2}, \qquad s > a, \tag{20.158}$$

and

$$\mathcal{L}\{e^{at}\cos kt\} = \frac{s - a}{(s - a)^2 + k^2}, \qquad s > a. \tag{20.159}$$

These are entries 10 and 11 of Table 20.1.

Example 20.8.5 DAMPED OSCILLATOR


Equations (20.158) and (20.159) are useful when we consider an oscillating mass with
damping proportional to the velocity. Equation (20.151), with such damping added,
becomes

$$mX''(t) + bX'(t) + kX(t) = 0, \tag{20.160}$$

in which $b$ is a proportionality constant. Let us assume that the particle starts from rest at
$X(0) = X_0$, so $X'(0) = 0$. The transformed equation is

$$m\left[s^2 x(s) - sX_0\right] + b\left[s x(s) - X_0\right] + kx(s) = 0,$$

with solution

$$x(s) = X_0\, \frac{ms + b}{ms^2 + bs + k}.$$

This transform does not appear in our table, but may be handled by completing the square
of the denominator:

$$s^2 + \frac{b}{m}s + \frac{k}{m} = \left(s + \frac{b}{2m}\right)^2 + \left(\frac{k}{m} - \frac{b^2}{4m^2}\right).$$

Considering further only the case that the damping is small enough that $b^2 < 4km$, then
the last term is positive and will be denoted by $\omega_1^2$. We then rearrange $x(s)$ to the form

$$x(s) = X_0\, \frac{s + b/m}{(s + b/2m)^2 + \omega_1^2} = X_0\, \frac{s + b/2m}{(s + b/2m)^2 + \omega_1^2} + X_0\, \frac{(b/2m\omega_1)\,\omega_1}{(s + b/2m)^2 + \omega_1^2}.$$

These are the same transforms we encountered in Eqs. (20.158) and (20.159), so we may
take the inverse transform of our formula for $x(s)$, reaching

$$X(t) = X_0\, e^{-(b/2m)t}\left(\cos\omega_1 t + \frac{b}{2m\omega_1}\sin\omega_1 t\right) = \frac{X_0\,\omega_0}{\omega_1}\, e^{-(b/2m)t}\cos(\omega_1 t - \varphi). \tag{20.161}$$


Here we have made the substitutions

$$\tan\varphi = \frac{b}{2m\omega_1}, \qquad \omega_0^2 = \frac{k}{m}.$$

Of course, as $b \to 0$, this solution goes over to the undamped solution, given in Example
20.8.2. ■

FIGURE 20.16 RLC circuit.


RLC Analog 


It is worth noting the similarity between the damped simple harmonic oscillation of a
mass (Example 20.8.5) and an RLC circuit (resistance, inductance, and capacitance). See
Fig. 20.16. At any instant, the sum of the potential differences around the loop must be
zero (Kirchhoff's law, conservation of energy). This gives

$$L\,\frac{dI}{dt} + RI + \frac{1}{C}\int^t I\,dt = 0. \tag{20.162}$$

Differentiating Eq. (20.162) with respect to time (to eliminate the integral), we have

$$L\,\frac{d^2 I}{dt^2} + R\,\frac{dI}{dt} + \frac{1}{C}\,I = 0. \tag{20.163}$$

If we replace $I(t)$ with $X(t)$, $L$ with $m$, $R$ with $b$, and $C^{-1}$ with $k$, then Eq. (20.163) is
identical with the mechanical problem. It is but one example of the unification of diverse
branches of physics by mathematics. A more complete discussion will be found in a book
by Olson.¹⁰
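
The damped-oscillator solution of Example 20.8.5 (and, through the replacements $m \to L$, $b \to R$, $k \to C^{-1}$, its RLC analog) can be checked by substituting it back into the differential equation. This minimal sketch does so with finite differences and also confirms that the two forms in Eq. (20.161) agree; the parameter values and function names are illustrative assumptions chosen so that $b^2 < 4km$.

```python
import math

m, b, k, X0 = 1.0, 0.4, 4.0, 1.0        # illustrative values with b**2 < 4*k*m
omega0 = math.sqrt(k / m)
omega1 = math.sqrt(k / m - b**2 / (4 * m**2))
phi = math.atan(b / (2 * m * omega1))

def X_sum(t):
    """First form of Eq. (20.161): damped cosine plus sine."""
    decay = math.exp(-b * t / (2 * m))
    return X0 * decay * (math.cos(omega1 * t) + b / (2 * m * omega1) * math.sin(omega1 * t))

def X_phase(t):
    """Second form of Eq. (20.161): single cosine with phase shift phi."""
    return X0 * omega0 / omega1 * math.exp(-b * t / (2 * m)) * math.cos(omega1 * t - phi)

def residual(t, h=1e-4):
    """m X'' + b X' + k X evaluated with central differences; should be close to zero."""
    d1 = (X_sum(t + h) - X_sum(t - h)) / (2 * h)
    d2 = (X_sum(t + h) - 2 * X_sum(t) + X_sum(t - h)) / h**2
    return m * d2 + b * d1 + k * X_sum(t)

times = [0.1 * i for i in range(1, 60)]
max_form_diff = max(abs(X_sum(t) - X_phase(t)) for t in times)
max_residual = max(abs(residual(t)) for t in times)
initial_velocity = (X_sum(1e-6) - X_sum(-1e-6)) / 2e-6
print(max_form_diff, max_residual, initial_velocity)
```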


¹⁰H. F. Olson, Dynamical Analogies, New York: Van Nostrand (1943).


Translation


This time let $f(s)$ be multiplied by $e^{-bs}$, with $b > 0$:

$$e^{-bs} f(s) = e^{-bs} \int_0^\infty e^{-st} F(t)\,dt = \int_0^\infty e^{-s(t+b)} F(t)\,dt. \tag{20.164}$$

Now let $t + b = \tau$. Equation (20.164) becomes

$$e^{-bs} f(s) = \int_b^\infty e^{-s\tau} F(\tau - b)\,d\tau. \tag{20.165}$$

Since $F(t)$ is assumed to be equal to zero for $t < 0$, so that $F(\tau - b) = 0$ for $0 \le \tau < b$,
we can change the lower limit in Eq. (20.165) to zero without changing the value of the
integral. Then renaming $\tau$ as our standard Laplace transform variable $t$, we have

$$e^{-bs} f(s) = \mathcal{L}\{F(t - b)\}. \tag{20.166}$$

If instead of relying on the assumption that $F(t) = 0$ for negative $t$ we insert a Heaviside
unit step function $u(\tau - b)$ to restrict the contributions from $F$ to positive arguments,
Eq. (20.165) takes the form

$$e^{-bs} f(s) = \int_0^\infty e^{-s\tau} F(\tau - b)\,u(\tau - b)\,d\tau.$$


For this reason the translation formula, Eq. (20.166), is often called the Heaviside shifting 
theorem. 
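
A quick numerical illustration of the shifting theorem, Eq. (20.166), follows. It is a sketch under the stated assumption that $F(t)$ vanishes for negative argument, using $F(t) = \sin t$ for $t \ge 0$ as an arbitrary test function; `laplace_numeric` is a hypothetical quadrature helper.

```python
import math

def laplace_numeric(F, s, t_max=60.0, steps=60000):
    """Simpson's-rule approximation to the integral of exp(-s*t)*F(t) over 0 <= t <= t_max."""
    h = t_max / steps
    total = F(0.0) + math.exp(-s * t_max) * F(t_max)
    for i in range(1, steps):
        t = i * h
        total += (4 if i % 2 else 2) * math.exp(-s * t) * F(t)
    return total * h / 3

def F(t):
    """Test function, taken to vanish for negative argument."""
    return math.sin(t) if t >= 0 else 0.0

s, b = 1.3, 1.0
lhs = math.exp(-b * s) * laplace_numeric(F, s)      # exp(-bs) f(s)
rhs = laplace_numeric(lambda t: F(t - b), s)        # L{F(t - b)}
exact = math.exp(-b * s) / (s**2 + 1)               # from Eq. (20.135) with k = 1
print(lhs, rhs, exact)
```

Writing the cutoff into `F` itself plays the role of the factor $u(\tau - b)$ in the last displayed integral.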


Example 20.8.6 ELECTROMAGNETIC WAVES


The electromagnetic wave equation with $E = E_y$ or $E_z$, a transverse wave propagating
along the x-axis, is

$$\frac{\partial^2 E(x, t)}{\partial x^2} - \frac{1}{v^2}\frac{\partial^2 E(x, t)}{\partial t^2} = 0. \tag{20.167}$$

We want to solve this PDE for the situation that a source at $x = 0$ generates a time-dependent
signal $E(0, t)$ starting at time $t = 0$ and propagating only toward positive $x$,
with initial conditions that for $x > 0$,

$$E(x, 0) = 0, \qquad \frac{\partial E(x, t)}{\partial t}\bigg|_{t=0} = 0.$$





Transforming Eq. (20.167) with respect to $t$, we get

$$\frac{\partial^2}{\partial x^2}\mathcal{L}\{E(x, t)\} - \frac{s^2}{v^2}\mathcal{L}\{E(x, t)\} + \frac{s}{v^2}E(x, 0) + \frac{1}{v^2}\frac{\partial E(x, t)}{\partial t}\bigg|_{t=0} = 0,$$

which due to the initial conditions simplifies to

$$\frac{\partial^2}{\partial x^2}\mathcal{L}\{E(x, t)\} = \frac{s^2}{v^2}\mathcal{L}\{E(x, t)\}. \tag{20.168}$$

The general solution of Eq. (20.168) (which is an ODE in $x$) is

$$\mathcal{L}\{E(x, t)\} = f_1(s)\,e^{-(s/v)x} + f_2(s)\,e^{+(s/v)x}. \tag{20.169}$$

To understand this result more fully, consider first the case $f_2(s) = 0$. Then Eq. (20.169)
becomes

$$\mathcal{L}\{E(x, t)\} = e^{-(x/v)s} f_1(s), \tag{20.170}$$


which we recognize as of the same form as Eq. (20.166), meaning that 
x 
E(x,t)=F (t- ~), 
v 


where F is the function whose Laplace transform is f}, namely E(0,t).'! Since F is 
assumed to vanish when its argument is negative, this formula can be written in the more 


explicit form 
F(t-=)=£(0,1--), fo. 
v v v 
x 


0, t<-. 
v 


E(x,t) = (20.171) 


This solution represents a wave (or pulse) moving in the positive $x$-direction with velocity
$v$. Note that for $x > vt$ the region remains undisturbed; the pulse has not had time to get
there. If we had decided to take the solution of Eq. (20.169) with $f_1(s) = 0$, we would have
obtained

$$E(x,t) = \begin{cases} F\left(t + \dfrac{x}{v}\right) = E\left(0,\, t + \dfrac{x}{v}\right), & t \ge -\dfrac{x}{v}, \\[2ex] 0, & t < -\dfrac{x}{v}, \end{cases} \tag{20.172}$$

which we must reject because (for propagation toward positive $x$) it violates causality.
Our solution to this problem, Eq. (20.171), can be verified by differentiation and
substitution into the original PDE, Eq. (20.167). ■
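The verification by substitution mentioned at the end of the example can also be sketched numerically. The block below evaluates $E(x,t) = F(t - x/v)$ for an assumed smooth source signal $F(t) = t^3 e^{-t}$ (a hypothetical choice made only for illustration) and checks with central differences that the residual of Eq. (20.167) is small behind the wavefront, and that the field vanishes ahead of it.

```python
import math

v = 2.0  # assumed propagation speed (arbitrary units)

def F(t):
    # Hypothetical source signal E(0, t); zero for t <= 0 and smooth at t = 0.
    return t**3 * math.exp(-t) if t > 0 else 0.0

def E(x, t):
    # Eq. (20.171): the signal at x is the source signal delayed by x/v.
    return F(t - x / v)

def residual(x, t, h=1e-3):
    # Central second differences for E_xx - E_tt / v^2, cf. Eq. (20.167).
    e_xx = (E(x + h, t) - 2 * E(x, t) + E(x - h, t)) / h**2
    e_tt = (E(x, t + h) - 2 * E(x, t) + E(x, t - h)) / h**2
    return e_xx - e_tt / v**2

res_inside = residual(1.0, 2.0)  # behind the wavefront (x < v t)
ahead = E(5.0, 2.0)              # ahead of the wavefront (x > v t)
print(res_inside, ahead)
```

The residual is limited only by the finite-difference step; the value ahead of the wavefront is exactly zero because the argument of $F$ is negative there.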


$^{11}$ Consider Eq. (20.170) with $x$ set to zero.


Derivative of a Transform


When $F(t)$, which is at least piecewise continuous, and $s$ are chosen so that $e^{-st} F(t)$
converges exponentially for large $s$, the integral

$$\int_0^\infty e^{-st} F(t)\, dt$$

is uniformly convergent and may be differentiated (under the integral sign) with respect
to $s$. Then

$$f'(s) = \int_0^\infty (-t)\, e^{-st} F(t)\, dt = \mathcal{L}\{-tF(t)\}. \tag{20.173}$$

Continuing this process, we obtain

$$f^{(n)}(s) = \mathcal{L}\{(-t)^n F(t)\}. \tag{20.174}$$


All the integrals so obtained will be uniformly convergent because of the decreasing
exponential behavior of $e^{-st} F(t)$.
This technique may be applied to generate more transforms. For example, 





$$\mathcal{L}\{e^{kt}\} = \int_0^\infty e^{-st} e^{kt}\, dt = \frac{1}{s-k}, \qquad s > k. \tag{20.175}$$


Differentiating with respect to s (or with respect to k), we obtain 


$$\mathcal{L}\{t e^{kt}\} = \frac{1}{(s-k)^2}, \qquad s > k. \tag{20.176}$$


If we replace k by ik and separate Eq. (20.176) into its real and imaginary parts, we get 


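$$\mathcal{L}\{t \cos kt\} = \frac{s^2 - k^2}{(s^2 + k^2)^2}, \tag{20.177}$$

$$\mathcal{L}\{t \sin kt\} = \frac{2ks}{(s^2 + k^2)^2}. \tag{20.178}$$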


These expressions are valid for s > 0. 
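The two transforms just obtained can be checked by direct quadrature. The following sketch, in which the values of $s$ and $k$ and the helper names `simpson` and `laplace` are assumptions made for illustration, compares the truncated Laplace integrals of $t\cos kt$ and $t\sin kt$ with the closed forms above.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def laplace(F, s, T=60.0):
    """Approximate the Laplace transform by truncating the integral at T."""
    return simpson(lambda t: math.exp(-s * t) * F(t), 0.0, T)

s, k = 1.2, 2.0  # assumed sample values with s > 0

num_cos = laplace(lambda t: t * math.cos(k * t), s)
num_sin = laplace(lambda t: t * math.sin(k * t), s)

exact_cos = (s**2 - k**2) / (s**2 + k**2) ** 2  # transform of t cos kt
exact_sin = 2 * k * s / (s**2 + k**2) ** 2      # transform of t sin kt
print(num_cos, exact_cos)
print(num_sin, exact_sin)
```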


Example 20.8.7 BESSEL'S EQUATION


An interesting application of a differentiated Laplace transform appears in the solution of 
Bessel’s equation with n = 0. From Chapter 14 we have 


$$x^2 y''(x) + x y'(x) + x^2 y(x) = 0.$$


This ODE cannot be solved by the method illustrated in Example 20.8.2 because the
derivatives are multiplied by functions of the independent variable $x$. However, an alternate
approach depending on Eq. (20.174) is available. Dividing by $x$ and substituting $t = x$ and
$F(t) = y(x)$ to agree with the present notation, we see that the Bessel equation becomes

$$t F''(t) + F'(t) + t F(t) = 0. \tag{20.179}$$
We need a regular solution, and it appears possible for $F(0)$ to be nonzero, so we scale the
solution by setting $F(0) = 1$. Then, setting $t = 0$ in Eq. (20.179), we find that $F'(+0) = 0$.
In addition, we assume that our unknown $F(t)$ has a transform. Transforming Eq. (20.179),
using Eqs. (20.147) and (20.148) for the derivatives and Eq. (20.173) to append factors of
$t$, we have

$$-\frac{d}{ds}\left[s^2 f(s) - s\right] + s f(s) - 1 - \frac{d}{ds} f(s) = 0. \tag{20.180}$$


Rearranging and simplifying, we obtain

$$(s^2 + 1) f'(s) + s f(s) = 0,$$

or

$$\frac{df}{f} = -\frac{s\, ds}{s^2 + 1},$$

a first-order ODE. By integration,

$$\ln f(s) = -\frac{1}{2} \ln(s^2 + 1) + \ln C,$$

which may be rewritten as

$$f(s) = \frac{C}{\sqrt{s^2 + 1}}. \tag{20.181}$$
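To confirm that our transform yields the power-series expansion of $J_0$, we expand $f(s)$
as given in Eq. (20.181) in a series of negative powers of $s$, convergent for $s > 1$:

$$f(s) = \frac{C}{s}\left(1 + \frac{1}{s^2}\right)^{-1/2} = \frac{C}{s}\left[1 - \frac{1}{2 s^2} + \frac{1 \cdot 3}{2^2\, 2!\, s^4} - \cdots + \frac{(-1)^n (2n)!}{(2^n n!)^2 s^{2n}} + \cdots\right].$$

Inverting, term by term, we obtain

$$F(t) = C \sum_{n=0}^{\infty} \frac{(-1)^n t^{2n}}{(2^n n!)^2}.$$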


When $C$ is set equal to 1, as required by the initial condition $F(0) = 1$, we recover $J_0(t)$,
our familiar Bessel function of order zero. Hence,

$$\mathcal{L}\{J_0(t)\} = \frac{1}{\sqrt{s^2 + 1}}. \tag{20.182}$$

This simple, closed form is the Laplace transform of $J_0(t)$. After making a scale change to
form $J_0(at)$ using Eq. (20.156), we confirm entry 14 of Table 20.1.

Note that in our derivation of Eq. (20.182) we assumed $s > 1$. The proof for $s > 0$ is the
topic of Exercise 20.8.10. ■
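Equation (20.182) lends itself to a numerical spot check. The sketch below sums the power series for $J_0(t)$ obtained in the example and integrates it against $e^{-st}$ on a truncated interval; the sample values of $s$ (one above and one below 1), the truncation point, and the helper names are assumptions for illustration. The series is summed in floating point, so it is only trusted here for the moderate values of $t$ that matter under the exponential weight.

```python
import math

def j0_series(t, terms=80):
    """Partial sum of J_0(t) = sum_n (-1)^n t^(2n) / (2^n n!)^2."""
    term, total = 1.0, 1.0
    q = -(t / 2.0) ** 2
    for n in range(1, terms):
        term *= q / (n * n)  # ratio of consecutive series terms
        total += term
    return total

def simpson(g, a, b, n=8000):
    """Composite Simpson rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def laplace_j0(s, T=40.0):
    # Truncated Laplace integral of J_0; the neglected tail is of order e^{-sT}.
    return simpson(lambda t: math.exp(-s * t) * j0_series(t), 0.0, T)

results = {s: (laplace_j0(s), 1.0 / math.sqrt(s * s + 1.0)) for s in (2.0, 0.5)}
for s, (numeric, closed) in results.items():
    print(s, numeric, closed)
```

That the agreement persists for $s = 0.5$ is consistent with the remark that the result holds for $s > 0$, although a numerical check is of course not a proof.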
It is worth noting that this application was successful and relatively easy because we took
$n = 0$ in Bessel's equation. This made it possible to divide out a factor of $x$ (or $t$). If this
had not been done, the terms of the form $t^2 F(t)$ would have introduced a second derivative
of $f(s)$. The resulting equation would have been no easier to solve than the original one.
This observation illustrates the point that when we go beyond linear ODEs with constant 
coefficients, the Laplace transform may still be applied, but there is no guarantee that it 
will be helpful. 

The application to Bessel's equation, $n \ne 0$, will be found in the Additional Readings.
Alternatively, given the result

$$\mathcal{L}\{J_n(at)\} = \frac{a^{-n}\left(\sqrt{s^2 + a^2} - s\right)^n}{\sqrt{s^2 + a^2}}, \tag{20.183}$$


we can confirm its validity by expressing J,,(t) as an infinite series and transforming term 
by term. 


Integration of Transforms 


Again, with $F(t)$ at least piecewise continuous and $x$ large enough so that $e^{-xt} F(t)$
decreases exponentially (as $x \to \infty$), the integral

$$f(x) = \int_0^\infty e^{-xt} F(t)\, dt$$

is uniformly convergent with respect to $x$. This justifies reversing the order of integration
in the following equation:

$$\begin{aligned}
\int_s^\infty f(x)\, dx &= \int_s^\infty dx \int_0^\infty dt\, e^{-xt} F(t) = \int_0^\infty \frac{F(t)}{t}\, e^{-st}\, dt \\
&= \mathcal{L}\left\{\frac{F(t)}{t}\right\},
\end{aligned} \tag{20.184}$$

where the last member of the first line is obtained by integrating with respect to $x$. The
lower limit $s$ must be chosen large enough so that $f(s)$ is within the region of uniform
convergence. Equation (20.184) is valid when $F(t)/t$ is finite at $t = 0$ or diverges less
strongly than $t^{-1}$ (so that $\mathcal{L}\{F(t)/t\}$ will exist).
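A short numerical illustration of Eq. (20.184), under the assumed choice $F(t) = \sin t$: then $f(x) = 1/(x^2+1)$, the left-hand side is $\pi/2 - \tan^{-1} s$ in closed form, and the right-hand side is the transform of $\sin t / t$. The helper names and the sample value of $s$ are hypothetical.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def sin_over_t(t):
    # F(t)/t for F(t) = sin t, with the finite limit 1 supplied at t = 0.
    return math.sin(t) / t if t != 0.0 else 1.0

s = 1.5  # assumed sample value of the transform variable

# Left side of Eq. (20.184): integral of 1/(x^2 + 1) from s to infinity.
lhs = math.pi / 2 - math.atan(s)
# Right side: truncated Laplace integral of sin(t)/t.
rhs = simpson(lambda t: math.exp(-s * t) * sin_over_t(t), 0.0, 40.0)
print(lhs, rhs)
```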

For convenience we summarize the definition and properties of the Laplace transform 
in Table 20.2. Included in the table are formulas for convolution and inversion that will be 
discussed in Sections 20.9 and 20.10.
Table 20.2 Laplace Transform Operations

Operation                        Equation                                                                           Eq. no.

1. Laplace transform             $f(s) = \mathcal{L}\{F(t)\} = \int_0^\infty e^{-st} F(t)\, dt$                     (15.99)
2. Transform of derivative       $s f(s) - F(+0) = \mathcal{L}\{F'(t)\}$                                            (15.123)
                                 $s^2 f(s) - s F(+0) - F'(+0) = \mathcal{L}\{F''(t)\}$                              (15.124)
3. Transform of integral         $\frac{1}{s} f(s) = \mathcal{L}\left\{\int_0^t F(x)\, dx\right\}$                  Exercise 20.9.1
4. Change of scale               $\frac{1}{a} f\left(\frac{s}{a}\right) = \mathcal{L}\{F(at)\}$                     (20.156)
5. Substitution                  $f(s-a) = \mathcal{L}\{e^{at} F(t)\}$                                              (15.152)
6. Translation                   $e^{-bs} f(s) = \mathcal{L}\{F(t-b)\}$                                             (15.164)
7. Derivative of transform       $f^{(n)}(s) = \mathcal{L}\{(-t)^n F(t)\}$                                          (15.173)
8. Integral of transform         $\int_s^\infty f(x)\, dx = \mathcal{L}\left\{\frac{F(t)}{t}\right\}$               (15.189)
9. Convolution                   $f_1(s) f_2(s) = \mathcal{L}\left\{\int_0^t F_1(t-z) F_2(z)\, dz\right\}$          (15.193)
10. Inverse transform,           $\frac{1}{2\pi i} \int_{\beta - i\infty}^{\beta + i\infty} e^{st} f(s)\, ds = F(t)$   (15.212)
    Bromwich integral$^a$

$^a$ $\beta$ must be large enough that $e^{-\beta t} F(t)$ vanishes as $t \to +\infty$.


Exercises


20.8.1 Use the expression for the transform of a second derivative to obtain the transform of
$\cos kt$.


20.8.2 A mass $m$ is attached to one end of an unstretched spring, spring constant $k$ (Fig. 20.17).
Starting at time $t = 0$, the free end of the spring experiences a constant acceleration $a$,
away from the mass. Using Laplace transforms,

(a) find the position $x$ of $m$ as a function of time.

(b) determine the limiting form of $x(t)$ for small $t$.

ANS. (a) $x = \frac{1}{2} a t^2 - \frac{a}{\omega^2}(1 - \cos \omega t)$, $\quad \omega^2 = \frac{k}{m}$,

(b) $x = \frac{a \omega^2}{4!}\, t^4$, $\quad \omega t \ll 1$.


FIGURE 20.17 Spring, Exercise 20.8.2.


20.8.3 Radioactive nuclei decay according to the law

$$\frac{dN}{dt} = -\lambda N,$$

with $N$ the concentration of a given nuclide and $\lambda$ its particular decay constant. This
equation may be interpreted as stating that the rate of decay is proportional to the
number of these radioactive nuclei present. They all decay independently.


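Consider now a radioactive series of $n$ different nuclides, with Nuclide 1 decaying into
Nuclide 2, Nuclide 2 into Nuclide 3, etc., until reaching Nuclide $n$, which is stable. The
concentrations of the various nuclides satisfy the system of ODEs

$$\frac{dN_1}{dt} = -\lambda_1 N_1, \qquad \frac{dN_2}{dt} = \lambda_1 N_1 - \lambda_2 N_2, \qquad \ldots, \qquad \frac{dN_n}{dt} = \lambda_{n-1} N_{n-1}.$$

(a) For the case $n = 3$ find $N_1(t)$, $N_2(t)$, and $N_3(t)$, with $N_1(0) = N_0$ and $N_2(0) =
N_3(0) = 0$.

(b) Find an approximate expression for $N_2$ and $N_3$, valid for small $t$ when $\lambda_1 \approx \lambda_2$.

(c) Find approximate expressions for $N_2$ and $N_3$, valid for large $t$, when (1) $\lambda_1 \gg \lambda_2$,
(2) $\lambda_1 \ll \lambda_2$.

ANS. (a) $N_1(t) = N_0 e^{-\lambda_1 t}$, $\quad N_2(t) = N_0 \dfrac{\lambda_1}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)$,

$N_3(t) = N_0 \left(1 - \dfrac{\lambda_2}{\lambda_2 - \lambda_1} e^{-\lambda_1 t} + \dfrac{\lambda_1}{\lambda_2 - \lambda_1} e^{-\lambda_2 t}\right)$.

(b) $N_2 \approx N_0 \lambda_1 t$, $\quad N_3 \approx \dfrac{N_0}{2} \lambda_1 \lambda_2 t^2$.

(c) (1) $N_2 \approx N_0 e^{-\lambda_2 t}$,
$N_3 \approx N_0 (1 - e^{-\lambda_2 t})$, $\quad \lambda_1 t \gg 1$.
(2) $N_2 \approx N_0 (\lambda_1/\lambda_2) e^{-\lambda_1 t}$,
$N_3 \approx N_0 (1 - e^{-\lambda_1 t})$, $\quad \lambda_2 t \gg 1$.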
20.8.4 The rate of formation of an isotope in a nuclear reactor is given by

$$\frac{dN_2}{dt} = \varphi\left[\sigma_1 N_1(0) - \sigma_2 N_2(t)\right] - \lambda_2 N_2(t).$$

Here $N_1(0)$ is the concentration of the original isotope (assumed constant), and $N_2$ is
that of the newly formed isotope. The first two terms on the right-hand side describe the
production and destruction of the new isotope via neutron absorption; $\varphi$ is the neutron
flux (units cm$^{-2}$ s$^{-1}$); $\sigma_1$ and $\sigma_2$ (units cm$^2$) are neutron absorption cross sections. The
final term describes the radioactive decay of the new isotope, with decay constant $\lambda_2$.

(a) Find the concentration $N_2$ of the new isotope as a function of time.

(b) For original isotope $^{153}$Eu, $\sigma_1 = 400$ barns $= 400 \times 10^{-24}$ cm$^2$, $\sigma_2 = 1000$ barns
$= 1000 \times 10^{-24}$ cm$^2$, and $\lambda_2 = 1.4 \times 10^{-9}$ s$^{-1}$. If $N_1(0) = 10^{20}$ and $\varphi =
10^{9}$ cm$^{-2}$ s$^{-1}$, find $N_2$, the concentration of $^{154}$Eu, after 1 year of continuous
irradiation. Is the assumption that $N_1$ is constant justified?


20.8.5 In a nuclear reactor $^{135}$Xe is formed as both a direct fission product of $^{235}$U and by
decay of $^{135}$I (another fission product), half-life 6.7 hours. The half-life of $^{135}$Xe is
9.2 hours. Because $^{135}$Xe strongly absorbs thermal neutrons, thereby "poisoning" the
nuclear reactor, its concentration is a matter of great interest. The relevant equations are

$$\frac{dN_{\rm I}}{dt} = \varphi\, \gamma_{\rm I} (\sigma_f N_{\rm U}) - \lambda_{\rm I} N_{\rm I},$$

$$\frac{dN_{\rm Xe}}{dt} = \varphi\left[\gamma_{\rm Xe} (\sigma_f N_{\rm U}) - \sigma_{\rm Xe} N_{\rm Xe}\right] + \lambda_{\rm I} N_{\rm I} - \lambda_{\rm Xe} N_{\rm Xe}.$$

Here $N_{\rm I}$, $N_{\rm Xe}$, $N_{\rm U}$ are the concentrations of $^{135}$I, $^{135}$Xe, $^{235}$U, with $N_{\rm U}$ assumed to be
constant. The neutron flux $\varphi$ in the reactor causes fission of $^{235}$U with cross section $\sigma_f$
and removes $^{135}$Xe by neutron absorption with cross section $\sigma_{\rm Xe} = 3.5 \times 10^{6}$ barns $=
3.5 \times 10^{-18}$ cm$^2$. Neutron absorption by $^{135}$I is negligible. The yields of $^{135}$I and $^{135}$Xe
per fission are, respectively, $\gamma_{\rm I} = 0.060$ and $\gamma_{\rm Xe} = 0.003$.

(a) Find $N_{\rm Xe}(t)$ in terms of neutron flux $\varphi$ and the product $\sigma_f N_{\rm U}$.

(b) Find $N_{\rm Xe}(t \to \infty)$.

(c) After $N_{\rm Xe}$ has reached equilibrium, the reactor is shut down: $\varphi = 0$. Find $N_{\rm Xe}(t)$
following shutdown. Note the short-term increase in $N_{\rm Xe}$, which may for a few
hours interfere with starting the reactor up again.

Hint. The half-life $t_{1/2}$ of a radioactive isotope is the time required for decay of half of
the nuclides in a sample. For a decay rate $dN/dt = -\lambda N$, the half-life has the value
$t_{1/2} = \ln 2/\lambda$, so $\lambda$ can be computed as $\lambda = \ln 2/t_{1/2} = 0.693/t_{1/2}$.


20.8.6 Solve Eq. (20.160), which describes a damped simple harmonic oscillator, for $X(0) =
X_0$, $X'(0) = 0$, and

(a) $b^2 = 4mk$ (critically damped),
(b) $b^2 > 4mk$ (overdamped).

ANS. (a) $X(t) = X_0\, e^{-(b/2m)t} \left(1 + \dfrac{b}{2m}\, t\right)$.


20.8.7 Again solve Eq. (20.160), which describes a damped simple harmonic oscillator, but
this time for $X(0) = 0$, $X'(0) = v_0$, and

(a) $b^2 < 4mk$ (underdamped),

(b) $b^2 = 4mk$ (critically damped),

(c) $b^2 > 4mk$ (overdamped).

ANS. (a) $X(t) = \dfrac{v_0}{\omega_1}\, e^{-(b/2m)t} \sin \omega_1 t$,

(b) $X(t) = v_0\, t\, e^{-(b/2m)t}$.


20.8.8 The motion of a body falling in a resisting medium may be described by

$$m \frac{d^2 X(t)}{dt^2} = mg - b \frac{dX(t)}{dt}$$

when the retarding force is proportional to the velocity. Find $X(t)$ and $dX(t)/dt$ for the
initial conditions

$$X(0) = \left.\frac{dX}{dt}\right|_{t=0} = 0.$$


20.8.9 Ringing circuit. In certain electronic devices, resistance, inductance, and capacitance
are placed in a circuit as shown in Fig. 20.18. A constant voltage is maintained across
the capacitance, keeping it charged. At time $t = 0$ the circuit is disconnected from the
voltage source. Find the voltages across each of the elements $R$, $L$, and $C$ as a function
of time. Assume $R$ to be small.

Hint. By Kirchhoff's laws

$$I_{RL} + I_C = 0 \quad \text{and} \quad E_R + E_L = E_C,$$

where

$$E_R = IR, \qquad E_L = L \frac{dI}{dt}, \qquad E_C = \frac{q_0}{C} + \frac{1}{C} \int_0^t I\, dt,$$

$q_0$ = initial charge of capacitor.


FIGURE 20.18 Ringing circuit.


20.8.10 With $J_0(t)$ expressed as a contour integral, apply the Laplace transform operation,
reverse the order of integration, and thus show that

$$\mathcal{L}\{J_0(t)\} = (s^2 + 1)^{-1/2}, \qquad \text{for } s > 0.$$
20.8.11 Develop the Laplace transform of $J_n(t)$ from $\mathcal{L}\{J_0(t)\}$ by using the Bessel function
recurrence relations.
Hint. Here is a chance to use mathematical induction (Section 1.4).


20.8.12 A calculation of the magnetic field of a circular current loop in circular cylindrical
coordinates leads to the integral

$$\int_0^\infty e^{-kz}\, k\, J_1(ka)\, dk, \qquad \Re e\, z > 0.$$

Show that this integral is equal to $a/(z^2 + a^2)^{3/2}$.


20.8.13 Show that

$$\mathcal{L}\{I_0(at)\} = (s^2 - a^2)^{-1/2}, \qquad s > a.$$


20.8.14 Verify the following Laplace transforms:

(a) $\mathcal{L}\{j_0(at)\} = \mathcal{L}\left\{\dfrac{\sin at}{at}\right\} = \dfrac{1}{a} \cot^{-1}\left(\dfrac{s}{a}\right)$,

(b) $\mathcal{L}\{y_0(at)\}$ does not exist.

(c) $\mathcal{L}\{i_0(at)\} = \mathcal{L}\left\{\dfrac{\sinh at}{at}\right\} = \dfrac{1}{2a} \ln \dfrac{s+a}{s-a} = \dfrac{1}{a} \coth^{-1}\left(\dfrac{s}{a}\right)$,

(d) $\mathcal{L}\{k_0(at)\}$ does not exist.


20.8.15 Develop a Laplace transform solution of Laguerre's equation,

$$t F''(t) + (1 - t) F'(t) + n F(t) = 0.$$

Note that you need a derivative of a transform and a transform of derivatives. Go as far
as you can with a general value of $n$; then (and only then) set $n = 0$.


20.8.16 Show that the Laplace transform of the Laguerre polynomial $L_n(at)$ is given by

$$\mathcal{L}\{L_n(at)\} = \frac{(s-a)^n}{s^{n+1}}, \qquad s > 0.$$


20.8.17 Show that

$$\mathcal{L}\{E_1(t)\} = \frac{1}{s} \ln(s+1), \qquad s > 0,$$

where

$$E_1(t) = \int_t^\infty \frac{e^{-\tau}}{\tau}\, d\tau = \int_1^\infty \frac{e^{-xt}}{x}\, dx.$$

$E_1(t)$ is the exponential integral function, first encountered in this book in Table 1.2.


20.8.18 (a) From Eq. (20.184) show that

$$\int_0^\infty f(x)\, dx = \int_0^\infty \frac{F(t)}{t}\, dt,$$

provided the integrals exist.

(b) From the preceding result show that

$$\int_0^\infty \frac{\sin t}{t}\, dt = \frac{\pi}{2},$$

in agreement with Eqs. (20.146) and (11.107).


20.8.19 (a) Show that

$$\mathcal{L}\left\{\frac{\sin kt}{t}\right\} = \cot^{-1}\left(\frac{s}{k}\right).$$

(b) Using this result (with $k = 1$), prove that

$$\mathcal{L}\{\mathrm{si}(t)\} = -\frac{1}{s} \tan^{-1} s,$$

where

$$\mathrm{si}(t) = -\int_t^\infty \frac{\sin x}{x}\, dx, \quad \text{the sine integral.}$$


20.8.20 If $F(t)$ is periodic (Fig. 20.19) with a period $a$ so that $F(t + a) = F(t)$ for all $t \ge 0$,
show that

$$\mathcal{L}\{F(t)\} = \frac{\int_0^a e^{-st} F(t)\, dt}{1 - e^{-as}}.$$

Note that the integration is now over only the first period of $F(t)$.


FIGURE 20.19 Periodic function.
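20.8.21 Find the Laplace transform of the square wave (period $a$) defined by

$$F(t) = \begin{cases} 1, & 0 < t < a/2, \\ 0, & a/2 < t < a. \end{cases}$$

ANS. $f(s) = \dfrac{1}{s} \cdot \dfrac{1 - e^{-as/2}}{1 - e^{-as}}$.


20.8.22 Show that

(a) $\mathcal{L}\{\cosh at \cos at\} = \dfrac{s^3}{s^4 + 4a^4}$,

(b) $\mathcal{L}\{\cosh at \sin at\} = \dfrac{a s^2 + 2 a^3}{s^4 + 4a^4}$,

(c) $\mathcal{L}\{\sinh at \cos at\} = \dfrac{a s^2 - 2 a^3}{s^4 + 4a^4}$,

(d) $\mathcal{L}\{\sinh at \sin at\} = \dfrac{2 a^2 s}{s^4 + 4a^4}$.


20.8.23 Show that

(a) $\mathcal{L}^{-1}\left\{(s^2 + a^2)^{-2}\right\} = \dfrac{1}{2a^3} \sin at - \dfrac{t}{2a^2} \cos at$,

(b) $\mathcal{L}^{-1}\left\{s (s^2 + a^2)^{-2}\right\} = \dfrac{t}{2a} \sin at$,

(c) $\mathcal{L}^{-1}\left\{s^2 (s^2 + a^2)^{-2}\right\} = \dfrac{1}{2a} \sin at + \dfrac{t}{2} \cos at$,

(d) $\mathcal{L}^{-1}\left\{s^3 (s^2 + a^2)^{-2}\right\} = \cos at - \dfrac{a}{2}\, t \sin at$.


20.8.24 Show that

$$\mathcal{L}\left\{(t^2 - k^2)^{-1/2}\, u(t - k)\right\} = K_0(ks).$$

Hint. Try transforming an integral representation of $K_0(ks)$ into the Laplace transform
integral.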


20.9 LAPLACE CONVOLUTION THEOREM 


One of the most important properties of the Laplace transform is that given by the
convolution, or Faltung, theorem. We take two transforms,

$$f_1(s) = \mathcal{L}\{F_1(t)\} \quad \text{and} \quad f_2(s) = \mathcal{L}\{F_2(t)\},$$

and multiply them together:

$$f_1(s) f_2(s) = \int_0^\infty e^{-sx} F_1(x)\, dx \int_0^\infty e^{-sy} F_2(y)\, dy. \tag{20.185}$$
If we introduce the new variable $t = x + y$ and integrate over $t$ and $y$ instead of $x$ and
$y$, the limits of integration become $(0 \le t < \infty)$, $(0 \le y \le t)$. Noting that the Jacobian of
the transformation from $(x, y)$ to $(t, y)$ is unity, we have

$$\begin{aligned}
f_1(s) f_2(s) &= \int_0^\infty e^{-st}\, dt \int_0^t F_1(t-y) F_2(y)\, dy \\
&= \mathcal{L}\left\{\int_0^t F_1(t-y) F_2(y)\, dy\right\} \\
&= \mathcal{L}\{F_1 * F_2\},
\end{aligned} \tag{20.186}$$

where, similarly to the Fourier transform, we use the notation

$$\int_0^t F_1(t-z) F_2(z)\, dz = F_1 * F_2, \tag{20.187}$$

and call this operation the convolution of $F_1$ and $F_2$. It can be shown that convolution is
symmetric:

$$F_1 * F_2 = F_2 * F_1. \tag{20.188}$$

Carrying out the inverse transform, we also find

$$\mathcal{L}^{-1}\{f_1(s) f_2(s)\} = \int_0^t F_1(t-z) F_2(z)\, dz = F_1 * F_2. \tag{20.189}$$


Convolution formulas are useful for finding new transforms or, in some cases, as an alterna- 
tive to a partial fraction expansion. They also find use in the solution of integral equations, 
as is illustrated in Chapter 21. 
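To make Eqs. (20.186) to (20.188) concrete, the sketch below uses the assumed pair $F_1(t) = e^{-t}$ and $F_2(t) = \sin t$, for which the convolution works out to $\tfrac{1}{2}(e^{-t} - \cos t + \sin t)$ (obtainable from a partial fraction expansion of $f_1 f_2$). It checks the symmetry of the convolution and that the transform of the convolution equals the product $f_1(s) f_2(s)$. The function names are hypothetical.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def convolve(F1, F2, t):
    # Laplace-type convolution, Eq. (20.187): integral of F1(t - z) F2(z) over 0..t.
    return simpson(lambda z: F1(t - z) * F2(z), 0.0, t)

F1 = lambda t: math.exp(-t)  # f1(s) = 1/(s + 1)
F2 = math.sin                # f2(s) = 1/(s^2 + 1)

def closed_form(t):
    # Convolution of the assumed pair, from partial fractions of f1 * f2.
    return 0.5 * (math.exp(-t) - math.cos(t) + math.sin(t))

ts = [0.5, 2.0, 7.0]
c12 = [convolve(F1, F2, t) for t in ts]
c21 = [convolve(F2, F1, t) for t in ts]

s = 1.3
product = (1.0 / (s + 1.0)) * (1.0 / (s**2 + 1.0))
transform_of_conv = simpson(lambda t: math.exp(-s * t) * closed_form(t), 0.0, 60.0, 20000)
print(c12, c21)
print(product, transform_of_conv)
```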


Example 20.9.1 DRIVEN OSCILLATOR WITH DAMPING 


As one illustration of the use of the convolution theorem, let us return to the mass $m$ on a
spring, with damping and a driving force F(t). The equation of motion, Eq. (20.160), now 
becomes 


$$m X''(t) + b X'(t) + k X(t) = F(t). \tag{20.190}$$


Initial conditions $X(0) = 0$, $X'(0) = 0$ are used to simplify this illustration, and the
transformed equation is

$$m s^2 x(s) + b s\, x(s) + k\, x(s) = f(s),$$

with solution

$$x(s) = \frac{f(s)}{m}\, \frac{1}{(s + b/2m)^2 + \omega_1^2}, \tag{20.191}$$
where, as in Example 20.8.5,

$$\omega_0^2 = \frac{k}{m}, \qquad \omega_1^2 = \omega_0^2 - \frac{b^2}{4m^2}. \tag{20.192}$$

We identify the right-hand side of Eq. (20.191) as the product of two known transforms:

$$\frac{f(s)}{m} = \frac{1}{m} \mathcal{L}\{F(t)\}, \qquad \frac{1}{(s + b/2m)^2 + \omega_1^2} = \frac{1}{\omega_1} \mathcal{L}\left\{e^{-(b/2m)t} \sin \omega_1 t\right\},$$

where the second of these is a case of Eq. (20.158).
Now applying the convolution theorem, Eq. (20.189), we obtain the solution to our
original problem as an integral:

$$X(t) = \mathcal{L}^{-1}\{x(s)\} = \frac{1}{m \omega_1} \int_0^t F(t - z)\, e^{-(b/2m)z} \sin \omega_1 z\, dz. \tag{20.193}$$


We go on to consider two specific choices for the driving force $F(t)$. We first take the
impulsive force $F(t) = P\delta(t)$. Then

$$X(t) = \frac{P}{m \omega_1}\, e^{-(b/2m)t} \sin \omega_1 t. \tag{20.194}$$
Here P represents the momentum transferred by the impulse, and the constant P/m takes 
the place of an initial velocity X’(0). 

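As a second case, let $F(t) = F_0 \sin \omega t$. We could again use Eq. (20.193), but a partial
fraction expansion is perhaps more convenient. With

$$f(s) = \frac{F_0\, \omega}{s^2 + \omega^2},$$

Eq. (20.191) can be written in the partial fraction form,

$$\begin{aligned}
x(s) &= \frac{F_0\, \omega}{m}\, \frac{1}{s^2 + \omega^2}\, \frac{1}{(s + b/2m)^2 + \omega_1^2} \\
&= \frac{F_0\, \omega}{m} \left[\frac{a's + b'}{s^2 + \omega^2} + \frac{c's + d'}{(s + b/2m)^2 + \omega_1^2}\right],
\end{aligned} \tag{20.195}$$

with coefficients $a'$, $b'$, $c'$, and $d'$ (independent of $s$) to be determined. Direct calculation
shows for $a'$ and $b'$

$$-\frac{1}{a'} = \frac{b}{m}\, \omega^2 + \frac{m}{b} \left(\omega_0^2 - \omega^2\right)^2, \qquad b' = -\frac{m}{b} \left(\omega_0^2 - \omega^2\right) a'.$$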

The terms of $x(s)$ containing $a'$ and $b'$ lead upon inversion of the Laplace transform to
the steady-state component of the solution:

$$X(t) = \frac{F_0}{\left[b^2 \omega^2 + m^2 (\omega_0^2 - \omega^2)^2\right]^{1/2}} \sin(\omega t - \varphi), \tag{20.196}$$

where

$$\tan \varphi = \frac{b\, \omega}{m (\omega_0^2 - \omega^2)}.$$

Differentiating the denominator, we find that the amplitude has a maximum when $\omega = \omega_2$,
with

$$\omega_2^2 = \omega_0^2 - \frac{b^2}{2m^2} = \omega_1^2 - \frac{b^2}{4m^2}. \tag{20.197}$$

This is the resonance condition.$^{12}$ At resonance the amplitude becomes $F_0/b\omega_1$, showing
that the mass $m$ goes into infinite oscillation at resonance if damping is neglected ($b = 0$).
This calculation differs from those used for the determination of transfer functions
(compare Example 20.5.1) in that a steady-state solution at a fixed frequency is not assumed.
Use of the Laplace transform (rather than the Fourier transform) permits solution for
transient as well as steady-state components of the solution. The transients, which we will not
work out in detail, arise from the terms of Eq. (20.195) involving $c'$ and $d'$. These terms
contain the quantity $(s + b/2m)$ in the denominator, and its presence will generate terms
of the inverse transform that contain the exponential factor $e^{-bt/2m}$. In other words, these
terms describe exponentially decaying transients.
It is worth noting that we have had three different characteristic frequencies:

Resonance for forced oscillations with damping: $\omega_2^2 = \omega_0^2 - \dfrac{b^2}{2m^2}$,

Free oscillation frequency, with damping: $\omega_1^2 = \omega_0^2 - \dfrac{b^2}{4m^2}$,

Free oscillation frequency, no damping: $\omega_0^2 = \dfrac{k}{m}$.

These frequencies coincide only if the damping is zero. ■
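The relation among the three frequencies can be illustrated numerically. The sketch below, with assumed values of $m$, $k$, $b$, and $F_0$ chosen only for illustration, locates the maximum of the steady-state amplitude of Eq. (20.196) on a fine grid and compares it with $\omega_2$, and also compares the peak amplitude with $F_0/b\omega_1$.

```python
import math

# Assumed sample parameters (arbitrary units), lightly damped.
m, k, b, F0 = 1.0, 4.0, 0.5, 1.0

w0_sq = k / m                         # free oscillation, no damping
w1_sq = w0_sq - b**2 / (4 * m**2)     # free oscillation, with damping
w2_sq = w0_sq - b**2 / (2 * m**2)     # resonance of the forced oscillation

def amplitude(w):
    # Steady-state amplitude from Eq. (20.196).
    return F0 / math.sqrt(b**2 * w**2 + m**2 * (w0_sq - w**2) ** 2)

# Brute-force search for the peak on a grid with spacing 1e-4.
grid = [0.5 + 1e-4 * i for i in range(30001)]
w_peak = max(grid, key=amplitude)

print(w_peak, math.sqrt(w2_sq))
print(amplitude(math.sqrt(w2_sq)), F0 / (b * math.sqrt(w1_sq)))
```

A grid search is used rather than a root finder simply to keep the check independent of the differentiation step described in the text.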


Recall that Eq. (20.190) is our ODE for the response of a dynamical system to an arbi- 
trary driving force. The final response clearly depends on both the driving force and the 
characteristics of our system. This dual dependence is separated in the transform space. In 
Eq. (20.191) the transform of the response (output) appears as the product of two factors, 
one describing the driving force (input) and the other describing the dynamical system. 
This is a factorization similar to that we found when discussing the use of Fourier trans- 
forms in signal-processing applications in Section 20.5. 


$^{12}$ The amplitude (squared) has the typical resonance denominator (the Lorentz line shape), found in Exercise 20.2.8.


Exercises


20.9.1 From the convolution theorem show that

$$\frac{1}{s} f(s) = \mathcal{L}\left\{\int_0^t F(x)\, dx\right\},$$

where $f(s) = \mathcal{L}\{F(t)\}$.


20.9.2 If $F(t) = t^a$ and $G(t) = t^b$, $a > -1$, $b > -1$,

(a) Show that the convolution $F * G$ is given by

$$F * G = t^{a+b+1} \int_0^1 y^a (1 - y)^b\, dy.$$

(b) By using the convolution theorem, show that

$$\int_0^1 y^a (1 - y)^b\, dy = \frac{a!\, b!}{(a + b + 1)!} = B(a + 1, b + 1),$$

where $B$ is the beta function.


20.9.3 Using the convolution integral, calculate

$$\mathcal{L}^{-1}\left\{\frac{s}{(s^2 + a^2)(s^2 + b^2)}\right\}, \qquad a^2 \ne b^2.$$


20.9.4 An undamped oscillator is driven by a force $F_0 \sin \omega t$. Find the displacement $X(t)$ as a
function of time, subject to initial conditions $X(0) = X'(0) = 0$. Note that the solution
is a linear combination of two simple harmonic motions, one with the frequency of the
driving force and one with the frequency $\omega_0$ of the free oscillator.

ANS. $X(t) = \dfrac{F_0/m}{\omega^2 - \omega_0^2} \left(\dfrac{\omega}{\omega_0} \sin \omega_0 t - \sin \omega t\right)$.


20.10 INVERSE LAPLACE TRANSFORM 


Bromwich Integral 


We now develop an expression for the inverse Laplace transform $\mathcal{L}^{-1}$ appearing in the
equation

$$F(t) = \mathcal{L}^{-1}\{f(s)\}. \tag{20.198}$$


One approach lies in the Fourier transform, for which we know the inverse relation. There
is a difficulty, however. Our Fourier transformable function had to satisfy the Dirichlet
conditions. In particular, we required that in order for $g(\omega)$ to be a valid Fourier transform,

$$\lim_{\omega \to \infty} g(\omega) = 0, \tag{20.199}$$

so that the infinite integral would be well defined.$^{13}$ Now we wish to treat functions $F(t)$
that may diverge exponentially. To surmount this difficulty, we extract an exponential
factor, $e^{\beta t}$, from our (possibly) divergent $F(t)$ and write

$$F(t) = e^{\beta t} G(t). \tag{20.200}$$


$^{13}$ We made an exception to deal with the delta function, but even in that case $g(\omega)$ was bounded for all $\omega$.
If $F(t)$ diverges as $e^{\alpha t}$, we require $\beta$ to be greater than $\alpha$ so that $G(t)$ will be convergent.
Now, with $G(t) = 0$ for $t < 0$ and otherwise suitably restricted so that it may be represented
by a Fourier integral, as in Eq. (20.22), we have

$$G(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{iut}\, du \int_0^\infty G(v)\, e^{-iuv}\, dv. \tag{20.201}$$

Inserting Eq. (20.201) into Eq. (20.200), we have

$$F(t) = \frac{e^{\beta t}}{2\pi} \int_{-\infty}^{\infty} e^{iut}\, du \int_0^\infty F(v)\, e^{-\beta v} e^{-iuv}\, dv. \tag{20.202}$$

We now make a change of variable to $s = \beta + iu$, causing the integral over $v$ in
Eq. (20.202) to assume the form of a Laplace transform:

$$\int_0^\infty F(v)\, e^{-sv}\, dv = f(s).$$

The variable $s$ is now complex, but must be restricted to $\Re e(s) > \beta$ in order to guarantee
convergence. Note that the Laplace transform has extended a function specified on the
positive real axis onto the complex plane, $\Re e\, s > \beta$.$^{14}$

We now need to rewrite Eq. (20.202) using the variable $s$ in place of $u$. The range
$-\infty < u < \infty$ corresponds to a contour in the complex plane of $s$, which is a vertical line
from $\beta - i\infty$ to $\beta + i\infty$; we also need to substitute $du = ds/i$. Making these changes,
Eq. (20.202) becomes

$$F(t) = \frac{1}{2\pi i} \int_{\beta - i\infty}^{\beta + i\infty} e^{st} f(s)\, ds. \tag{20.203}$$


Here is our inverse transform. The path has become an infinite vertical line in the complex
plane. Note that the constant $\beta$ was chosen so that $f(s)$ would be nonsingular for $s > \beta$. It
can be shown that the nonsingularity of $f(s)$ extends to complex $s$ provided that $\Re e\, s > \beta$,
so the integrand of Eq. (20.203) can have singularities only to the left of the integration
path. See Fig. 20.20.

The inverse transformation given by Eq. (20.203) is known as the Bromwich integral,
although sometimes it is referred to as the Fourier-Mellin theorem or Fourier-Mellin
integral. This integral may now be evaluated by the regular methods of contour integration
(Chapter 11). If $t > 0$ and $f(s)$ is analytic except for isolated singularities (and no branch
points), and is also small at large $|s|$, the contour may be closed by an infinite semicircle
in the left half-plane that does not contribute to the integral. Then by the residue theorem
(Section 11.8),

$$F(t) = \sum (\text{residues included for } \Re e\, s < \beta). \tag{20.204}$$


$^{14}$ For a derivation of the inverse Laplace transform using only real variables, see C. L. Bohn and R. W. Flynn, Real variable
inversion of Laplace transforms: An application in plasma physics, Am. J. Phys. 46: 1250 (1978).
FIGURE 20.20 Possible singularities of $e^{st} f(s)$.


It is worth mentioning that in many cases of interest $f(s)$ may become large in the left
half-plane or have branch points, and evaluation of the Bromwich integral may then present
significant challenges.

Possibly this means of evaluation with $\Re e\, s$ ranging through negative values seems
paradoxical in view of our previous requirement that $\Re e\, s > \beta$. The paradox disappears
when we recall that the requirement $\Re e\, s > \beta$ was imposed to guarantee convergence
of the Laplace transform integral that defined $f(s)$. Once $f(s)$ is obtained, we may then
proceed to exploit its properties as an analytic function in the complex plane wherever we
choose.

Perhaps a pair of examples may clarify the evaluation of Eq. (20.203). 


Example 20.10.1 INVERSION VIA CALCULUS OF RESIDUES 


If $f(s) = a/(s^2 - a^2)$, then the integrand for the Bromwich integral will be

$$e^{st} f(s) = \frac{a\, e^{st}}{s^2 - a^2} = \frac{a\, e^{st}}{(s + a)(s - a)}. \tag{20.205}$$





From the form of Eq. (20.205), we see that this integrand has poles at $s = \pm a$, and the
value of $\beta$ for the integral must be larger than $|a|$. Since these are simple poles, it is easy to
verify that the residue at $s = a$ must be $e^{at}/2$, while the residue at $s = -a$ will be $-e^{-at}/2$.
The form of the integrand also permits us to close the contour in the left half-plane. We
find, in accord with Eq. (20.204),

$$\sum \text{Residues} = \frac{1}{2}\left(e^{at} - e^{-at}\right) = \sinh at = F(t). \tag{20.206}$$

Equation (20.206) is in agreement with entry #7 of our table of Laplace transforms, Table
20.1. ■
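As a rough cross-check of this example, the Bromwich integral of Eq. (20.203) can also be approximated directly by quadrature along the line $s = \beta + iu$, truncated to $|u| \le U$. This is only a sketch: the truncation $U$, the step count, the sample values $a = 1$, $\beta = 2$, $t = 1$, and the helper names `simpson` and `bromwich` are all assumptions, and the brute-force quadrature is not the residue method used in the text.

```python
import cmath
import math

def simpson(g, a, b, n):
    """Composite Simpson rule on [a, b]; g may return complex values."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def bromwich(f, t, beta, U=400.0, n=80000):
    """Truncated Bromwich integral along s = beta + i u, -U <= u <= U."""
    def integrand(u):
        s = complex(beta, u)
        return cmath.exp(s * t) * f(s)
    # ds = i du, so (1 / 2 pi i) * integral ds becomes (1 / 2 pi) * integral du.
    return (simpson(integrand, -U, U, n) / (2 * math.pi)).real

a = 1.0
f = lambda s: a / (s * s - a * a)  # transform from Example 20.10.1

t = 1.0
numeric = bromwich(f, t, beta=2.0)                         # beta > |a|
residue_sum = 0.5 * (math.exp(a * t) - math.exp(-a * t))   # Eq. (20.206)
print(numeric, residue_sum, math.sinh(a * t))
```

The numerical value agrees with $\sinh at$ only to the accuracy allowed by the truncation of the integration line, which is why the residue evaluation is preferable whenever it is available.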
Example 20.10.2 MULTIREGION INVERSION


If $f(s) = (1 - e^{-as})/s$, the Bromwich integral then has integrand

$$e^{st} f(s) = e^{st} \left(\frac{1 - e^{-as}}{s}\right), \tag{20.207}$$


and the possibilities for closing the contour depend on the relative magnitudes of t and a. 

Considering first $t > a$, we may close the contour for the Bromwich integral in the left
half-plane without changing its value. Our integrand is an entire function (analytic
everywhere in the finite $s$-plane; note that the $s$ in the denominator cancels when the numerator
is expanded in a Maclaurin series). Since no singularities are enclosed, we conclude that
for $t > a$, $F(t) = 0$.

For $t$ in the range $0 < t < a$, a different situation is encountered. Expanding the
integrand into the two terms

$$\frac{e^{st}}{s} - \frac{e^{s(t-a)}}{s},$$


we see that the first becomes small in the left half-plane (but large in the right half-plane),
while the second term behaves in an opposite fashion (large in the left half-plane, small
in the right). The obvious solution is to use different contours for the two terms, each of
which is individually singular, with a pole at $s = 0$. We therefore close the contour for the
first term in the left half-plane, but close that for the second term in the right half-plane.
Since the vertical portion of the contour is at $\Re e\, s = \beta > 0$, we see that the integral of the
first term encloses the singularity, while the integral of the second term does not. Therefore
the first integral will have a value equal to the residue of the integrand at the singularity
(this residue is 1), while the second integral will vanish. These contours are illustrated in
Fig. 20.21.

Finally, for $t < 0$, the entire integrand becomes small in the right half-plane, the contour
(for the entire integrand) surrounds no singularities, and the integral is zero. Summarizing
these three cases,

$$F(t) = u(t) - u(t - a) = \begin{cases} 0, & t < 0, \\ 1, & 0 < t < a, \\ 0, & t > a, \end{cases} \tag{20.208}$$

a step function of unit height and length $a$ (Fig. 20.22). ■


FIGURE 20.21 Contours for Example 20.10.2.

FIGURE 20.22 Finite-length step function $u(t) - u(t - a)$.


Two general comments may be in order. First, these two examples hardly begin to show 
the usefulness and power of the Bromwich integral. It is always available for inverting a 
complicated transform when the tables prove inadequate. 

Second, this derivation is not presented as a rigorous one. Rather, it is given more as a 
plausibility argument, although it can be made rigorous. The determination of the inverse 
transform is somewhat similar to the solution of a differential equation. It makes little 
difference how you get the inverse transform. Guess at it if you want. It can always be 
checked by verifying that 


L{F()} = f(s). 


Two alternate derivations of the Bromwich integral are the subjects of Exercises 20.10.1
and 20.10.2.


Exercises 


20.10.1 Derive the Bromwich integral from Cauchy's integral formula.

Hint. Apply the inverse transform $\mathcal{L}^{-1}$ to

$$f(s) = \frac{1}{2\pi i} \lim_{\alpha \to \infty} \int_{\beta - i\alpha}^{\beta + i\alpha} \frac{f(z)}{s - z}\, dz,$$

where $f(z)$ is analytic for $\Re e\, z \ge \beta$.


20.10.2 Starting with

$$\frac{1}{2\pi i} \int_{\beta - i\infty}^{\beta + i\infty} e^{st} f(s)\, ds,$$

show that by introducing

$$f(s) = \int_0^\infty e^{-sz} F(z)\, dz$$

we can convert our integral into the Fourier representation of a Dirac delta function.
From this derive the inverse Laplace transform.


20.10.3 Derive the Laplace transformation convolution theorem by use of the Bromwich
integral.


20.10.4 Find

(a) by a partial fraction expansion.
(b) Repeat, using the Bromwich integral.


20.10.5 Find

$$\mathcal{L}^{-1}\left\{\frac{k^2}{s (s^2 + k^2)}\right\}$$

(a) by using a partial fraction expansion.

(b) Repeat using the convolution theorem.

(c) Repeat using the Bromwich integral.

ANS. $F(t) = 1 - \cos kt$.


20.10.6 Use the Bromwich integral to find the function whose transform is $f(s) = s^{-1/2}$. Note
that $f(s)$ has a branch point at $s = 0$. The negative $x$-axis may be taken as a cut line.
See Fig. 20.23.

Hint. A portion of the path needed to close the contour will yield nonzero contributions
to the contour integral. These will need to be taken into account to get the proper value
for the Bromwich integral.

ANS. $F(t) = (\pi t)^{-1/2}$.


FIGURE 20.23 Contour for Exercise 20.10.6.


20.10.7 Show that

$$\mathcal{L}^{-1}\left\{(s^2 + 1)^{-1/2}\right\} = J_0(t)$$

by evaluation of the Bromwich integral.

Hint. Convert your Bromwich integral into an integral representation of $J_0(t)$.
Figure 20.24 shows a possible contour.


20.10.8 Evaluate the inverse Laplace transform

$$\mathcal{L}^{-1}\left\{(s^2 - a^2)^{-1/2}\right\}$$

by each of the following methods:

(a) Expansion in a series and term-by-term inversion.
(b) Direct evaluation of the Bromwich integral.
(c) Change of variable in the Bromwich integral: $s = (a/2)(z + z^{-1})$.


20.10.9 Show that

$$\mathcal{L}^{-1}\left\{\frac{\ln s}{s}\right\} = -\ln t - \gamma,$$

where $\gamma = 0.5772\ldots$ is the Euler-Mascheroni constant.


FIGURE 20.24 A possible contour for the inversion of $J_0(t)$.
20.10.10 Evaluate the Bromwich integral for


f(s)= 


Ss 
20.10.11 Heaviside expansion theorem. If the transform $f(s)$ may be written as a ratio

$$f(s) = \frac{g(s)}{h(s)},$$

where $g(s)$ and $h(s)$ are analytic functions, with $h(s)$ having simple, isolated zeros at
$s = s_i$, show that

$$F(t) = \mathcal{L}^{-1}\left\{\frac{g(s)}{h(s)}\right\} = \sum_i \frac{g(s_i)}{h'(s_i)}\, e^{s_i t}.$$

Hint. See Exercise 11.6.3.


20.10.12 Using the Bromwich integral, invert $f(s) = s^{-2} e^{-ks}$. Express $F(t) = \mathcal{L}^{-1}\{f(s)\}$ in
terms of the (shifted) unit step function $u(t - k)$.

ANS. $F(t) = (t - k)\,u(t - k)$.
20.10.13 You have a Laplace transform:
\[ f(s) = \frac{1}{(s+a)(s+b)}, \qquad a \neq b. \]
Invert this transform by each of three methods:
(a) Partial fractions and use of tables,
(b) Convolution theorem,
(c) Bromwich integral.

ANS. $F(t) = \dfrac{e^{-bt} - e^{-at}}{a - b}, \quad a \neq b.$


Additional Readings 


Abramowitz, M., and I. A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and 
Mathematical Tables (AMS-55). Washington, DC: National Bureau of Standards (1972), reprinted, Dover 
(1974). Chapter 29 contains tables of Laplace transforms. 


Champeney, D. C., Fourier Transforms and Their Physical Applications. New York: Academic Press (1973). 
Fourier transforms are developed in a careful, easy-to-follow manner. Approximately 60% of the book is 
devoted to applications of interest in physics and engineering. 

Erdelyi, A., W. Magnus, F. Oberhettinger, and F. G. Tricomi, Tables of Integral Transforms, 2 vols. New York: 
McGraw-Hill (1954). This text contains extensive tables of Fourier sine, cosine, and exponential transforms, 
Laplace and inverse Laplace transforms, Mellin and inverse Mellin transforms, Hankel transforms, and other 
more specialized integral transforms. 

Hamming, R. W., Numerical Methods for Scientists and Engineers, 2nd ed. New York: McGraw-Hill (1973), 
reprinted, Dover (1987). Chapter 33 provides an excellent description of the fast Fourier transform. 

Hanna, J. R., Fourier Series and Integrals of Boundary Value Problems. Somerset, NJ: Wiley (1990). This book 
is a broad treatment of the Fourier solution of boundary value problems. The concepts of convergence and 
completeness are given careful attention. 
Jeffreys, H., and B. S. Jeffreys, Methods of Mathematical Physics, 3rd ed. Cambridge: Cambridge University
Press (1972). 


Krylov, V. I., and N. S. Skoblya, Handbook of Numerical Inversion of Laplace Transform (translated by
D. Louvish). Jerusalem: Israel Program for Scientific Translations (1969). 


Lepage, W. R., Complex Variables and the Laplace Transform for Engineers. New York: McGraw-Hill (1961); 
Dover (1980). A complex variable analysis that is carefully developed and then applied to Fourier and Laplace 
transforms. It is written to be read by students, but intended for the serious student. 


McCollum, P. A., and B. F. Brown, Laplace Transform Tables and Theorems. New York: Holt, Rinehart and 
Winston (1965). 

Miles, J. W., Integral Transforms in Applied Mathematics. Cambridge: Cambridge University Press (1971). This 
is a brief but interesting and useful treatment for the advanced undergraduate. It emphasizes applications rather 
than abstract mathematical theory. 


Morse, P. M., and H. Feshbach, Methods of Theoretical Physics. New York: McGraw-Hill (1953). Parseval’s 
relations are derived independently of the inverse Fourier transform in Section 4.8 of this comprehensive, but 
difficult text. 


Papoulis, A., The Fourier Integral and Its Applications. New York: McGraw-Hill (1962). This is a rigorous 
development of Fourier and Laplace transforms and includes extensive applications in science and 
engineering. 

Roberts, G. E., and H. Kaufman, Table of Laplace Transforms. Philadelphia: Saunders (1966). 


Sneddon, I. N., Fourier Transforms. New York: McGraw-Hill (1951), reprinted, Dover (1995). A detailed com- 
prehensive treatment, this book is loaded with applications to a wide variety of fields of modern and classical 
physics. 

Sneddon, I. N., The Use of Integral Transforms. New York: McGraw-Hill (1974). Written for students in science 
and engineering in terms they can understand, this book covers all the integral transforms mentioned in this 
chapter as well as in several others. Many applications are included. 


Titchmarsh, E. C., Introduction to the Theory of Fourier Integrals, 2nd ed. New York: Oxford University Press 
(1937). 

Van der Pol, B., and H. Bremmer, Operational Calculus Based on the Two-sided Laplace Integral, 3rd ed. 
Cambridge, UK: Cambridge University Press (1987). Here is a development based on the integral range −∞
to +∞, rather than the useful 0 to ∞. Chapter V contains a detailed study of the Dirac delta function (impulse
function). 


Wolf, K. B., Integral Transforms in Science and Engineering. New York: Plenum Press (1979). This book is a 
very comprehensive treatment of integral transforms and their applications. 


CHAPTER 21


INTEGRAL EQUATIONS


21.1 INTRODUCTION


With the exception of the integral transforms of Chapter 20, we have for the most part been
considering relations between an unknown function $\varphi(x)$ and one or more of its
derivatives. We now proceed to investigate equations containing the unknown function within an
integral. As with differential equations, we shall confine our attention to linear relations, 
which are called linear integral equations. These integral equations are classified in two 
ways: 


• If the limits of integration are fixed, we call the equation a Fredholm equation; if
one limit is variable, it is a Volterra equation.

• If the unknown function appears only under the integral sign, we label it first kind.
If it appears both inside and outside the integral, it is labeled second kind.

Here are some examples of these definitions. In each of the following equations, $\varphi(t)$ is
an unknown function whose value we seek. $K(x,t)$, which we call the kernel, and $f(x)$
are assumed to be known. When $f(x) = 0$, the equation is said to be homogeneous.

This is a Fredholm equation of the first kind,
\[ f(x) = \int_a^b K(x,t)\,\varphi(t)\,dt. \tag{21.1} \]

Next we have a Fredholm equation of the second kind, which is an eigenvalue equation
with $\lambda$ the eigenvalue,
\[ \varphi(x) = f(x) + \lambda \int_a^b K(x,t)\,\varphi(t)\,dt. \tag{21.2} \]

Here we have a Volterra equation of the first kind,
\[ f(x) = \int_a^x K(x,t)\,\varphi(t)\,dt; \tag{21.3} \]

and a Volterra equation of the second kind,
\[ \varphi(x) = f(x) + \int_a^x K(x,t)\,\varphi(t)\,dt. \tag{21.4} \]


Why do we bother about integral equations? After all, the differential equations have 
done a rather good job of describing our physical world so far. However, there are several 
reasons for introducing integral equations. 

First, we have placed considerable emphasis on the solution of differential equations 
subject to particular boundary conditions. For instance, the boundary condition at $r = 0$
determines whether the Neumann function $Y_n(r)$ is present when Bessel’s equation is
solved. The boundary condition for $r \to \infty$ determines whether $I_n(r)$ is present in our
solution of the modified Bessel equation. By contrast, an integral equation relates the
unknown function not only to its values at neighboring points (derivatives) but also to 
its values throughout a region, including the boundary. In a very real sense the boundary 
conditions are built into the integral equation rather than imposed at the final stage of the 
solution. It will be seen later in this section that if we construct an integral equation that is 
equivalent to a differential equation with its boundary conditions, the form of that integral 
equation depends on the boundary conditions. 

A second feature of integral equations is that their compact and completely self- 
contained form may turn out to be a more convenient or powerful formulation of a problem 
than a differential equation and its boundary conditions. Mathematical problems such as 
existence, uniqueness, and completeness may often be handled more easily and elegantly 
in integral form. And finally, whether or not we like it, there are problems, such as some 
diffusion and transport phenomena, that cannot be represented by differential equations. If 
we wish to solve such problems, we are forced to handle integral equations. 


Example 21.1.1 MOMENTUM REPRESENTATION IN QUANTUM MECHANICS


The Schrödinger equation (in ordinary space representation) for a particle of mass $m$
subject to a potential $V(\mathbf{r})$ is
\[ -\frac{\hbar^2}{2m}\nabla^2 \psi(\mathbf{r}) + V(\mathbf{r})\,\psi(\mathbf{r}) = E\,\psi(\mathbf{r}), \tag{21.5} \]


and we previously found, extending the 1-D result from Eq. (20.97), that in momentum 
space the equivalent equation (for the Coulomb potential in hartree atomic units) is 


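\[ \frac{k^2}{2}\,\varphi(\mathbf{k}) - \frac{1}{2\pi^2}\int \frac{\varphi(\mathbf{k}')}{|\mathbf{k}-\mathbf{k}'|^2}\,d^3k' = E\,\varphi(\mathbf{k}). \tag{21.6} \]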





This is an integral-equation eigenvalue problem. Note that the kernel of Eq. (21.6) is a
function of $\mathbf{k} - \mathbf{k}'$; this functional dependence, which arises from the convolution theorem,
is typical of an ordinary potential in which the direct-space wave function is multiplied by
a function that depends only on position. ■


Transformation of a Differential Equation into an 
Integral Equation 

Often we find that we have a choice. The physical problem may be represented by a
differential or an integral equation. Let us assume that we have the differential equation and
wish to transform it into an integral equation. Starting with a linear second-order ordinary
differential equation (ODE),
\[ y'' + A(x)\,y' + B(x)\,y = g(x), \tag{21.7} \]
with initial conditions
\[ y(a) = y_0, \qquad y'(a) = y_0', \]
we integrate to obtain
\[ y'(x) = -\int_a^x A\,y'\,dx - \int_a^x B\,y\,dx + \int_a^x g\,dx + y_0'. \]

Integrating the first integral on the right by parts yields
\[ y'(x) = -A(x)\,y(x) - \int_a^x \left[ B - A' \right] y\,dx + \int_a^x g\,dx + A(a)\,y_0 + y_0'. \]

Integrating a second time, we obtain
\[ y(x) = -\int_a^x A\,y\,dx - \int_a^x du \int_a^u \left[ B(t) - A'(t) \right] y(t)\,dt
 + \int_a^x du \int_a^u g(t)\,dt + \left[ A(a)\,y_0 + y_0' \right](x - a) + y_0. \tag{21.8} \]

To transform this equation into a neater form, we use the relation
\[ \int_a^x du \int_a^u f(t)\,dt = \int_a^x f(t)\,dt \int_t^x du = \int_a^x (x - t)\,f(t)\,dt. \tag{21.9} \]

Applying this result to Eq. (21.8), we obtain
\[ y(x) = -\int_a^x \left\{ A(t) + (x - t)\left[ B(t) - A'(t) \right] \right\} y(t)\,dt
 + \int_a^x (x - t)\,g(t)\,dt + \left[ A(a)\,y_0 + y_0' \right](x - a) + y_0. \tag{21.10} \]

If we now introduce the abbreviations
\[ K(x,t) = (t - x)\left[ B(t) - A'(t) \right] - A(t), \]
\[ f(x) = \int_a^x (x - t)\,g(t)\,dt + \left[ A(a)\,y_0 + y_0' \right](x - a) + y_0, \]
Eq. (21.10) becomes
\[ y(x) = f(x) + \int_a^x K(x,t)\,y(t)\,dt, \tag{21.11} \]


which is a Volterra equation of the second kind. Note that f(x) in Eq. (21.11) has a form 
that includes the initial conditions from the original differential equation. 

Another method for obtaining an integral equation equivalent to a differential equa- 
tion plus its boundary conditions was presented in Section 10.1, where we found that the 
Green’s function for a differential equation appeared as the kernel of the equivalent integral 
equation. 


Example 21.1.2 LINEAR OSCILLATOR EQUATION

Let’s find an integral equation equivalent to the linear oscillator equation
\[ y'' + \omega^2 y = 0 \tag{21.12} \]
with initial conditions
\[ y(0) = 0, \qquad y'(0) = 1. \]
This corresponds to Eq. (21.7) with
\[ A(x) = 0, \qquad B(x) = \omega^2, \qquad g(x) = 0. \]

Substituting into Eq. (21.10), we find that the integral equation becomes
\[ y(x) = x + \omega^2 \int_0^x (t - x)\,y(t)\,dt. \tag{21.13} \]

This integral equation, Eq. (21.13), is equivalent to the original differential equation
plus the initial conditions. A check shows that each form is indeed satisfied by
$y(x) = (1/\omega)\sin \omega x$.
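
The check mentioned above can also be carried out numerically. The following is a minimal sketch, not part of the original text: it marches the Volterra equation of Eq. (21.13) forward with the trapezoidal rule and compares the result with $(1/\omega)\sin\omega x$. The function name `solve_volterra_second_kind`, the value $\omega = 2$, and the step count are illustrative assumptions.

```python
import math

def solve_volterra_second_kind(f, kernel, x_max, n_steps):
    """Trapezoidal marching for y(x) = f(x) + int_0^x K(x,t) y(t) dt.
    (Hypothetical helper; the name and scheme are choices made for this sketch.)"""
    h = x_max / n_steps
    xs = [i * h for i in range(n_steps + 1)]
    ys = [f(xs[0])]                      # at x = 0 the integral vanishes
    for i in range(1, n_steps + 1):
        x = xs[i]
        # trapezoid contributions from the already-known nodes t_0 .. t_{i-1}
        s = 0.5 * kernel(x, xs[0]) * ys[0]
        s += sum(kernel(x, xs[j]) * ys[j] for j in range(1, i))
        # the unknown y_i enters through the end-point weight h/2
        y_i = (f(x) + h * s) / (1.0 - 0.5 * h * kernel(x, x))
        ys.append(y_i)
    return xs, ys

omega = 2.0                              # assumed value for the demonstration
xs, ys = solve_volterra_second_kind(
    f=lambda x: x,                                   # f(x) of Eq. (21.13)
    kernel=lambda x, t: omega**2 * (t - x),          # K(x,t) of Eq. (21.13)
    x_max=3.0,
    n_steps=600,
)
max_err = max(abs(y - math.sin(omega * x) / omega) for x, y in zip(xs, ys))
print(f"max |y_numeric - sin(wx)/w| = {max_err:.2e}")
```

Because the upper limit is variable, each new value depends only on values already computed; this is the practical consequence of the Volterra (initial-value) structure.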

Let us reconsider the linear oscillator equation, Eq. (21.12), but now with the boundary
conditions
\[ y(0) = 0, \qquad y(b) = 0. \]

Since $y'(0)$ is not given, we must modify the procedure. The first integration gives
\[ y'(x) = -\omega^2 \int_0^x y\,dx + y'(0). \]
Integrating a second time and again using Eq. (21.9), we have
\[ y(x) = -\omega^2 \int_0^x (x - t)\,y(t)\,dt + y'(0)\,x. \tag{21.14} \]
To eliminate the unknown $y'(0)$, we now impose the condition $y(b) = 0$. This gives
\[ \omega^2 \int_0^b (b - t)\,y(t)\,dt = b\,y'(0). \]
Substituting this back into Eq. (21.14), we obtain
\[ y(x) = -\omega^2 \int_0^x (x - t)\,y(t)\,dt + \omega^2\,\frac{x}{b} \int_0^b (b - t)\,y(t)\,dt. \]

Now let us break the interval $[0, b]$ into two intervals, $[0, x]$ and $[x, b]$. Since
\[ \frac{x}{b}(b - t) - (x - t) = \frac{t}{b}(b - x), \]
we find
\[ y(x) = \omega^2 \int_0^x \frac{t}{b}(b - x)\,y(t)\,dt + \omega^2 \int_x^b \frac{x}{b}(b - t)\,y(t)\,dt. \tag{21.15} \]
Finally, if we define the kernel
\[ K(x,t) = \begin{cases} \dfrac{t}{b}(b - x), & t < x, \\[2mm] \dfrac{x}{b}(b - t), & t > x, \end{cases} \tag{21.16} \]
we have
\[ y(x) = \omega^2 \int_0^b K(x,t)\,y(t)\,dt, \tag{21.17} \]

a homogeneous Fredholm equation of the second kind.
Our new kernel, $K(x,t)$, illustrated in Fig. 21.1, has some interesting properties.

1. It is symmetric, $K(x,t) = K(t,x)$.
2. It is continuous, in the sense that
\[ \left. \frac{t}{b}(b - x) \right|_{t=x} = \left. \frac{x}{b}(b - t) \right|_{t=x}. \]
3. Its derivative with respect to $t$ is discontinuous. As $t$ increases through the point
$t = x$, there is a discontinuity of $-1$ in $\partial K(x,t)/\partial t$.

Comparing with the discussion in Section 10.1, we identify $K(x,t)$ as the Green’s function
for this ODE with the specified boundary conditions. Note in particular Eq. (10.30),
which corresponds exactly to what was found here. ■
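
As an illustration of Eq. (21.17), the sketch below (an addition, not from the original text) discretizes the kernel of Eq. (21.16) with a midpoint rule and checks that the eigenvalues of the resulting matrix reproduce $\omega = n\pi/b$, the values for which $y'' + \omega^2 y = 0$ with $y(0) = y(b) = 0$ has nontrivial solutions. The grid size, the choice $b = 1$, and the name `green_kernel` are assumptions made for the example.

```python
import numpy as np

def green_kernel(x, t, b):
    """Kernel of Eq. (21.16): (t/b)(b - x) for t < x, (x/b)(b - t) for t > x."""
    return np.where(t < x, t * (b - x) / b, x * (b - t) / b)

b = 1.0                                   # assumed interval length
n = 400                                   # assumed number of quadrature cells
h = b / n
nodes = (np.arange(n) + 0.5) * h          # midpoint rule: cell centers, weight h
X, T = np.meshgrid(nodes, nodes, indexing="ij")
K = green_kernel(X, T, b) * h             # discretized integral operator

# y = omega^2 * K y, so each eigenvalue mu of K corresponds to omega = mu**-0.5
mu = np.linalg.eigvalsh(K)[::-1]          # K is symmetric; sort descending
omega_numeric = 1.0 / np.sqrt(mu[:3])
omega_exact = np.arange(1, 4) * np.pi / b
print(omega_numeric)
print(omega_exact)
```

The symmetry of the kernel (property 1) is what allows the symmetric eigenvalue routine to be used here.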


The above example shows how the initial or boundary conditions play a decisive role in 
the conversion of a linear second-order ODE into an integral equation. Summarizing, 


If we have initial conditions (only one end of our interval), the differen- 
tial equation transforms into a Volterra integral equation. But if we have a 
boundary value problem (boundary conditions at both ends of our inter- 
val), the differential equation leads to a Fredholm-type integral equation 
with a kernel that will be the Green’s function appropriate to the given 
boundary conditions. 


In closing, we call attention to the fact that the reverse transformation (integral equation 
to differential equation) is not always possible. There exist integral equations for which no 
corresponding differential equation is known. 








FIGURE 21.1 Kernel, Eq. (21.16), for linear oscillator boundary-value problem. 





Exercises


21.1.1 Starting with the ODE, integrate twice and derive the Volterra integral equation
corresponding to

(a) $y''(x) - y(x) = 0$; $\quad y(0) = 0$, $y'(0) = 1$.

ANS. $y = \int_0^x (x - t)\,y(t)\,dt + x$.

(b) $y''(x) - y(x) = 0$; $\quad y(0) = 1$, $y'(0) = -1$.

ANS. $y = \int_0^x (x - t)\,y(t)\,dt - x + 1$.

Check your results with Eq. (21.11).

21.1.2 Starting with the given answers of Exercise 21.1.1, differentiate and recover the original
ODEs and the boundary conditions.

21.1.3 Given
\[ \varphi(x) = x - \int_0^x (t - x)\,\varphi(t)\,dt, \]
solve this integral equation by converting it to an ODE (plus boundary conditions) and
solving the ODE (by inspection).

21.1.4 Show that the homogeneous Volterra equation of the second kind
\[ \psi(x) = \lambda \int_0^x K(x,t)\,\psi(t)\,dt \]
has no solution (apart from the trivial solution $\psi = 0$).

Hint. Develop a Maclaurin expansion of $\psi(x)$. Assume that $\psi(x)$ and $K(x,t)$ are
differentiable with respect to $x$ as needed.


21.2 SOME SPECIAL METHODS 


It is well known that general methods are available both for differentiating functions and 
(compare Chapters 7 and 9) for solving linear differential equations, while there is no 
general direct method for evaluating integrals. Integrations are carried out using a variety 
of tools of limited applicability, and the process is ultimately one of pattern recognition 
and the application of experience. Similar observations apply to the solution of integral 
equations. We consider here some special methods that work when the integral equation 
under study has suitable characteristics. 
When the kernel of an integral equation (and its integration limits) match the specification
of an integral transform for which we have an inversion formula, we can use that
identification to solve the integral equation. Formulas based on four integral transforms are listed
here for reference, in each case with $f(x)$ a known function and $\varphi(x)$ to be determined.

If our integral equation is $f(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixt}\,\varphi(t)\,dt$, then its solution is
\[ \varphi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ixt} f(t)\,dt \quad \text{(Fourier transform)}. \tag{21.18} \]

If our integral equation is $f(x) = \int_0^{\infty} e^{-xt}\,\varphi(t)\,dt$, then its solution is
\[ \varphi(x) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} e^{xt} f(t)\,dt \quad \text{(Laplace transform)}. \tag{21.19} \]

If our integral equation is $f(x) = \int_0^{\infty} t^{x-1}\,\varphi(t)\,dt$, then its solution is
\[ \varphi(x) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} x^{-t} f(t)\,dt \quad \text{(Mellin transform)}. \tag{21.20} \]

If our integral equation is $f(x) = \int_0^{\infty} t\,\varphi(t)\,J_\nu(xt)\,dt$, then its solution is
\[ \varphi(x) = \int_0^{\infty} t\,f(t)\,J_\nu(xt)\,dt \quad \text{(Hankel transform)}. \tag{21.21} \]

Note that these formulas can also be applied “in reverse,” i.e., with $\varphi(x)$ known and $f(x)$
to be determined. This observation, however, is of somewhat limited utility since nothing
significantly new appears for the inverse Fourier and Hankel transforms, while the
integration limits for the inverse Laplace and Mellin transforms make them unlikely to appear in
an integral equation.

Actually the usefulness of the integral transform technique extends a bit beyond these 
four rather specialized forms. We illustrate with two examples. 


Example 21.2.1 FOURIER TRANSFORM SOLUTION 


Let’s consider a Fredholm equation of the first kind with a kernel of the general type
$k(x - t)$, where $k$ is a function (not a constant),
\[ f(x) = \int_{-\infty}^{\infty} k(x - t)\,\varphi(t)\,dt, \tag{21.22} \]
in which $\varphi(t)$ is our unknown function. Assuming that the needed transforms exist, we
apply the Fourier convolution theorem, Eq. (20.71), to obtain
\[ f(x) = \int_{-\infty}^{\infty} K(\omega)\,\Phi(\omega)\,e^{-i\omega x}\,d\omega. \tag{21.23} \]

The functions $K(\omega)$ and $\Phi(\omega)$ are, respectively, the Fourier transforms of $k(x)$ and $\varphi(x)$.
Taking the Fourier transform of both sides of Eq. (21.23), the formula for which is
Eq. (21.18), we find
\[ K(\omega)\,\Phi(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x)\,e^{i\omega x}\,dx = \frac{F(\omega)}{\sqrt{2\pi}}, \tag{21.24} \]
where $F(\omega)$ is the Fourier transform of $f(x)$. Since $\Phi(\omega)$ is the only unknown in
Eq. (21.24), we may solve for it, obtaining
\[ \Phi(\omega) = \frac{1}{\sqrt{2\pi}}\,\frac{F(\omega)}{K(\omega)}, \tag{21.25} \]
and, using the inverse Fourier transform, we have the solution to Eq. (21.22):
\[ \varphi(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{F(\omega)}{K(\omega)}\,e^{-i\omega x}\,d\omega. \tag{21.26} \]

A rigorous justification of this result is presented by Morse and Feshbach (see Additional
Readings). An extension of this transformation solution appears as Exercise 21.2.1. ■


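Example 21.2.2 GENERALIZED ABEL EQUATION


The generalized Abel equation is a Volterra equation of the first kind:
\[ f(x) = \int_0^x \frac{\varphi(t)}{(x - t)^{\alpha}}\,dt, \quad 0 < \alpha < 1, \quad \text{with } f(x) \text{ known and } \varphi(t) \text{ unknown}. \tag{21.27} \]

Taking the Laplace transform of both sides of this equation, we obtain
\[ \mathcal{L}\{f(x)\} = \mathcal{L}\left\{ \int_0^x \frac{\varphi(t)}{(x - t)^{\alpha}}\,dt \right\} = \mathcal{L}\{x^{-\alpha}\}\,\mathcal{L}\{\varphi(x)\}, \]
the last step following by the Laplace convolution theorem, Eq. (20.186). Then, evaluating
$\mathcal{L}\{x^{-\alpha}\}$ from entry 3 of Table 20.1,
\[ \mathcal{L}\{\varphi(x)\} = \frac{s^{1-\alpha}\,\mathcal{L}\{f(x)\}}{\Gamma(1 - \alpha)}. \tag{21.28} \]
In principle, our integral equation is solved, since all that remains is to take the inverse
transform of Eq. (21.28). A clever way of obtaining the inverse transform proceeds as
follows, with its initial step being to divide Eq. (21.28) by $s$.¹ We get
\[ \frac{1}{s}\,\mathcal{L}\{\varphi(x)\} = \frac{s^{-\alpha}\,\mathcal{L}\{f(x)\}}{\Gamma(1 - \alpha)} = \frac{\mathcal{L}\{x^{\alpha - 1}\}\,\mathcal{L}\{f(x)\}}{\Gamma(\alpha)\,\Gamma(1 - \alpha)}. \]
Combining the gamma functions according to Eq. (13.23) and applying the Laplace
convolution theorem again, we discover that
\[ \frac{1}{s}\,\mathcal{L}\{\varphi(x)\} = \frac{\sin \pi\alpha}{\pi}\,\mathcal{L}\left\{ \int_0^x \frac{f(t)}{(x - t)^{1-\alpha}}\,dt \right\}. \]

Inverting with the aid of Entry 3 of Table 20.2, we get
\[ \int_0^x \varphi(t)\,dt = \frac{\sin \pi\alpha}{\pi} \int_0^x \frac{f(t)}{(x - t)^{1-\alpha}}\,dt, \]
and finally, by differentiating, we have the solution to our generalized Abel equation:
\[ \varphi(x) = \frac{\sin \pi\alpha}{\pi}\,\frac{d}{dx} \int_0^x \frac{f(t)}{(x - t)^{1-\alpha}}\,dt. \tag{21.29} \]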
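
As a sanity check on Eq. (21.29), the sketch below (an addition, not from the original text) takes the special case $f(x) = 1$, for which Eq. (21.29) gives $\varphi(t) = (\sin\pi\alpha/\pi)\,t^{\alpha-1}$ (the answer quoted in Exercise 21.2.7), and verifies numerically that inserting it into Eq. (21.27) returns 1. The function name `abel_check`, the changes of variable used to remove the end-point singularities, and the midpoint rule are choices made for this illustration.

```python
import math

def abel_check(alpha, n=4000):
    """Evaluate int_0^x phi(t) (x - t)^(-alpha) dt for
    phi(t) = sin(pi*alpha)/pi * t^(alpha - 1).
    Scaling t = x*u removes x entirely, so the result should be 1 for any x > 0."""
    def midpoint(g, upper):
        h = upper / n
        return h * sum(g((i + 0.5) * h) for i in range(n))

    # u in [0, 1/2]: substitute v = u^alpha, which absorbs the u^(alpha-1) singularity
    left = midpoint(lambda v: (1.0 - v ** (1.0 / alpha)) ** (-alpha) / alpha,
                    0.5 ** alpha)
    # u in [1/2, 1]: substitute w = (1-u)^(1-alpha), which absorbs (1-u)^(-alpha)
    beta = 1.0 - alpha
    right = midpoint(lambda w: (1.0 - w ** (1.0 / beta)) ** (alpha - 1.0) / beta,
                     0.5 ** beta)
    return math.sin(math.pi * alpha) / math.pi * (left + right)

for alpha in (0.25, 0.5, 0.75):
    print(alpha, abel_check(alpha))
```

The integral being evaluated is the beta function $B(\alpha, 1-\alpha) = \pi/\sin\pi\alpha$, which is why the prefactor $\sin\pi\alpha/\pi$ appears in the solution.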


Generating-Function Method 


Occasionally, the reader may encounter integral equations that involve generating
functions. Suppose we have the admittedly special case,
\[ f(x) = \int_{-1}^{1} \frac{\varphi(t)}{(1 - 2xt + x^2)^{1/2}}\,dt, \qquad -1 \le x \le 1, \tag{21.30} \]
where $f(x)$ is known and $\varphi(t)$ is to be determined.
We note two important features:

1. $(1 - 2xt + x^2)^{-1/2}$ generates the Legendre polynomials.
2. $[-1, 1]$ is the orthogonality interval for the Legendre polynomials.


¹ This division converts $s^{1-\alpha}$, which cannot be inverted when $0 < \alpha < 1$, into $s^{-\alpha}$, which is the transform of $x^{\alpha-1}/\Gamma(\alpha)$.
These features make it possible to expand the denominator in Legendre polynomials,
suggesting that it may be useful also to represent $\varphi(t)$ as an expansion in these same functions.
Thus, we introduce the expansions
\[ \frac{1}{(1 - 2xt + x^2)^{1/2}} = \sum_{n=0}^{\infty} P_n(t)\,x^n, \qquad \varphi(t) = \sum_{m=0}^{\infty} a_m P_m(t). \]
Substituting these expansions into our integral equation, Eq. (21.30),
\[ f(x) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} a_m x^n \int_{-1}^{1} P_n(t)\,P_m(t)\,dt
 = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} a_m x^n\,\frac{2\,\delta_{nm}}{2n + 1}
 = \sum_{n=0}^{\infty} \frac{2 a_n x^n}{2n + 1}. \tag{21.31} \]

If we now insert into Eq. (21.31) the Maclaurin series expansion for $f(x)$,
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}\,x^n, \]
we may equate powers of $x$, reaching, for each $n$,
\[ \frac{f^{(n)}(0)}{n!} = \frac{2 a_n}{2n + 1}, \]
so the solution to our integral equation is
\[ \varphi(t) = \sum_{n=0}^{\infty} \frac{2n + 1}{2}\,\frac{f^{(n)}(0)}{n!}\,P_n(t). \tag{21.32} \]

Similar results may be obtained with other generating functions (see the list in Table 12.1).

This technique of expanding in a series of special functions is always available.
It is worth a try whenever the expansion is possible (and convenient)
and the interval is appropriate.
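
The generating-function recipe of Eq. (21.32) can be tried out numerically. The following sketch is an illustration added here, not part of the original text: it converts assumed Maclaurin coefficients of $f(x)$ into Legendre coefficients of $\varphi(t)$ and then checks Eq. (21.30) by Gauss-Legendre quadrature. The test function $f(x) = 1 + x^2$, the helper names, and the number of quadrature points are all assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as L

def solve_generating_function(maclaurin_coeffs):
    """Given c_n = f^(n)(0)/n!, return the Legendre coefficients a_n of phi(t)
    from Eq. (21.32): a_n = (2n + 1)/2 * c_n."""
    c = np.asarray(maclaurin_coeffs, dtype=float)
    n = np.arange(len(c))
    return 0.5 * (2 * n + 1) * c

def forward(a, x, n_quad=60):
    """Evaluate int_{-1}^{1} phi(t) / sqrt(1 - 2xt + x^2) dt by Gauss-Legendre
    quadrature, with phi(t) = sum_n a_n P_n(t)."""
    t, w = L.leggauss(n_quad)
    phi = L.legval(t, a)
    return np.sum(w * phi / np.sqrt(1.0 - 2.0 * x * t + x * x))

# Assumed test case: f(x) = 1 + x^2, i.e. Maclaurin coefficients [1, 0, 1]
a = solve_generating_function([1.0, 0.0, 1.0])
for x in (0.0, 0.3, -0.5):
    print(x, forward(a, x), 1.0 + x * x)
```

The check is kept away from $x = \pm 1$, where the generating function becomes singular at an end point of the integration interval.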


Separable Kernel 


We consider here the special case that the kernel of our integral equation is separable, in
the sense that
\[ K(x,t) = \sum_{j=1}^{n} M_j(x)\,N_j(t), \tag{21.33} \]
where $n$, the upper limit of the sum, is finite. Such kernels are sometimes called degenerate.
Our class of separable kernels includes all polynomials and many of the elementary
transcendental functions. For example, $K(x,t) = \cos(t - x)$ is separable:
\[ \cos(t - x) = \cos t \cos x + \sin t \sin x. \]
Integral equations with separable kernels have the desirable property that they can be
related to eigenvalue equations and permit the application of methods of linear algebra.

Let’s consider a Fredholm equation of the second kind, Eq. (21.2), with a separable
kernel of the form given in Eq. (21.33). Inserting this formula for $K(x,t)$ and bringing the
summation outside the integral, we have
\[ \varphi(x) = f(x) + \lambda \sum_{j=1}^{n} M_j(x) \int_a^b N_j(t)\,\varphi(t)\,dt. \tag{21.34} \]

We now see that the integral with respect to $t$ will for each $j$ be a constant (with values
that are currently not known):
\[ \int_a^b N_j(t)\,\varphi(t)\,dt = c_j. \tag{21.35} \]
Hence Eq. (21.34) becomes
\[ \varphi(x) = f(x) + \lambda \sum_{j=1}^{n} c_j M_j(x). \tag{21.36} \]

Once the constants $c_j$ have been determined, Eq. (21.36) will give us $\varphi(x)$, the solution to
our integral equation. Equation (21.36) further tells us that the form of $\varphi(x)$ will consist of
$f(x)$ plus a linear combination of the $x$-dependent factors in the separable kernel.

We may find the $c_j$ by multiplying Eq. (21.36) by $N_i(x)$ and integrating to eliminate the
$x$-dependence. Use of Eq. (21.35) yields
\[ c_i = b_i + \lambda \sum_{j=1}^{n} a_{ij} c_j, \tag{21.37} \]
where
\[ b_i = \int_a^b N_i(x)\,f(x)\,dx, \qquad a_{ij} = \int_a^b N_i(x)\,M_j(x)\,dx. \tag{21.38} \]
It is perhaps helpful to write Eq. (21.37) in matrix form, with $\mathsf{A} = (a_{ij})$:
\[ \mathbf{b} = \mathbf{c} - \lambda \mathsf{A}\mathbf{c} = (\mathsf{1} - \lambda \mathsf{A})\,\mathbf{c}, \tag{21.39} \]
or
\[ \mathbf{c} = (\mathsf{1} - \lambda \mathsf{A})^{-1}\,\mathbf{b}. \tag{21.40} \]

Equation (21.39) is equivalent to a set of simultaneous linear algebraic equations
\[ \begin{aligned}
(1 - \lambda a_{11})c_1 - \lambda a_{12} c_2 - \lambda a_{13} c_3 - \cdots &= b_1, \\
-\lambda a_{21} c_1 + (1 - \lambda a_{22})c_2 - \lambda a_{23} c_3 - \cdots &= b_2, \\
-\lambda a_{31} c_1 - \lambda a_{32} c_2 + (1 - \lambda a_{33})c_3 - \cdots &= b_3, \quad \text{and so on.}
\end{aligned} \tag{21.41} \]
If our integral equation is homogeneous, so $f(x) = 0$, then $\mathbf{b} = 0$. To get a solution in that
case, we set the determinant of the coefficients of $c_j$ equal to zero:
\[ |\mathsf{1} - \lambda \mathsf{A}| = 0, \tag{21.42} \]
exactly as for any matrix eigenvalue problem. The roots of Eq. (21.42) yield our
eigenvalues. Substituting into $(\mathsf{1} - \lambda \mathsf{A})\mathbf{c} = 0$, we find the $c_i$ and then Eq. (21.36) gives our
solution.
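
For an inhomogeneous equation the steps of Eqs. (21.35) to (21.40) translate directly into a small linear solve. The sketch below is an illustration added here, not the book’s own procedure in code form; the helper name `solve_separable` and the Gauss-Legendre quadrature used for $a_{ij}$ and $b_i$ are assumptions. As a test case it uses the equation treated later in Example 21.3.1, Eq. (21.56), whose kernel $t - x$ is separable with $M_1 = 1$, $N_1 = t$, $M_2 = -x$, $N_2 = 1$.

```python
import numpy as np
from numpy.polynomial import legendre as L

def solve_separable(f, M, N, lam, a, b, n_quad=20):
    """Solve phi(x) = f(x) + lam * int_a^b sum_j M_j(x) N_j(t) phi(t) dt.
    Returns a callable phi(x) assembled from Eq. (21.36)."""
    t, w = L.leggauss(n_quad)
    t = 0.5 * (b - a) * t + 0.5 * (b + a)      # map nodes from [-1, 1] to [a, b]
    w = 0.5 * (b - a) * w
    n = len(M)
    # a_ij and b_i of Eq. (21.38), evaluated by quadrature
    A = np.array([[np.sum(w * N[i](t) * M[j](t)) for j in range(n)]
                  for i in range(n)])
    bvec = np.array([np.sum(w * N[i](t) * f(t)) for i in range(n)])
    c = np.linalg.solve(np.eye(n) - lam * A, bvec)   # Eq. (21.40)
    return lambda x: f(x) + lam * sum(c[j] * M[j](x) for j in range(n))

one = lambda x: np.ones_like(x, dtype=float)
phi = solve_separable(
    f=lambda x: x,
    M=[one, lambda x: -x],
    N=[lambda t: t, one],
    lam=0.5, a=-1.0, b=1.0,
)
x = np.linspace(-1.0, 1.0, 5)
print(phi(x))                 # numerical solution
print(0.75 * x + 0.25)        # closed form quoted in Eq. (21.58)
```

The linear solve fails only when $|\mathsf{1} - \lambda\mathsf{A}| = 0$, that is, when $\lambda$ is an eigenvalue of the homogeneous problem.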


Example 21.2.3 HOMOGENEOUS FREDHOLM EQUATION 


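To illustrate this technique for determining eigenvalues and eigenfunctions of the
homogeneous Fredholm equation of the second kind, we consider
\[ \varphi(x) = \lambda \int_{-1}^{1} (t + x)\,\varphi(t)\,dt. \tag{21.43} \]

Writing the kernel of this equation as $M_1(x)N_1(t) + M_2(x)N_2(t)$, we have
\[ M_1(x) = 1, \quad M_2(x) = x, \qquad N_1(t) = t, \quad N_2(t) = 1. \]

Using the notation of Eqs. (21.33) to (21.42), we find from Eq. (21.38):
\[ a_{11} = a_{22} = 0, \qquad a_{12} = \tfrac{2}{3}, \qquad a_{21} = 2; \qquad b_1 = b_2 = 0. \]

Equation (21.42), our secular equation, becomes²
\[ \begin{vmatrix} 1 & -\dfrac{2\lambda}{3} \\[2mm] -2\lambda & 1 \end{vmatrix} = 0. \tag{21.44} \]
Expanding, we obtain
\[ 1 - \frac{4\lambda^2}{3} = 0, \qquad \lambda = \pm\frac{\sqrt{3}}{2}. \tag{21.45} \]

Substituting the eigenvalues $\lambda = \pm\sqrt{3}/2$ into Eq. (21.39), we have
\[ c_1 \mp \frac{c_2}{\sqrt{3}} = 0. \tag{21.46} \]
Finally, with the choice $c_1 = 1$, Eq. (21.36) gives the two solutions
\[ \varphi_1(x) = \frac{\sqrt{3}}{2}\left(1 + \sqrt{3}\,x\right), \qquad \lambda = \frac{\sqrt{3}}{2}, \tag{21.47} \]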


² This equation would look more like our usual secular equations if each row of the determinant were divided by $\lambda$. Then we
would have the secular equation in a familiar form, but with $1/\lambda$ identified as the eigenvalue.
\[ \varphi_2(x) = -\frac{\sqrt{3}}{2}\left(1 - \sqrt{3}\,x\right), \qquad \lambda = -\frac{\sqrt{3}}{2}. \tag{21.48} \]
Since our equation is homogeneous, the normalization of $\varphi(x)$ is arbitrary. ■
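
The eigenvalues found in this example can be reproduced with a few lines of linear algebra. This is a sketch added for illustration, not part of the original text: it builds the matrix $a_{ij}$ of Eq. (21.38) by Gauss-Legendre quadrature (an assumed choice) and uses the fact that $|\mathsf{1} - \lambda\mathsf{A}| = 0$ makes $1/\lambda$ an eigenvalue of $\mathsf{A}$.

```python
import numpy as np
from numpy.polynomial import legendre as L

t, w = L.leggauss(20)                       # nodes and weights on [-1, 1]

# Factors of the kernel t + x = M_1(x) N_1(t) + M_2(x) N_2(t)
M = [lambda x: np.ones_like(x), lambda x: x]
N = [lambda x: x, lambda x: np.ones_like(x)]

# a_ij = int_{-1}^{1} N_i(x) M_j(x) dx, Eq. (21.38)
A = np.array([[np.sum(w * Ni(t) * Mj(t)) for Mj in M] for Ni in N])

# |1 - lambda*A| = 0  <=>  1/lambda is an eigenvalue of A
inv_lam, C = np.linalg.eig(A)
lam = 1.0 / inv_lam.real
print(np.sort(lam))                         # compare with -sqrt(3)/2, +sqrt(3)/2

# Each column of C holds (c_1, c_2); rescale so that c_1 = 1 as in the text
for k in range(2):
    c = C[:, k].real / C[0, k].real
    print(lam[k], c)                        # c_2 should be +sqrt(3) or -sqrt(3)
```

With $c_1 = 1$ and $c_2 = \pm\sqrt{3}$, Eq. (21.36) reduces to $\lambda(1 \pm \sqrt{3}\,x)$, in agreement with Eqs. (21.47) and (21.48).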


If the kernel of an integral equation is not separable in the sense of Eq. (21.33), there is 
still the possibility that it may be approximated by a kernel that is separable. Then we can 
get the exact solution of an approximate equation, which we can treat as an approximation 
to the solution of the original equation. 


Exercises 


21.2.1 The kernel of a Fredholm equation of the second kind,
\[ \varphi(x) = f(x) + \lambda \int_{-\infty}^{\infty} K(x,t)\,\varphi(t)\,dt, \]
is of the form $k(x - t)$.³ Assuming that the required transforms exist, show that
\[ \varphi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{F(t)\,e^{-ixt}}{1 - \sqrt{2\pi}\,\lambda K(t)}\,dt. \]
$F(t)$ and $K(t)$ are the Fourier transforms of $f(x)$ and $k(x)$, respectively.

³ This kernel and a range $0 \le x < \infty$ are the characteristics of integral equations of the Wiener-Hopf type. Details will be found
in Chapter 8 of Morse and Feshbach (1953); see the Additional Readings.

21.2.2 (a) The kernel of a Volterra equation of the first kind,
\[ f(x) = \int_0^x K(x,t)\,\varphi(t)\,dt, \]
has the form $k(x - t)$. Assuming that the required transforms exist, show that
\[ \varphi(x) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} \frac{F(s)}{K(s)}\,e^{xs}\,ds, \]
where $F(s)$ and $K(s)$ are, respectively, the Laplace transforms of $f(x)$ and $k(x)$.

(b) In terms of the notation of part (a), show that the Volterra equation of the second
kind,
\[ \varphi(x) = f(x) + \lambda \int_0^x K(x,t)\,\varphi(t)\,dt, \]
has solution
\[ \varphi(x) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} \frac{F(s)}{1 - \lambda K(s)}\,e^{xs}\,ds. \]

21.2.3 Using the Laplace transform solution (Exercise 21.2.2), solve

(a) $\varphi(x) = x + \int_0^x (t - x)\,\varphi(t)\,dt$.

ANS. $\varphi(x) = \sin x$.

(b) $\varphi(x) = x - \int_0^x (t - x)\,\varphi(t)\,dt$.

ANS. $\varphi(x) = \sinh x$.

Check your results by substituting back into the original integral equations.

21.2.4 Reformulate the equations of Example 21.2.1 for integrals on the range $(0, \infty)$ using
Fourier cosine transforms.

21.2.5 Given the Fredholm integral equation,
\[ e^{-x^2} = \int_{-\infty}^{\infty} e^{-(x - t)^2}\,\varphi(t)\,dt, \]
apply the Fourier convolution technique of Example 21.2.1 to solve for $\varphi(t)$.

21.2.6 Solve Abel’s equation,
\[ f(x) = \int_0^x \frac{\varphi(t)}{(x - t)^{\alpha}}\,dt, \qquad 0 < \alpha < 1, \]
by the following method:

(a) Multiply both sides by $(z - x)^{\alpha - 1}$ and integrate with respect to $x$ over the range
$0 \le x \le z$.

(b) Reverse the order of integration and evaluate the integral on the right-hand side
(with respect to $x$) by recognizing it as a beta function.

Note.
\[ \int_t^z \frac{dx}{(z - x)^{1-\alpha}(x - t)^{\alpha}} = B(1 - \alpha, \alpha) = \Gamma(\alpha)\,\Gamma(1 - \alpha) = \frac{\pi}{\sin \pi\alpha}. \]
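21.2.7 Given the generalized Abel equation with $f(x) = 1$,
\[ 1 = \int_0^x \frac{\varphi(t)}{(x - t)^{\alpha}}\,dt, \qquad 0 < \alpha < 1, \]
solve for $\varphi(t)$ and verify that $\varphi(t)$ is a solution of the given equation.

ANS. $\varphi(t) = \dfrac{\sin \pi\alpha}{\pi}\,t^{\alpha - 1}$.

21.2.8 A Fredholm equation of the first kind has a kernel $e^{-(x - t)^2}$,
\[ f(x) = \int_{-\infty}^{\infty} e^{-(x - t)^2}\,\varphi(t)\,dt. \]
Show that the solution is
\[ \varphi(x) = \frac{1}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{2^n\,n!}\,H_n(x), \]
in which $H_n(x)$ is an $n$th-order Hermite polynomial.

21.2.9 Solve the integral equation
\[ f(x) = \int_{-1}^{1} \frac{\varphi(t)}{(1 - 2xt + x^2)^{1/2}}\,dt, \qquad -1 \le x \le 1, \]
for the unknown function $\varphi(t)$, if
(a) $f(x) = x^{2s}$,
(b) $f(x) = x^{2s+1}$.

ANS. (a) $\varphi(t) = \dfrac{4s + 1}{2}\,P_{2s}(t)$, (b) $\varphi(t) = \dfrac{4s + 3}{2}\,P_{2s+1}(t)$.

21.2.10 Find the eigenvalues and eigenfunctions of
\[ \varphi(x) = \lambda \int_{-1}^{1} (x - t)\,\varphi(t)\,dt. \]

21.2.11 Find the eigenvalues and eigenfunctions of
\[ \varphi(x) = \lambda \int_0^{2\pi} \cos(x - t)\,\varphi(t)\,dt. \]

ANS. $\lambda_1 = \lambda_2 = \dfrac{1}{\pi}$, $\quad \varphi(x) = A\cos x + B\sin x$.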
21.2.12 Find the eigenvalues and eigenfunctions of
\[ \varphi(x) = \lambda \int_{-1}^{1} (x - t)^2\,\varphi(t)\,dt. \]

Hint. This problem may be treated by the separable-kernel method or by a Legendre
expansion.

21.2.13 Use the separable-kernel technique to show that
\[ \psi(x) = \lambda \int_0^{2\pi} \cos x \sin t\;\psi(t)\,dt \]
has no solution (apart from $\psi = 0$). Explain this result in terms of separability and
symmetry.

21.2.14 Given
\[ \varphi(x) = \lambda \int_0^1 (1 + xt)\,\varphi(t)\,dt, \]
solve for the eigenvalues and the eigenfunctions by the separable-kernel technique.

21.2.15 Knowing the form of the solutions of an integral equation can be a great advantage. For
\[ \varphi(x) = \lambda \int_0^1 (1 + xt)\,\varphi(t)\,dt, \]
assume $\varphi(x)$ to have the form $1 + bx$. Substitute into the integral equation. Integrate
and solve for $b$ and $\lambda$.

21.2.16 The equation
\[ f(x) = \int_a^b K(x,t)\,\varphi(t)\,dt \]
has a degenerate kernel $K(x,t) = \sum_{i=1}^{n} M_i(x)\,N_i(t)$.

(a) Show that this integral equation has no solution unless $f(x)$ can be written as
\[ f(x) = \sum_{i=1}^{n} f_i M_i(x), \]
where the $f_i$ are constants.

(b) Show that to any solution $\varphi(x)$ we may add $\psi(x)$, provided that $\psi(x)$ is orthogonal
to all $N_i(x)$:
\[ \int_a^b N_i(x)\,\psi(x)\,dx = 0 \quad \text{for all } i. \]
21.2.17 A Kirchhoff diffraction theory analysis of a laser leads to the integral equation
\[ v(\mathbf{r}_2) = \gamma \iint K(\mathbf{r}_1, \mathbf{r}_2)\,v(\mathbf{r}_1)\,dA. \]

The unknown, $v(\mathbf{r}_1)$, gives the geometric distribution of the radiation field over one
mirror surface; the range of integration is over the surface of that mirror. For square
confocal spherical mirrors, the integral equation becomes
\[ v(x_2, y_2) = \frac{-i\gamma\,e^{ikb}}{\lambda b} \int_{-a}^{a}\!\int_{-a}^{a} e^{-(ik/b)(x_1 x_2 + y_1 y_2)}\,v(x_1, y_1)\,dx_1\,dy_1, \]
in which $b$ is the centerline distance between the laser mirrors. This can be put in a
somewhat simpler form by the substitutions
\[ \frac{k x_i^2}{b} = \xi_i^2, \qquad \frac{k y_i^2}{b} = \eta_i^2, \qquad \text{and} \qquad \frac{k a^2}{b} = \frac{2\pi a^2}{\lambda b} = \alpha^2. \]

(a) Show that the variables separate and we get two integral equations.

(b) Show that the new limits, $\pm\alpha$, may be approximated by $\pm\infty$ for a mirror
dimension $a \gg \lambda$.

(c) Solve the resulting integral equations.


21.3 NEUMANN SERIES


Many and probably most integral equations cannot be solved by the specialized techniques
of the preceding section. Here we develop a rather general technique for solving integral
equations. The method, due largely to Neumann, Liouville, and Volterra, develops the
unknown function $\varphi(x)$ as a power series in $\lambda$, where $\lambda$ is a given constant. The method is
applicable whenever the series converges.

We solve a linear integral equation of the second kind by successive approximations;
let’s take as an example the Fredholm equation
\[ \varphi(x) = f(x) + \lambda \int_a^b K(x,t)\,\varphi(t)\,dt, \tag{21.49} \]
in which $f(x) \neq 0$. If the upper limit of the integral is a variable (Volterra equation),
the following development will still hold, but with minor modifications. Let us make the
following initial approximation to our unknown function:
\[ \varphi(x) \approx \varphi_0(x) = f(x). \tag{21.50} \]

This choice is not mandatory. If you can make a better guess, go ahead and guess. The
choice here is equivalent to saying that the term of the equation containing the integral is
small relative to $f(x)$. To improve this first crude approximation, we feed $\varphi_0(x)$ back into
the integral in Eq. (21.49), getting
\[ \varphi_1(x) = f(x) + \lambda \int_a^b K(x,t)\,f(t)\,dt. \tag{21.51} \]

Substituting the new $\varphi_1(x)$ back into Eq. (21.49), we obtain a second approximation to
$\varphi(x)$:
\[ \varphi_2(x) = f(x) + \lambda \int_a^b K(x,t_1)\,f(t_1)\,dt_1
 + \lambda^2 \int_a^b\!\int_a^b K(x,t_1)\,K(t_1,t_2)\,f(t_2)\,dt_2\,dt_1. \]

This process can be repeated indefinitely, defining after $n$ steps the $n$th order approximation
\[ \varphi_n(x) = \sum_{i=0}^{n} \lambda^i u_i(x), \tag{21.52} \]
where
\[ \begin{aligned}
u_0(x) &= f(x), \\
u_1(x) &= \int_a^b K(x,t_1)\,f(t_1)\,dt_1, \\
u_2(x) &= \int_a^b\!\int_a^b K(x,t_1)\,K(t_1,t_2)\,f(t_2)\,dt_2\,dt_1, \\
u_n(x) &= \int_a^b\!\int_a^b \cdots \int_a^b K(x,t_1)\,K(t_1,t_2) \cdots K(t_{n-1},t_n)\,f(t_n)\,dt_n \cdots dt_1.
\end{aligned} \tag{21.53} \]
We expect that our solution $\varphi(x)$ will be
\[ \varphi(x) = \lim_{n \to \infty} \varphi_n(x) = \lim_{n \to \infty} \sum_{i=0}^{n} \lambda^i u_i(x), \tag{21.54} \]
provided that our infinite series converges.
We may conveniently check the convergence by the Cauchy ratio test, Section 1.1, not- 
ing that 


|A" un (x)| <A" - | flmax ae |b=al", 





1066 


Chapter 21 Integral Equations 


using $|f|_{\max}$ to represent the maximum value of $|f(x)|$ in the interval $[a, b]$ and $|K|_{\max}$ 
to represent the maximum value of $|K(x,t)|$ in its domain in the $xt$-plane. A sufficient 
condition for convergence is 

\[
|\lambda| \cdot |K|_{\max} \cdot |b-a| < 1. \tag{21.55}
\]


Note that $\lambda^n |u_n|_{\max}$ is being used as a comparison series. If it converges, our actual series 
must converge. If this condition is not satisfied, we may or may not have convergence, and 
a more sensitive test would be required to determine the convergence. Of course, even if the 
Neumann series diverges, there still may be a solution to our integral equation obtainable 
by another method. 

To gain more understanding of our iterative manipulation, we may find it helpful to 
rewrite the Neumann series solution, Eq. (21.54), in operator form. We start by rewriting 
Eq. (21.49) as 


\[
\varphi = \lambda\mathcal{K}\varphi + f,
\]

where $\mathcal{K}$ represents the integral operator $\int_a^b K(x,t)[\ \ ]\,dt$. Solving symbolically for $\varphi$, we 
obtain 

\[
\varphi = (1 - \lambda\mathcal{K})^{-1} f.
\]

Binomial expansion leads to Eq. (21.54). The convergence of the Neumann series is a 
demonstration that the inverse operator $(1 - \lambda\mathcal{K})^{-1}$ exists. 
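
Written out, the binomial (geometric) expansion of the inverse operator takes the form below. The power notation $\mathcal{K}^n$, meaning $n$ successive applications of the integral operator, is shorthand introduced here for illustration and is not used in the text itself.

```latex
\[
(1-\lambda\mathcal{K})^{-1} f
  = \sum_{n=0}^{\infty} \lambda^n \mathcal{K}^n f
  = f + \lambda\,\mathcal{K}f + \lambda^2\,\mathcal{K}^2 f + \cdots ,
\qquad
\mathcal{K}^n f = u_n(x).
\]
```

Each term $\lambda^n \mathcal{K}^n f$ is then the term $\lambda^n u_n(x)$ of Eq. (21.53), so summing the operator series reproduces Eq. (21.54).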


Example 21.3.1 NEUMANN SERIES SOLUTION 


To illustrate the Neumann method, we consider the integral equation 

\[
\varphi(x) = x + \frac{1}{2}\int_{-1}^{1}(t - x)\,\varphi(t)\,dt. \tag{21.56}
\]

To start the Neumann series, we take 

\[
\varphi_0(x) = x.
\]

Then 

\[
\varphi_1(x) = x + \frac{1}{2}\int_{-1}^{1}(t-x)\,t\,dt
 = x + \frac{1}{2}\left[\frac{1}{3}t^3 - \frac{1}{2}t^2 x\right]_{-1}^{1} = x + \frac{1}{3}.
\]

Substituting $\varphi_1(x)$ back into Eq. (21.56), we get 

\[
\varphi_2(x) = x + \frac{1}{2}\int_{-1}^{1}(t-x)\,t\,dt + \frac{1}{2}\int_{-1}^{1}(t-x)\,\frac{1}{3}\,dt
 = x + \frac{1}{3} - \frac{x}{3}.
\]

Continuing this process of substituting back into Eq. (21.56), we obtain 

\[
\varphi_3(x) = x + \frac{1}{3} - \frac{x}{3} - \frac{1}{3^2},
\]

and by mathematical induction (Section 1.4), 

\[
\varphi_{2n}(x) = x + \sum_{s=1}^{n}(-1)^{s-1}3^{-s} - x\sum_{s=1}^{n}(-1)^{s-1}3^{-s}. \tag{21.57}
\]

Letting $n \to \infty$, we get 

\[
\varphi(x) = \frac{3}{4}x + \frac{1}{4}. \tag{21.58}
\]

This solution can (and should) be checked by substituting back into the original equation, 
Eq. (21.56). 

It is interesting to note that our series converged easily even though Eq. (21.55) is not 
satisfied in this particular case. Actually Eq. (21.55) is a rather crude upper bound on 
$|\lambda|$. It can be shown that a necessary and sufficient condition for the convergence of our 
series solution is that $|\lambda| < |\lambda_e|$, where $\lambda_e$ is the eigenvalue of smallest magnitude of the 
corresponding homogeneous equation (that with $f(x) = 0$). For this particular example, 
$\lambda_e = \sqrt{3}/2$. Clearly, $\lambda = \frac{1}{2} < \lambda_e$. ■
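
The successive substitutions of this example can also be carried out numerically. The following is a minimal sketch, not part of the original text: the helper names `simpson_rule` and `neumann_iterate`, the grid size, and the number of iterations are all illustrative assumptions. The integral is replaced by a composite Simpson rule, which is exact here because every integrand is a polynomial of degree two or less.

```python
def simpson_rule(a, b, m):
    """Return nodes and weights of the composite Simpson rule (m even)."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    nodes = [a + j * h for j in range(m + 1)]
    weights = [h / 3 * (1 if j in (0, m) else 4 if j % 2 else 2)
               for j in range(m + 1)]
    return nodes, weights


def neumann_iterate(f, kernel, lam, a, b, m=20, steps=60):
    """Apply phi_{n+1} = f + lam * K phi_n repeatedly, starting from phi_0 = f."""
    t, w = simpson_rule(a, b, m)
    phi = [f(x) for x in t]
    for _ in range(steps):
        # The right-hand side uses the previous iterate; phi is rebound afterwards.
        phi = [f(x) + lam * sum(wj * kernel(x, tj) * pj
                                for tj, wj, pj in zip(t, w, phi))
               for x in t]
    return t, phi


# Example 21.3.1: f(x) = x, K(x, t) = t - x, lambda = 1/2 on [-1, 1].
grid, phi = neumann_iterate(lambda x: x, lambda x, t: t - x, 0.5, -1.0, 1.0)
max_err = max(abs(p - (3 * x + 1) / 4) for x, p in zip(grid, phi))
print(f"largest deviation from (3x + 1)/4: {max_err:.2e}")
```

The iterates approach Eq. (21.58) even though the sufficient condition of Eq. (21.55) fails for this kernel, in line with the remark above that the bound is crude.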


The technique illustrated by the Neumann series occurs in a number of contexts in quan- 
tum mechanics. For example, one approach to the calculation of time-dependent perturba- 
tions in quantum mechanics starts with the integral equation for the evolution operator 


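\[
U(t,t_0) = 1 - \frac{i}{\hbar}\int_{t_0}^{t} dt_1\, V(t_1)\,U(t_1,t_0). \tag{21.59}
\]

Iteration leads to 

\[
U(t,t_0) = 1 - \frac{i}{\hbar}\int_{t_0}^{t} dt_1\, V(t_1)
 + \left(\frac{i}{\hbar}\right)^{2}\int_{t_0}^{t} dt_1\int_{t_0}^{t_1} dt_2\, V(t_1)V(t_2) + \cdots. \tag{21.60}
\]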


The evolution operator is obtained as a series of multiple integrals of the perturbing poten- 
tial V(t), closely analogous to the Neumann series, Eq. (21.52). 

A second and similar relationship between the Neumann series and quantum mechanics 
appears when the Schrödinger wave equation for scattering is reformulated as an integral 
equation. See Example 10.2.2. The first term in a Neumann series solution is the incident 
(unperturbed) wave. The second term is the first-order Born approximation, Eq. (10.51). 

The Neumann method may also be applied to Volterra integral equations of the second 
kind, corresponding to replacing the fixed upper limit $b$ in Eq. (21.49) by a variable, $x$. In 
the Volterra case the Neumann series converges for all $\lambda$ as long as the kernel is square 
integrable. 
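
A short worked case may make the Volterra statement concrete. The equation below, with kernel $K(x,t) = 1$ and $f(x) = 1$, is an illustrative choice made here and is not one of the text's examples.

```latex
\[
\varphi(x) = 1 + \lambda\int_0^x \varphi(t)\,dt,
\qquad
u_0(x) = 1,\quad
u_n(x) = \int_0^x u_{n-1}(t)\,dt = \frac{x^n}{n!},
\]
\[
\varphi(x) = \sum_{n=0}^{\infty}\lambda^n u_n(x)
           = \sum_{n=0}^{\infty}\frac{(\lambda x)^n}{n!}
           = e^{\lambda x}.
\]
```

The factorial in the denominator comes from the variable upper limit, and it is what makes the series converge for every value of $\lambda$; substituting $e^{\lambda x}$ back into the integral equation confirms the result.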







Exercises 


21.3.1 Using the Neumann series, solve 

(a) $\varphi(x) = 1 - 2\int_0^x t\,\varphi(t)\,dt$, 

(b) $\varphi(x) = x + \int_0^x (t - x)\,\varphi(t)\,dt$, 

(c) $\varphi(x) = x - \int_0^x (t - x)\,\varphi(t)\,dt$. 

ANS. (a) $\varphi(x) = e^{-x^2}$. 


21.3.2 Solve 

\[
\psi(x) = x + \int_0^1 (1 + xt)\,\psi(t)\,dt
\]

by each of the following methods: 
(a) The Neumann series technique, 
(b) The separable-kernel technique, 
(c) Educated guessing. 


21.3.3 Solve 

\[
\varphi(x) = 1 + \lambda^2\int_0^x (x - t)\,\varphi(t)\,dt
\]

by each of the following methods: 
(a) Reduction to an ODE (find the boundary conditions), 
(b) The Neumann series, 
(c) The use of Laplace transforms. 

ANS. $\varphi(x) = \cosh\lambda x$. 


21.3.4 (a) In Eq. (21.59), take $V = V_0$, independent of $t$. Without using Eq. (21.60), show 
that Eq. (21.59) leads directly to 

\[
U(t - t_0) = \exp\left[-\frac{i}{\hbar}(t - t_0)V_0\right].
\]

(b) Repeat for Eq. (21.60) without using Eq. (21.59). 


HILBERT-SCHMIDT THEORY 


Symmetrization of Kernels 


The Hilbert-Schmidt theory deals with linear integral equations of the Fredholm type with 
symmetric kernels: 


K (x,t) = K(t, x). (21.61) 


The symmetry is of great importance, both because we will find it leads to results parallel 
to those found for the Sturm-Liouville theory of differential equations, and also because 
many problems of physical relevance can be written as Fredholm integral equations with 
symmetric kernels. 

Before plunging into the theory, we note that some important nonsymmetric kernels can 
be symmetrized. If we have the equation 

\[
\varphi(x) = f(x) + \lambda\int_a^b K(x,t)\,\rho(t)\,\varphi(t)\,dt, \tag{21.62}
\]

the total kernel is actually $K(x,t)\rho(t)$, clearly not symmetric if $K(x,t)$ alone is symmetric. 
However, if we multiply Eq. (21.62) by $\sqrt{\rho(x)}$ and substitute 

\[
\sqrt{\rho(x)}\,\varphi(x) = \psi(x),
\]

we obtain 

\[
\psi(x) = \sqrt{\rho(x)}\,f(x) + \lambda\int_a^b \left[K(x,t)\sqrt{\rho(x)\rho(t)}\,\right]\psi(t)\,dt, \tag{21.63}
\]

with a symmetric total kernel $K(x,t)\sqrt{\rho(x)\rho(t)}$. 
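
The intermediate step between Eqs. (21.62) and (21.63) can be spelled out as follows. This is only a sketch of the algebra, and it assumes that the weight $\rho$ is nonnegative on $[a,b]$ so that the square roots are real.

```latex
\[
\sqrt{\rho(x)}\,\varphi(x)
  = \sqrt{\rho(x)}\,f(x)
  + \lambda\int_a^b K(x,t)\,\sqrt{\rho(x)}\,\sqrt{\rho(t)}\,
    \Bigl[\sqrt{\rho(t)}\,\varphi(t)\Bigr]\,dt ,
\qquad
\rho(t) = \sqrt{\rho(t)}\,\sqrt{\rho(t)} .
\]
```

Replacing each bracketed combination $\sqrt{\rho}\,\varphi$ by $\psi$ gives Eq. (21.63); interchanging $x$ and $t$ leaves $K(x,t)\sqrt{\rho(x)\rho(t)}$ unchanged whenever $K(x,t) = K(t,x)$.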


Orthogonal Eigenfunctions 


We now focus on the homogeneous Fredholm equation of the second kind: 

\[
\varphi(x) = \lambda\int_a^b K(x,t)\,\varphi(t)\,dt. \tag{21.64}
\]

We assume that the kernel $K(x,t)$ is symmetric and real. Perhaps one of the first questions 
we might ask about the equation is: “Does it make sense?” or more precisely, “Does an 
eigenvalue $\lambda$ satisfying this equation exist?” This question can be answered in the 
affirmative. Courant and Hilbert (in their work cited in the Additional Readings, chapter III, 
section 4) show that if $K(x,t)$ is continuous, there is at least one such eigenvalue and 
possibly an infinite number of them. 

It is useful to recognize that Eq. (21.64) represents a linear-operator eigenvalue problem: 
The integral on its right-hand side converts $\varphi$ into (in general) some other function, which 
we can indicate symbolically by the equation 

\[
\int_a^b K(x,t)\,\varphi(t)\,dt = \mathcal{K}\varphi(x), \tag{21.65}
\]

so our eigenvalue problem is 

\[
\mathcal{K}\varphi(x) = \frac{1}{\lambda}\,\varphi(x). \tag{21.66}
\]

We do not have to worry about the possibility that $\lambda = 0$, since we can read directly from 
Eq. (21.64) that in that case the solution to our integral equation will be uniquely $\varphi(x) = 0$. 
The integral operator $\mathcal{K}$ is linear, since it is obviously true that 

\[
\mathcal{K}\bigl(a\varphi_1(x) + b\varphi_2(x)\bigr) = a\,\mathcal{K}\varphi_1(x) + b\,\mathcal{K}\varphi_2(x).
\]

In addition, if we define the scalar product as an integral on the range $(a, b)$: 

\[
\langle\psi|\varphi\rangle = \int_a^b \psi^*(x)\,\varphi(x)\,dx, \tag{21.67}
\]

we then see that our requirement that the kernel $K(x,t)$ be real and symmetric will make 
$\mathcal{K}$ a self-adjoint operator: 

\[
\langle\psi|\mathcal{K}\varphi\rangle
 = \int_a^b \psi^*(x)\left[\int_a^b K(x,t)\,\varphi(t)\,dt\right]dx
 = \int_a^b dt\left[\int_a^b dx\,K(x,t)\,\psi(x)\right]^{*}\varphi(t)
 = \langle\mathcal{K}\psi|\varphi\rangle. \tag{21.68}
\]

The linearity and self-adjointness indicate that we can expect to confirm that $\mathcal{K}$ has the key 
properties of self-adjoint operators, namely that its eigenvalues are real and (except in the 
case of degeneracy) its eigenvectors are orthogonal. 

While the above constitutes a complete demonstration of the orthogonality of our 
solutions to the homogeneous Fredholm equation, let’s confirm these properties more 
explicitly. 

We can start from the two equations, 

\[
\frac{1}{\lambda_i}\,\varphi_i(x) = \int_a^b K(x,t)\,\varphi_i(t)\,dt, \tag{21.69}
\]

\[
\frac{1}{\lambda_j}\,\varphi_j(x) = \int_a^b K(x,t)\,\varphi_j(t)\,dt. \tag{21.70}
\]

If we multiply Eq. (21.69) by $\varphi_j^*(x)$ and Eq. (21.70) by $\varphi_i^*(x)$ and then integrate with 
respect to $x$, the two equations become⁴ 

\[
\frac{1}{\lambda_i}\int_a^b \varphi_j^*(x)\,\varphi_i(x)\,dx = \int_a^b\!\!\int_a^b K(x,t)\,\varphi_j^*(x)\,\varphi_i(t)\,dt\,dx, \tag{21.71}
\]

\[
\frac{1}{\lambda_j}\int_a^b \varphi_i^*(x)\,\varphi_j(x)\,dx = \int_a^b\!\!\int_a^b K(x,t)\,\varphi_i^*(x)\,\varphi_j(t)\,dt\,dx. \tag{21.72}
\]

Since we have demanded that $K(x,t)$ be real and symmetric, we may take the complex 
conjugate of Eq. (21.72) and then interchange the roles of $x$ and $t$ in the integral, 
reaching 

\[
\frac{1}{\lambda_j^*}\int_a^b \varphi_i(x)\,\varphi_j^*(x)\,dx = \int_a^b\!\!\int_a^b K(x,t)\,\varphi_j^*(x)\,\varphi_i(t)\,dt\,dx. \tag{21.73}
\]

Subtracting Eq. (21.73) from Eq. (21.71), we obtain 

\[
\left(\frac{1}{\lambda_i} - \frac{1}{\lambda_j^*}\right)\int_a^b \varphi_j^*(x)\,\varphi_i(x)\,dx = 0. \tag{21.74}
\]

Just as in our earlier derivation from Sturm-Liouville theory, we conclude that if $i = j$ the 
integral in Eq. (21.74) is necessarily nonzero; so $1/\lambda_i = 1/\lambda_i^*$, meaning that $\lambda_i$ must be 
real. But if $\lambda_i \neq \lambda_j$, 

\[
\int_a^b \varphi_i^*(x)\,\varphi_j(x)\,dx = 0, \qquad \lambda_i \neq \lambda_j, \tag{21.75}
\]

proving orthogonality. The derivation can also be completed if $K(x,t)$ is Hermitian, meaning 
that $K(t,x) = K^*(x,t)$. See Exercise 21.4.1. Since we are mostly concerned with real 
$K$, it is appropriate to assume also that $\varphi$ is real, and for the remainder of this chapter we 
will often omit the complex conjugate asterisks that occur, for example, in Eq. (21.75). 

⁴We assume that the necessary integrals exist. For an example of a simple pathological case, see Exercise 21.4.4. 

If the eigenvalue $\lambda_i$ is degenerate,⁵ the eigenfunctions for that particular eigenvalue 
may be orthogonalized by the Gram-Schmidt method (Section 5.2). Our orthogonal 
eigenfunctions may, of course, be normalized, and we assume that this has been done. The 
result is 

\[
\int_a^b \varphi_i^*(x)\,\varphi_j(x)\,dx = \delta_{ij}. \tag{21.76}
\]
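
To illustrate the Gram-Schmidt step for a degenerate eigenvalue, here is a minimal sketch in Python. It is not the book's procedure verbatim: the helper names `simpson`, `inner`, and `gram_schmidt` are hypothetical, the integrals are approximated by a composite Simpson rule, and the starting pair $\cos x$ and $\cos x + \sin x$ on $[0, 2\pi]$ is an illustrative choice (both belong to the degenerate eigenvalue of the kernel $\cos(x - t)$ that appears in Exercise 21.4.6).

```python
import math


def simpson(g, a, b, m=200):
    """Composite Simpson approximation to the integral of g over [a, b] (m even)."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


def inner(u, v, a, b):
    """Scalar product of Eq. (21.67), written for real functions."""
    return simpson(lambda x: u(x) * v(x), a, b)


def gram_schmidt(functions, a, b):
    """Orthonormalize a list of real functions on [a, b]."""
    basis = []
    for u in functions:
        coeffs = [inner(e, u, a, b) for e in basis]
        prev = tuple(basis)

        def w(x, u=u, coeffs=coeffs, prev=prev):
            # Remove the components along the functions already in the basis.
            return u(x) - sum(c * e(x) for c, e in zip(coeffs, prev))

        norm = math.sqrt(inner(w, w, a, b))
        basis.append(lambda x, w=w, norm=norm: w(x) / norm)
    return basis


# Two degenerate but non-orthogonal eigenfunctions (illustrative choice).
A, B = 0.0, 2.0 * math.pi
e1, e2 = gram_schmidt([math.cos, lambda x: math.cos(x) + math.sin(x)], A, B)
gram = [[inner(p, q, A, B) for q in (e1, e2)] for p in (e1, e2)]
```

The matrix `gram` should reproduce the Kronecker delta of Eq. (21.76) to rounding error, and the second basis function comes out proportional to $\sin x$, as one would expect.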


It can be shown that the eigenfunctions of our integral equations form a complete set,⁶ 
in the sense that if a function $g(x)$ can be generated by the integral 

\[
g(x) = \int_a^b K(x,t)\,h(t)\,dt,
\]

with $h(t)$ a piecewise continuous function, then $g(x)$ can be represented by a series of 
eigenfunctions, 

\[
g(x) = \sum_{n=1}^{\infty} a_n\varphi_n(x). \tag{21.77}
\]

The series in Eq. (21.77) can be shown to converge uniformly and absolutely. 
Let us extend this to the kernel $K(x,t)$ by asserting that 

\[
K(x,t) = \sum_{n=1}^{\infty} a_n\varphi_n(t), \tag{21.78}
\]

and $a_n = a_n(x)$. Substituting into the original integral equation, Eq. (21.64), and using the 
orthogonality integral, we obtain 

\[
\varphi_i(x) = \lambda_i a_i(x). \tag{21.79}
\]

Therefore, for our homogeneous Fredholm equation of the second kind, the kernel may be 
expressed in terms of the eigenfunctions and eigenvalues as 

\[
K(x,t) = \sum_{n=1}^{\infty}\frac{\varphi_n(x)\,\varphi_n(t)}{\lambda_n}. \tag{21.80}
\]
Equation (21.80) is not actually a new result. In the Green’s function chapter, Section 
10.1, we identified $K(x,t)$, there called $G(x,t)$, as the Green’s function appearing in 
Eq. (10.30), with the expansion given in Eq. (10.14). However, it is possible that the 
expansion given by Eq. (21.80) may not exist. As an illustration of the sort of pathological 
behavior that may occur, you are invited to apply this analysis to 

\[
\varphi(x) = \lambda\int_0^{\infty} e^{-xt}\,\varphi(t)\,dt.
\]

Compare Exercise 21.4.4. 

⁵As for differential operators, if more than one distinct eigenfunction of Eq. (21.64) corresponds to the same eigenvalue, that 
eigenvalue is said to be degenerate. 
⁶For a proof of this statement, see Courant and Hilbert (1953), chapter III, section 5, in the Additional Readings. 

It should be emphasized that this Hilbert-Schmidt theory is concerned with the 
establishment of properties of the eigenvalues (real) and eigenfunctions (orthogonality, 
completeness), properties that may be of great interest and value. The Hilbert-Schmidt theory does 
not solve the homogeneous integral equation for us any more than the Sturm-Liouville 
theory for differential equations solved the ODEs. The solutions of the integral equation 
come by the application of techniques such as were introduced in Sections 21.2 and 21.3, 
or perhaps even by numerical methods. 
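
The kernel expansion of Eq. (21.80) can be checked directly in a case where the sum is finite. The sketch below uses the kernel $t + x$ on $[-1, 1]$, whose two eigenvalues $\pm\sqrt{3}/2$ and normalized eigenfunctions are quoted in Example 21.4.1; the names `EIGENPAIRS` and `kernel_from_expansion` are hypothetical and introduced only for this illustration.

```python
import math

SQRT3 = math.sqrt(3.0)

# (eigenvalue, normalized eigenfunction) pairs quoted in Example 21.4.1
# for the kernel K(x, t) = t + x on [-1, 1].
EIGENPAIRS = (
    (SQRT3 / 2, lambda x: SQRT3 / 2 * (x + 1 / SQRT3)),
    (-SQRT3 / 2, lambda x: SQRT3 / 2 * (x - 1 / SQRT3)),
)


def kernel_from_expansion(x, t):
    """Right-hand side of Eq. (21.80), summed over the two eigenpairs."""
    return sum(phi(x) * phi(t) / lam for lam, phi in EIGENPAIRS)


samples = [(-1.0, 0.25), (0.0, 0.0), (0.4, -0.9), (1.0, 1.0)]
deviations = [abs(kernel_from_expansion(x, t) - (t + x)) for x, t in samples]
```

Because this kernel is separable with only two eigenfunctions, the two-term sum should equal $t + x$ at every sample point up to rounding error.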


Inhomogeneous Integral Equation 


We now continue with the Hilbert-Schmidt theory by seeking solutions of the 
inhomogeneous equation 

\[
\varphi(x) = f(x) + \lambda\int_a^b K(x,t)\,\varphi(t)\,dt. \tag{21.81}
\]

We assume that the solutions of the corresponding homogeneous integral equation are 
already known: 

\[
\varphi_n(x) = \lambda_n\int_a^b K(x,t)\,\varphi_n(t)\,dt, \tag{21.82}
\]

the solution $\varphi_n(x)$ corresponding to the eigenvalue $\lambda_n$. Note that at this point we are 
assuming nothing about $\lambda$; it is a constant that has no specific relationship to the eigenvalues $\lambda_n$ 
of the homogeneous integral equation. 

We expand both $\varphi(x)$ and $f(x)$ in terms of this set of eigenfunctions: 

\[
\varphi(x) = \sum_{n=1}^{\infty} a_n\varphi_n(x) \qquad (a_n \text{ unknown}), \tag{21.83}
\]

\[
f(x) = \sum_{n=1}^{\infty} b_n\varphi_n(x) \qquad (b_n \text{ known}). \tag{21.84}
\]

Substituting into Eq. (21.81), we obtain 

\[
\sum_{n=1}^{\infty} a_n\varphi_n(x) = \sum_{n=1}^{\infty} b_n\varphi_n(x) + \lambda\int_a^b K(x,t)\sum_{n=1}^{\infty} a_n\varphi_n(t)\,dt. \tag{21.85}
\]
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By interchanging the order of integration and summation, we may evaluate the integral by 
Eq. (21.82), and we get 

\[
\sum_{n=1}^{\infty} a_n\varphi_n(x) = \sum_{n=1}^{\infty} b_n\varphi_n(x) + \lambda\sum_{n=1}^{\infty}\frac{a_n}{\lambda_n}\,\varphi_n(x). \tag{21.86}
\]

If we multiply by $\varphi_i(x)$ and integrate from $x = a$ to $x = b$, the orthogonality of our 
eigenfunctions leads to 

\[
a_i = b_i + \lambda\,\frac{a_i}{\lambda_i}. \tag{21.87}
\]

This can be rewritten as 

\[
a_i = b_i + \frac{\lambda}{\lambda_i - \lambda}\,b_i. \tag{21.88}
\]

We now multiply Eq. (21.88) by $\varphi_i(x)$ and sum over $i$, giving 

\[
\varphi(x) = f(x) + \lambda\sum_{i=1}^{\infty}\frac{b_i\,\varphi_i(x)}{\lambda_i - \lambda}
 = f(x) + \lambda\sum_{i=1}^{\infty}\frac{\varphi_i(x)}{\lambda_i - \lambda}\int_a^b \varphi_i(t)\,f(t)\,dt. \tag{21.89}
\]

Here it is assumed that the eigenfunctions $\varphi_i(x)$ are normalized to unity. Note that if 
$f(x) = 0$, there is no solution unless $\lambda$ is equal to one of the $\lambda_i$, thereby confirming that 
the homogeneous integral equation has only the solutions $\varphi_i(x)$. 

In the event that $\lambda$ for the inhomogeneous equation, Eq. (21.81), is equal to one of 
the eigenvalues $\lambda_p$ of the homogeneous equation, our solution, Eq. (21.89), blows up. It 
can be shown that the inhomogeneous equation then has no solution unless the coefficient 
$b_p$ vanishes, meaning that there is no solution unless the inhomogeneous term $f(x)$ is 
orthogonal to the eigenfunction $\varphi_p$. If the eigenvalue $\lambda_p$ is degenerate, there will be no 
solution unless $f(x)$ is orthogonal to all the degenerate eigenfunctions. 

For the case that $b_p = 0$, we can return to Eq. (21.87), which then reduces for $a_p$ to 

\[
a_p = b_p + a_p = a_p, \tag{21.90}
\]

which gives no information about $a_p$. Note that if $b_p \neq 0$ this equation cannot be satisfied, 
a signal that a solution cannot be obtained. 

Under the assumption that $b_p = 0$ we now can rewrite Eq. (21.86), identifying its first 
two summations, respectively, as $\varphi(x)$ and $f(x)$, separating the final summation into the 
single term $a_p\varphi_p(x)$ plus a sum over all $n$ other than $p$, thereby reaching 

\[
\varphi(x) = f(x) + a_p\varphi_p(x) + \lambda_p{\sum_{i=1}^{\infty}}'\,\frac{\varphi_i(x)}{\lambda_i - \lambda_p}\int_a^b \varphi_i(t)\,f(t)\,dt. \tag{21.91}
\]

In this solution the $a_p$ remains as an undetermined constant,⁷ and the prime indicates that 
$i = p$ is to be omitted from the sum. 

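It is of interest to relate Eq. (21.89) to what might be expected if we tried to develop a 
similar equation by Green’s-function methods. To do this, we start by rewriting Eq. (21.81) 
as an operator equation of the form 

\[
\mathcal{K}\varphi(x) - \frac{1}{\lambda}\,\varphi(x) = -\frac{1}{\lambda}\,f(x), \tag{21.92}
\]

where $\mathcal{K}$ is the operator that we introduced in Eq. (21.65). Next, we note that, from 
Eq. (21.82), the $\varphi_n$ are eigenfunctions of $\mathcal{K}$ with eigenvalues $1/\lambda_n$: 

\[
\mathcal{K}\varphi_n(x) = \frac{\varphi_n(x)}{\lambda_n}. \tag{21.93}
\]

Then, applying Eq. (10.39), the Green’s function of the entire left-hand side of Eq. (21.92) 
will be (assuming $\varphi$ is real): 

\[
\begin{aligned}
G(x,t) &= \sum_n \frac{\varphi_n(x)\,\varphi_n(t)}{\lambda_n^{-1} - \lambda^{-1}}
        = \sum_n \frac{\lambda\lambda_n}{\lambda - \lambda_n}\,\varphi_n(x)\,\varphi_n(t) \\
       &= -\lambda\sum_n \varphi_n(x)\,\varphi_n(t) + \lambda^2\sum_n \frac{\varphi_n(x)\,\varphi_n(t)}{\lambda - \lambda_n} \\
       &= -\lambda\,\delta(t - x) - \lambda^2\sum_n \frac{\varphi_n(x)\,\varphi_n(t)}{\lambda_n - \lambda}.
\end{aligned} \tag{21.94}
\]

To reach the last line of Eq. (21.94) we used the eigenfunction expansion of the delta 
function, Eq. (5.27). Applying this Green’s function to the right-hand side of Eq. (21.92), 
we get 

\[
\begin{aligned}
\varphi(x) &= -\frac{1}{\lambda}\int_a^b G(x,t)\,f(t)\,dt \\
           &= -\frac{1}{\lambda}\int_a^b \left[-\lambda\,\delta(t - x) - \lambda^2\sum_n \frac{\varphi_n(x)\,\varphi_n(t)}{\lambda_n - \lambda}\right] f(t)\,dt \\
           &= f(x) + \lambda\sum_n \frac{\varphi_n(x)}{\lambda_n - \lambda}\int_a^b \varphi_n(t)\,f(t)\,dt,
\end{aligned} \tag{21.95}
\]

which agrees with Eq. (21.89). 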


⁷This is like the inhomogeneous linear ODE. We may add to its solution any constant times a solution of the corresponding 
homogeneous ODE. 


Example 21.4.1 INHOMOGENEOUS FREDHOLM EQUATION 


Let’s seek solutions to the inhomogeneous Fredholm equation 

\[
\varphi(x) = x + \lambda\int_{-1}^{1}(t + x)\,\varphi(t)\,dt \tag{21.96}
\]

for the two $\lambda$ values $\lambda = 1$ and $\lambda = \sqrt{3}/2$. The corresponding homogeneous equation, 
treated in Example 21.2.3, has solutions only for the two eigenvalues $\pm\sqrt{3}/2$. In 
normalized form, they are: 

\[
\lambda_1 = \frac{\sqrt{3}}{2},\quad \varphi_1 = \frac{\sqrt{3}}{2}\left(x + \frac{1}{\sqrt{3}}\right);
\qquad
\lambda_2 = -\frac{\sqrt{3}}{2},\quad \varphi_2 = \frac{\sqrt{3}}{2}\left(x - \frac{1}{\sqrt{3}}\right).
\]

Taking first $\lambda = 1$, which is not an eigenvalue of the homogeneous equation, we have 

\[
\begin{aligned}
\varphi(x) &= x + \sum_{i=1}^{2}\frac{\varphi_i(x)}{\lambda_i - 1}\int_{-1}^{1}\varphi_i(t)\,t\,dt \\
 &= x + \frac{\frac{\sqrt{3}}{2}\left(x + \frac{1}{\sqrt{3}}\right)}{\frac{\sqrt{3}}{2} - 1}\,\frac{\sqrt{3}}{2}\int_{-1}^{1}\left(t + \frac{1}{\sqrt{3}}\right)t\,dt
      + \frac{\frac{\sqrt{3}}{2}\left(x - \frac{1}{\sqrt{3}}\right)}{-\frac{\sqrt{3}}{2} - 1}\,\frac{\sqrt{3}}{2}\int_{-1}^{1}\left(t - \frac{1}{\sqrt{3}}\right)t\,dt \\
 &= x - 2(2x + 1).
\end{aligned} \tag{21.97}
\]

Continuing now to $\lambda = \sqrt{3}/2$, we note that it is the eigenvalue $\lambda_1$ of the homogeneous 
integral equation. That means the integral equation will have no solution unless $\langle\varphi_1|f\rangle = 0$. 
For the present problem, 

\[
\langle\varphi_1|f\rangle = \frac{\sqrt{3}}{2}\int_{-1}^{1}\left(x + \frac{1}{\sqrt{3}}\right)x\,dx \neq 0,
\]

so our integral equation will have no solution for $\lambda = \sqrt{3}/2$. If in spite of this observation 
we attempted to generate a solution using Eq. (21.91), the function $\varphi(x)$ we obtained would 
not satisfy the integral equation, irrespective of the value we might choose to assign to $a_p$. 
The immediate reason we cannot obtain a solution is that the integral 

\[
\int_{-1}^{1}(t + x)\,f(t)\,dt = \int_{-1}^{1}(t + x)\,t\,dt = \frac{2}{3}
\]

evaluates to a quantity that cannot be represented as a linear combination of the 
eigenfunctions $\varphi_i$ other than $\varphi_p$ (in the present case, this means that $2/3$ is not proportional to $\varphi_2$). 
There is therefore no way to add an additional component to $f(x)$ to obtain a cancellation 
of the $2/3$. ■
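
The $\lambda = 1$ part of this example can be reproduced numerically from Eq. (21.89). The following is a sketch under stated assumptions rather than a general solver: the names `simpson`, `hilbert_schmidt_solution`, and `residual` are hypothetical, only the two eigenpairs quoted in the example are used, and the formula is applied on the understanding that $f(x) = x$ lies in the span of those two eigenfunctions.

```python
import math

SQRT3 = math.sqrt(3.0)

# Eigenvalues and normalized eigenfunctions quoted in Example 21.4.1.
EIGENVALUES = (SQRT3 / 2, -SQRT3 / 2)
EIGENFUNCTIONS = (
    lambda x: SQRT3 / 2 * (x + 1 / SQRT3),
    lambda x: SQRT3 / 2 * (x - 1 / SQRT3),
)


def simpson(g, a, b, m=200):
    """Composite Simpson approximation to the integral of g over [a, b] (m even)."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


def hilbert_schmidt_solution(f, lam, a=-1.0, b=1.0):
    """Build phi(x) from Eq. (21.89); lam must not equal an eigenvalue."""
    coeffs = [simpson(lambda t, p=phi_i: p(t) * f(t), a, b) / (lam_i - lam)
              for lam_i, phi_i in zip(EIGENVALUES, EIGENFUNCTIONS)]
    return lambda x: f(x) + lam * sum(c * phi_i(x)
                                      for c, phi_i in zip(coeffs, EIGENFUNCTIONS))


phi = hilbert_schmidt_solution(lambda x: x, lam=1.0)


def residual(x, lam=1.0):
    """phi(x) minus the right-hand side of Eq. (21.96); should vanish."""
    return phi(x) - (x + lam * simpson(lambda t: (t + x) * phi(t), -1.0, 1.0))
```

Evaluating `phi` at a few points should agree with $x - 2(2x + 1)$ from Eq. (21.97), and `residual` should be zero to rounding error. Calling `hilbert_schmidt_solution` with `lam` equal to $\sqrt{3}/2$ would divide by zero, which mirrors the breakdown discussed in the second half of the example.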


Exercises 


21.4.1 In the Fredholm equation 

\[
\varphi(x) = \lambda\int_a^b K(x,t)\,\varphi(t)\,dt,
\]

assume that the kernel $K(x,t)$ is self-adjoint or Hermitian: 

\[
K(x,t) = K^*(t,x).
\]

Extend the analysis of the present section to show that 

(a) the eigenfunctions are orthogonal, in the sense that 

\[
\int_a^b \varphi_m^*(x)\,\varphi_n(x)\,dx = 0, \qquad m \neq n\ (\lambda_m \neq \lambda_n),
\]

(b) the eigenvalues are real. 


21.4.2 (a) Show that the eigenfunctions of Exercise 21.2.12 are orthogonal. 
(b) Show that the eigenfunctions of Exercise 21.2.14 are orthogonal. 


21.4.3 Use the Hilbert-Schmidt method to solve the inhomogeneous integral equation 

\[
\varphi(x) = x + \frac{1}{2}\int_{-1}^{1}(t + x)\,\varphi(t)\,dt.
\]

The corresponding homogeneous integral equation was treated in Example 21.2.3. 

Note. The application of the Hilbert-Schmidt technique here is somewhat like using 
a shotgun to kill a mosquito, especially when the equation can be solved quickly by 
expanding in Legendre polynomials. 


21.4.4 The Fredholm integral equation 

\[
\varphi(x) = \lambda\int_0^{\infty} e^{-xt}\,\varphi(t)\,dt
\]

has an infinite number of solutions, of which one is 

\[
\varphi(x) = x^{-1/2}, \qquad \lambda = \pi^{-1/2}.
\]

Verify that this is a solution and that it is not normalizable. 

Note. A basic reason for this anomalous behavior is that the range of integration is 
infinite, making this a “singular” integral equation. Note also that a series expansion of 
the kernel $e^{-xt}$ would permit a solution by the separable-kernel method (Section 21.2), 
except that the series is infinite. This observation is consistent with the fact that this 
integral equation has an infinite number of eigenvalues and eigenfunctions. 


21.4.5 Given 

\[
y(x) = x + \lambda\int_0^1 x\,t\,y(t)\,dt:
\]

(a) Determine $y(x)$ as a Neumann series. 

(b) Find the range of $\lambda$ for which your Neumann series solution is convergent. 
Compare with the value obtained from 

\[
|\lambda| \cdot |K|_{\max} < 1.
\]

(c) Find the eigenvalue and the eigenfunction of the corresponding homogeneous 
integral equation. 

(d) By the separable-kernel method show that the solution is 

\[
y(x) = \frac{3x}{3 - \lambda}.
\]

(e) Find $y(x)$ by the Hilbert-Schmidt method. 


21.4.6 In Exercise 21.2.11 it was found that the integral equation 

\[
\varphi(x) = \lambda\int_0^{2\pi}\cos(x - t)\,\varphi(t)\,dt
\]

had (unnormalized) eigenfunctions $\cos x$ and $\sin x$, both with eigenvalue $\lambda_1 = 1/\pi$. 
Show that the kernel of this integral equation has an expansion of the form 

\[
K(x,t) = \sum_{n=1}^{2}\frac{\varphi_n(x)\,\varphi_n(t)}{\lambda_n}.
\]


21.4.7 The integral equation 

\[
\varphi(x) = \lambda\int_0^1 (1 + xt)\,\varphi(t)\,dt
\]

has eigenvalues $\lambda_1 = 0.7889$ and $\lambda_2 = 15.211$. The corresponding eigenfunctions are 
$\varphi_1 = 1 + 0.5352x$ and $\varphi_2 = 1 - 1.8685x$. 

(a) Show that these eigenfunctions are orthogonal over the interval $[0, 1]$. 

(b) Normalize the eigenfunctions to unity. 

(c) Show that 

\[
K(x,t) = \frac{\varphi_1(x)\,\varphi_1(t)}{\lambda_1} + \frac{\varphi_2(x)\,\varphi_2(t)}{\lambda_2}.
\]

ANS. (b) $\varphi_1(x) = 0.7831 + 0.4191x$, 
$\varphi_2(x) = 1.8403 - 3.4386x$. 


21.4.8 An alternate form of the solution to the inhomogeneous integral equation, Eq. (21.81), is 

\[
\varphi(x) = \sum_{i=1}^{\infty}\frac{b_i\lambda_i}{\lambda_i - \lambda}\,\varphi_i(x).
\]

(a) Derive this form without using Eq. (21.89). 
(b) Show that this form and Eq. (21.89) are equivalent. 


Additional Readings 


Bôcher, M., An Introduction to the Study of Integral Equations, Cambridge Tracts in Mathematics and 
Mathematical Physics, No. 10. New York: Hafner (1960). This is a helpful introduction to integral equations. 


Byron, F. W., Jr., and R. W. Fuller, Mathematics of Classical and Quantum Physics. Reading, MA: 
Addison-Wesley (1969), reprinted, Dover (1992). The treatment of integral equations is rather advanced. 


Cochran, J. A., The Analysis of Linear Integral Equations. New York: McGraw-Hill (1972). This is a 
comprehensive treatment of linear integral equations intended for applied mathematicians and mathematical physicists. 
It assumes a moderate to high level of mathematical competence on the part of the reader. 

Courant, R., and D. Hilbert, Methods of Mathematical Physics, Vol. 1 (English edition). New York: Interscience 
(1953). This is one of the classic works of mathematical physics. Originally published in German in 1924, 
the revised English edition is an excellent reference for a rigorous treatment of integral equations, Green’s 
functions, and a wide variety of other topics on mathematical physics. 

Golberg, M. A., ed., Solution Methods of Integral Equations. New York: Plenum Press (1979). This is a set of 
papers from a conference on integral equations. The initial chapter is excellent for up-to-date orientation and 
a wealth of references. 

Kanwal, R. P., Linear Integral Equations. New York: Academic Press (1971), reprinted, Birkhäuser (1996). This 
book is a detailed but readable treatment of a variety of techniques for solving linear integral equations. 

Morse, P. M., and H. Feshbach, Methods of Theoretical Physics. New York: McGraw-Hill (1953). Detailed, 
rigorous, and difficult. 


Muskhelishvili, N. I., Singular Integral Equations, 2nd ed. New York: Dover (1992). 
Stakgold, I., Green’s Functions and Boundary Value Problems. New York: Wiley (1979). 


CHAPTER 22 


CALCULUS OF VARIATIONS 


The calculus of variations deals with problems where we search for a function or curve, 
rather than a value of some variable, that makes a given quantity stationary, usually an 
energy or action integral. Because a function is varied, these problems are called 
variational. Variational principles, such as those of D’Alembert, Lagrange, and Hamilton, have 
been developed in classical mechanics; Fermat’s principle (that of the shortest optical path) 
finds use in electrodynamics. Lagrangian variational techniques also occur in quantum 
mechanics and field theory. Before plunging into this rather different branch of mathematical 
physics, let us summarize some of its uses in both physics and mathematics. 


1. In existing physical theories: 

a. Unification of diverse areas of physics using energy as a key concept 
b. Convenience in analysis: Lagrange equations, Section 22.2 
c. Elegant treatment of constraints, Section 22.4 

2. Starting point for new, complex areas of physics and engineering. In general 
relativity, the geodesic is taken as the minimum path of a light pulse or the free-fall path 
of a particle in curved Riemannian space. Variational principles appear in quantum 
field theory. Variational principles have been applied extensively in control theory. 

3. Mathematical unification. Variational analysis provides a proof of the 
completeness of the Sturm-Liouville eigenfunctions, and can be used to establish bounds for 
the eigenvalues. Similar results follow for the eigenvalues and eigenfunctions in the 
Hilbert-Schmidt theory of integral equations. 


22.1 EULER EQUATION 


The calculus of variations typically involves problems in which a quantity to be minimized 
(or maximized) appears as a functional, meaning that it is a quantity whose argument(s) 
are themselves function(s), not just variable(s). As a simple, yet fairly general case, let $J$ 
be a functional of $y$, defined as 

\[
J[y] = \int_{x_1}^{x_2} f\!\left(y(x), \frac{dy(x)}{dx}, x\right)dx. \tag{22.1}
\]


Here $f$ is a fixed function of the three variables $y$, $dy/dx$, and $x$, while $J$ will have a value 
dependent on the choice of $y$. The square-bracket notation is frequently used to remind the 
reader that $J$ is a functional. Because $J$ is given as an integral, its value depends on the 
behavior of $y(x)$ throughout the entire range of $x$ (here $x_1 \le x \le x_2$). A typical problem 
in the calculus of variations is to find (usually subject to some constraints) a continuous 
and differentiable function $y(x)$ that makes $J$ stationary relative to small changes in $y$ 
anywhere (or everywhere) in its range of definition. These stationary values of $J$ will in 
many problems be minima or maxima, but they can also be saddle points. The conditions 
of physical problems will normally require that variations in $y$ be restricted to those that 
preserve its continuity and differentiability. 

It is convenient to introduce a notation that makes our discussions less cumbersome; we 
usually rewrite Eq. (22.1) in a notation with $dy/dx$ denoted $y_x$ and with the arguments 
$x$ and $[y]$ suppressed, and we indicate the variation in $J$ produced by a (small) variation 
in $y$ as 

\[
\delta J = \delta\int_{x_1}^{x_2} f(y, y_x, x)\,dx. \tag{22.2}
\]

Note that we wrote $\delta$ rather than $d$ or $\partial$; this distinction reminds us that the variation is that 
of a function (here $y$) rather than that of a variable. 

In visualizing the situation described by Eq. (22.2), it is helpful to think of $y(x)$ as a path 
or curve connecting the values $y(x_1)$ and $y(x_2)$; in fact, a common problem in the calculus 
of variations will be to determine $y(x)$ subject to the constraint that $y(x_1)$ and $y(x_2)$ have 
specified values (and often subject to further constraints that may also be integrals). To 
illustrate the class of problems represented by Eq. (22.2), here are two simple examples: 


• Determination of the minimum-energy configuration of a rope or chain of given length 
attached to fixed points at both ends, in the presence of a uniform gravitational field. 


• Determination of the track between two points at different heights that will minimize 
the travel time of an object that, starting from rest, slides without friction along the track 
subject only to a uniform gravitational field (this is known as the brachistochrone 
problem). 


The problems here under consideration are much more difficult than typical minimiza- 
tions in differential calculus, where the minimum in a function can be found by comparing 
its values, say y(x), at neighboring points (by looking at dy/dx). What we can do, instead, 
is to start by assuming the existence of an optimum path, i.e., a function y(x) for which J 
is stationary, and then compare J for our (unknown) optimum path with that obtained from 
neighboring paths, of which there are an infinite number. See Fig. 22.1. Even this strategy 
may sometimes fail, as there exist functionals $J$ for which there is no optimum path. 


FIGURE 22.1 Neighboring paths. 


Restricting attention to functions $y(x)$ for which the endpoints $y(x_1)$ and $y(x_2)$ are fixed, 
we consider a deformation of $y(x)$, called the variation of $y$ and denoted $\delta y$. We describe 
$\delta y$ by introducing a new function, $\eta(x)$, and a scale factor $\alpha$ that controls the magnitude of 
the variation. The function $\eta(x)$ is arbitrary except for being continuous and differentiable, 
and, to keep the endpoints fixed, with 

\[
\eta(x_1) = \eta(x_2) = 0. \tag{22.3}
\]

With these definitions, our path, now a function of $\alpha$, is 

\[
y(x, \alpha) = y(x, 0) + \alpha\,\eta(x), \tag{22.4}
\]

and we choose $y(x, 0)$ as the (unknown) path that will minimize $J$. Relative to $y(x, 0)$, the 
variation $\delta y$ is then 

\[
\delta y = \alpha\,\eta(x). \tag{22.5}
\]

Using Eq. (22.4), our formula for $J$ can now be written 

\[
J(\alpha) = \int_{x_1}^{x_2} f\bigl(y(x,\alpha), y_x(x,\alpha), x\bigr)\,dx, \tag{22.6}
\]

and we see that we have reached a simpler formulation in which $J$ is now a function of $\alpha$ 
rather than a functional of $y$. This means that we now know how to optimize it.¹ 

We proceed now to obtain a stationary value of $J$ by imposing the condition 

\[
\left[\frac{\partial J(\alpha)}{\partial\alpha}\right]_{\alpha=0} = 0, \tag{22.7}
\]

analogous to the vanishing of the derivative $dy/dx$ in differential calculus. 
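
The reduction from a functional of $y$ to an ordinary function $J(\alpha)$ can be tried out numerically. The sketch below is illustrative only: it assumes the arc-length integrand $f = (1 + y_x^2)^{1/2}$, takes the straight line $y(x, 0) = x$ on $[0, 1]$ as the trial optimum, and uses $\eta(x) = \sin \pi x$, which vanishes at both endpoints. The function names and the finite-difference step are hypothetical choices.

```python
import math


def simpson(g, a, b, m=400):
    """Composite Simpson approximation to the integral of g over [a, b] (m even)."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


def path_length(alpha, slope=1.0):
    """J(alpha) for the path y(x, alpha) = slope * x + alpha * sin(pi x) on [0, 1]."""
    def integrand(x):
        y_x = slope + alpha * math.pi * math.cos(math.pi * x)
        return math.sqrt(1.0 + y_x * y_x)
    return simpson(integrand, 0.0, 1.0)


def dJ_dalpha(alpha, h=1e-4):
    """Central-difference estimate of the derivative of J with respect to alpha."""
    return (path_length(alpha + h) - path_length(alpha - h)) / (2 * h)


for alpha in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(f"alpha = {alpha:+.1f}   J = {path_length(alpha):.6f}   dJ/dalpha = {dJ_dalpha(alpha):+.6f}")
```

In this setup the derivative vanishes at $\alpha = 0$ and $J$ grows on either side of it, which is the behavior that the stationarity condition of Eq. (22.7) encodes for one particular $\eta(x)$; the full argument requires the condition to hold for every admissible $\eta(x)$.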


¹The arbitrary nature of the dependence of $J(\alpha)$ on $\eta(x)$ will come into play later. 

Now, the $\alpha$ dependence of the integral is contained in $y(x,\alpha)$ and $y_x(x,\alpha) = 
(\partial/\partial x)\,y(x,\alpha)$. Therefore,² 

\[
\frac{\partial J(\alpha)}{\partial\alpha} = \int_{x_1}^{x_2}\left[\frac{\partial f}{\partial y}\frac{\partial y}{\partial\alpha} + \frac{\partial f}{\partial y_x}\frac{\partial y_x}{\partial\alpha}\right]dx. \tag{22.8}
\]

From Eq. (22.4), 

\[
\frac{\partial y(x,\alpha)}{\partial\alpha} = \eta(x), \qquad \frac{\partial y_x(x,\alpha)}{\partial\alpha} = \frac{d\eta(x)}{dx}, \tag{22.9}
\]

so Eq. (22.8) becomes 

\[
\frac{\partial J(\alpha)}{\partial\alpha} = \int_{x_1}^{x_2}\left[\frac{\partial f}{\partial y}\,\eta(x) + \frac{\partial f}{\partial y_x}\,\frac{d\eta(x)}{dx}\right]dx. \tag{22.10}
\]

Integrating the second term by parts to get $\eta(x)$ as a common factor, we convert it to 

\[
\int_{x_1}^{x_2}\frac{d\eta(x)}{dx}\,\frac{\partial f}{\partial y_x}\,dx
 = \eta(x)\,\frac{\partial f}{\partial y_x}\bigg|_{x_1}^{x_2} - \int_{x_1}^{x_2}\eta(x)\,\frac{d}{dx}\frac{\partial f}{\partial y_x}\,dx. \tag{22.11}
\]

The integrated part vanishes by Eq. (22.3), and Eq. (22.10) becomes 

\[
\frac{\partial J(\alpha)}{\partial\alpha} = \int_{x_1}^{x_2}\left[\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y_x}\right]\eta(x)\,dx = 0. \tag{22.12}
\]

Equation (22.12), which must be satisfied for arbitrary $\eta(x)$, is to be understood as a 
condition on $y(x)$. Occasionally we will see Eq. (22.12) multiplied by $\delta\alpha$, which gives, upon 
using $\eta(x)\,\delta\alpha = \delta y$, 

\[
\delta J = \int_{x_1}^{x_2}\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y_x}\right)\delta y\,dx = 0. \tag{22.13}
\]

Equation (22.13) is to be solved for arbitrary $\delta y$ with $\delta y(x_1) = \delta y(x_2) = 0$. 

We now take up the solution of Eq. (22.12). That equation can be satisfied for arbitrary 
$\eta(x)$ only if the bracketed expression forming the remainder of its integrand vanishes 
“almost everywhere,” meaning everywhere except possibly at isolated points.³ The condition 
for our stationary value is thus formally a partial differential equation (PDE), 

\[
\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y_x} = 0, \tag{22.14}
\]

known as the Euler equation. Since the form of $f$ is known, it will actually reduce 
(because there is really only one independent variable, $x$) to an ordinary differential equation 
(ODE) for $y$ with boundary conditions at $x_1$ and $x_2$. In that connection, it is important 
to note that the derivative $d/dx$ occurs in the Euler equation, and that it has a meaning 
distinct from the partial derivative $\partial/\partial x$. In particular, if $f = f(y(x), y_x, x)$, then $df/dx$, 
which stands for the change in $f$ (from all sources) due to a change in $x$, has the evaluation 

\[
\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} + \frac{\partial f}{\partial y_x}\frac{d^2y}{dx^2},
\]

where the last term has the form given because $dy_x/dx = d^2y/dx^2$. Note that the first 
term on the right gives the explicit $x$-dependence of $f$; the second and third terms give its 
implicit $x$-dependence via $y$ and $y_x$. 

²Note that $y$ and $y_x$ are being treated as independent variables because they occur as different arguments of $f$. 
³Compare the discussion of convergence in the mean, at Eq. (5.22). 
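
Applying the same total-derivative rule to $\partial f/\partial y_x$ instead of $f$ shows explicitly why the Euler equation is an ODE of second order. The expanded form below is a sketch added for illustration; it assumes that $f$ has continuous second partial derivatives and writes $y_{xx}$ for $d^2y/dx^2$, a shorthand introduced here.

```latex
\[
\frac{d}{dx}\frac{\partial f}{\partial y_x}
  = \frac{\partial^2 f}{\partial x\,\partial y_x}
  + \frac{\partial^2 f}{\partial y\,\partial y_x}\,y_x
  + \frac{\partial^2 f}{\partial y_x^2}\,y_{xx},
\]
\[
\frac{\partial f}{\partial y}
  - \frac{\partial^2 f}{\partial x\,\partial y_x}
  - \frac{\partial^2 f}{\partial y\,\partial y_x}\,y_x
  - \frac{\partial^2 f}{\partial y_x^2}\,y_{xx} = 0 .
\]
```

The second line is Eq. (22.14) written out term by term; the two constants of integration of this second-order ODE are fixed by the prescribed values of $y$ at $x_1$ and $x_2$.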

The Euler equation, Eq. (22.14), is a necessary, but by no means sufficient condition 
that there be a function $y(x)$ that is continuous and differentiable on the range $(x_1, x_2)$ and 
yields a stationary value of $J$.⁴ A nice example of a lack of sufficiency is provided by the 
problem of determining stationary paths between points on the surface of a sphere (this 
example was provided by Courant and Robbins; see Additional Readings). The 
minimum-distance path from point A to point B on a spherical surface is the arc of a great circle, 
shown as Path 1 in Fig. 22.2. But Path 2 also satisfies the Euler equation. Path 2 is a 
maximum, but only if we demand that it be a great circle and then only if we make less 
than one circuit (as Path 2 plus $n$ complete revolutions is also a solution). If the path is 
not required to be a great circle, any deviation from Path 2 will increase the length. This 
is hardly the property of a local maximum, and that illustrates why it is important to check 
solutions of the Euler equation to see if they satisfy the physical conditions of the given 
problem. 

Sometimes a problem admits a discontinuous solution that has physical relevance and 
will not be found by straightforward application of the Euler equation. An example is 
provided by the soap film of Example 22.1.3, where such a solution describes what happens 
if the film becomes unstable and breaks. 

Following are examples of the use of the Euler equation. 





FIGURE 22.2 Stationary paths over a sphere. 





⁴ For a discussion of sufficiency conditions and the development of the calculus of variations as a part of mathematics, see the
works by Ewing and Sagan in Additional Readings.

Example 22.1.1 STRAIGHT LINE


Perhaps the simplest application of the Euler equation is in the determination of the shortest
distance between two points in the Euclidean xy-plane. Since the element of distance is

\[ ds = [(dx)^2 + (dy)^2]^{1/2} = [1 + y_x^2]^{1/2}\,dx, \]

the distance J may be written as

\[ J = \int_{x_1,y_1}^{x_2,y_2} ds = \int_{x_1}^{x_2} [1 + y_x^2]^{1/2}\,dx. \tag{22.15} \]

Comparison with Eq. (22.2) shows that

\[ f(y, y_x, x) = (1 + y_x^2)^{1/2}. \]


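Substituting into Eq. (22.14) and noting that $\partial f/\partial y$ vanishes, we obtain

\[ -\frac{d}{dx}\left[\frac{y_x}{(1 + y_x^2)^{1/2}}\right] = 0, \]

or

\[ \frac{y_x}{(1 + y_x^2)^{1/2}} = C, \quad \text{a constant.} \]

This equation is satisfied if

\[ y_x = a, \quad \text{a second constant.} \]

Integrating this expression for $y_x$, we get

\[ y = ax + b, \tag{22.16} \]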


which is the familiar equation for a straight line. The constants a and b are now chosen so
that the line passes through the two points $(x_1, y_1)$ and $(x_2, y_2)$. Hence the Euler equation
predicts that the shortest⁵ distance between two fixed points in Euclidean space is a straight
line. ■


The generalization of this to curved four-dimensional space-time leads to the important
concept of the geodesic in general relativity. A further discussion of geodesics is in
Section 22.2.
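
The straight-line result of Example 22.1.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the endpoints (0, 0) and (1, 2), the trial deviation $\eta(x) = \sin[\pi(x - x_1)/(x_2 - x_1)]$, and the function name `path_length` are choices made here, not part of the text. It evaluates the length J for the varied path $y = ax + b + \alpha\,\eta(x)$ and compares it with the undeformed line.

```python
import math

def path_length(alpha, x1=0.0, y1=0.0, x2=1.0, y2=2.0, n=2000):
    """Length of the straight line plus alpha * eta(x), summed over n chords."""
    slope = (y2 - y1) / (x2 - x1)          # the constant a of Eq. (22.16)
    total = 0.0
    prev_x, prev_y = x1, y1
    for k in range(1, n + 1):
        x = x1 + (x2 - x1) * k / n
        # eta(x) vanishes at both endpoints, so the endpoints stay fixed
        eta = math.sin(math.pi * (x - x1) / (x2 - x1))
        y = y1 + slope * (x - x1) + alpha * eta
        total += math.hypot(x - prev_x, y - prev_y)
        prev_x, prev_y = x, y
    return total

straight = path_length(0.0)
for alpha in (-0.2, -0.05, 0.0, 0.05, 0.2):
    print(f"alpha = {alpha:+.2f}   J = {path_length(alpha):.6f}")
```

For these assumed endpoints the undeformed line has length $\sqrt{5}$, and every nonzero $\alpha$ tried gives a longer path, consistent with the stationary value being a minimum.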


⁵ Technically, we have only found a y(x) of stationary J. By inspection of the solution, we easily determine the distance to be a
minimum.

Example 22.1.2 OPTICAL PATH NEAR A BLACK HOLE


We now wish to determine the optical path in an atmosphere where the velocity of light 
increases in proportion to the height y according to v(y) = y/b, with b > 0 some parameter 
describing the light speed. So v = 0 at y = 0, which simulates the conditions at the surface 
of a black hole, called its event horizon, where the gravitational force is so strong that the 
velocity of light goes to zero, thus trapping light. 

Our variational principle (Fermat’s principle) is that light will take the path of shortest
travel time from $(x_1, y_1)$ to $(x_2, y_2)$, namely

\[ \Delta t = \int_{x_1,y_1}^{x_2,y_2} dt = \int_{x_1,y_1}^{x_2,y_2} \frac{ds}{v} = \int_{x_1,y_1}^{x_2,y_2} \frac{b}{y}\,ds = b\int_{x_1,y_1}^{x_2,y_2} \frac{\sqrt{dx^2 + dy^2}}{y}. \tag{22.17} \]


The path is along a line defined by the relation between y and x. While we have in previous
equations taken x to be the independent variable, there is no inherent requirement to do so,
and our work on the present problem will be simplified if we choose y as the independent
variable, and we write Eq. (22.17) in the form

\[ \Delta t = b\int_{y_1}^{y_2} \frac{\sqrt{x_y^2 + 1}}{y}\,dy, \tag{22.18} \]

where $x_y$ stands for dx/dy. Then our Euler equation will be

\[ \frac{\partial f}{\partial x} - \frac{d}{dy}\frac{\partial f}{\partial x_y} = 0, \quad \text{with} \quad f(x, x_y, y) = \frac{\sqrt{x_y^2 + 1}}{y}. \]

Noting that $\partial f/\partial x = 0$ and differentiating $\partial f/\partial x_y$, we have

\[ -\frac{d}{dy}\left[\frac{x_y}{y\sqrt{x_y^2 + 1}}\right] = 0. \]

This equation can be integrated, giving


\[ \frac{x_y}{y\sqrt{x_y^2 + 1}} = C_1 = \text{constant}, \quad \text{or} \quad x_y = \frac{C_1 y}{\sqrt{1 - C_1^2 y^2}}. \]

Writing $x_y = dx/dy$ and separating dx and dy in this first-order ODE, we find the integral

\[ \int^{x} dx = \int^{y} \frac{C_1 y\,dy}{\sqrt{1 - C_1^2 y^2}}, \]

which yields

\[ x + C_2 = -\frac{\sqrt{1 - C_1^2 y^2}}{C_1}, \quad \text{or} \quad (x + C_2)^2 + y^2 = \frac{1}{C_1^2}. \]

FIGURE 22.3 Circular optical path in medium.


Irrespective of the values of $C_1$ and $C_2$, this light path is the arc of a circle whose center is
on the line y = 0, namely the event horizon. The actual path of light passing from $(x_1, y_1)$
to $(x_2, y_2)$ will be on the circle through those points centered on y = 0; the construction
of the path can be performed geometrically as shown in Fig. 22.3. Note that light will not
escape completely from the black hole with this model for v(y) unless $x_1 = x_2$ (a path
perpendicular to the event horizon).

This example may be adapted to a mirage (Fata Morgana) in a desert with hot air near
the ground and cooler air aloft (the index of refraction changes with height in cool vs. hot
air). For the mirage problem, the relevant velocity law is $v(y) = v_0 - y/b$. In that case, the
circular light path is no longer convex with center on the x-axis, but becomes concave. ■
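
As a rough numerical check of the circular path found in Example 22.1.2, the sketch below compares the travel time $b\int ds/y$ along the circle centered on y = 0 with the time along the straight chord between the same endpoints. The endpoints (−1, 1) and (1, 1), the value b = 1, the midpoint-rule quadrature, and the function names are all assumptions made for this illustration.

```python
import math

def time_on_arc(x1, y1, x2, y2, b=1.0, n=20000):
    """Travel time b * integral(ds / y) along the circle centered on y = 0."""
    # The center (xc, 0) is equidistant from both endpoints (requires x1 != x2).
    xc = (x2**2 + y2**2 - x1**2 - y1**2) / (2.0 * (x2 - x1))
    th1 = math.atan2(y1, x1 - xc)
    th2 = math.atan2(y2, x2 - xc)
    dth = (th2 - th1) / n
    total = 0.0
    for k in range(n):
        th = th1 + (k + 0.5) * dth          # midpoint rule in the polar angle
        # ds = R |dth| and y = R sin(th), so ds / y = |dth| / sin(th)
        total += abs(dth) / math.sin(th)
    return b * total

def time_on_line(x1, y1, x2, y2, b=1.0, n=20000):
    """Travel time b * integral(ds / y) along the straight chord."""
    length = math.hypot(x2 - x1, y2 - y1)
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        y = y1 + s * (y2 - y1)
        total += (length / n) / y
    return b * total

t_arc = time_on_arc(-1.0, 1.0, 1.0, 1.0)
t_line = time_on_line(-1.0, 1.0, 1.0, 1.0)
print(f"arc: {t_arc:.6f}   straight chord: {t_line:.6f}")
```

For these assumed endpoints the arc rises into the region of larger v and takes about 1.763b, compared with 2b along the chord at constant height.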


Alternate Forms of Euler Equations 


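Another form of the Euler equation, which is often useful (Exercise 22.1.1), is

\[ \frac{\partial f}{\partial x} - \frac{d}{dx}\left(f - y_x\frac{\partial f}{\partial y_x}\right) = 0. \tag{22.19} \]

In problems in which $f = f(y, y_x)$, i.e., in which x does not appear explicitly,
Eq. (22.19) reduces to

\[ \frac{d}{dx}\left(f - y_x\frac{\partial f}{\partial y_x}\right) = 0, \tag{22.20} \]

or

\[ f - y_x\frac{\partial f}{\partial y_x} = \text{constant}. \tag{22.21} \]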
OYx 


Example 22.1.3 SOAP FILM


FIGURE 22.4 Surface of rotation, soap-film problem.


As our next illustrative example, consider two parallel coaxial wire circles to be connected
by a surface of minimum area that is generated by revolving a curve y(x) about the x-axis.
See Fig. 22.4. The curve is required to pass through fixed endpoints $(x_1, y_1)$ and $(x_2, y_2)$.
The variational problem is to choose the curve y(x) so that the area of the resulting surface 
will be a minimum. A physical situation corresponding to this problem is that of a soap 
film suspended between the wire circles. 

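For the element of area shown in Fig. 22.4,

\[ dA = 2\pi y\,ds = 2\pi y(1 + y_x^2)^{1/2}\,dx. \]

The variational equation is then

\[ J = \int_{x_1}^{x_2} 2\pi y(1 + y_x^2)^{1/2}\,dx. \]

Neglecting the $2\pi$, we identify

\[ f(y, y_x, x) = y(1 + y_x^2)^{1/2}. \]

Since $\partial f/\partial x = 0$, we may apply Eq. (22.20), whose integrated form is Eq. (22.21), and get

\[ y(1 + y_x^2)^{1/2} - \frac{y\,y_x^2}{(1 + y_x^2)^{1/2}} = c_1, \]

which simplifies to

\[ \frac{y}{(1 + y_x^2)^{1/2}} = c_1. \tag{22.22} \]

Squaring, we get

\[ \frac{y^2}{1 + y_x^2} = c_1^2, \]

which rearranges to

\[ (y_x)^{-1} = \frac{dx}{dy} = \frac{c_1}{\sqrt{y^2 - c_1^2}}. \tag{22.23} \]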


We note in passing that $c_1$ had better have a value that causes dy/dx to be real.
Equation (22.23) may be integrated to give

\[ x = c_1\cosh^{-1}\frac{y}{c_1} + c_2, \]

and, solving for y, we have

\[ y = c_1\cosh\left(\frac{x - c_2}{c_1}\right). \tag{22.24} \]

Finally, $c_1$ and $c_2$ are determined by requiring the solution to pass through the points
$(x_1, y_1)$ and $(x_2, y_2)$. Our “minimum”-area surface is a special case of a catenary of
revolution, or a catenoid. ■


Soap Film: Minimum Area 


The calculus of variations contains many pitfalls for the unwary. Remember, the Euler
equation is a necessary condition, and assumes a differentiable solution. The sufficiency
conditions are quite involved. Again, see the Additional Readings for details. Respect for
some of these hazards may be developed by further considering the soap-film problem in
Example 22.1.3, with $(x_1, y_1) = (-x_0, 1)$, $(x_2, y_2) = (+x_0, 1)$. We are therefore considering
a soap film stretched between two rings of unit radius at $x = \pm x_0$. The problem is to
predict the curve y(x) assumed by the soap film.

By referring to Eq. (22.24), we find that $c_2 = 0$ because our problem is symmetric about
x = 0. Then

\[ y = c_1\cosh\left(\frac{x}{c_1}\right), \tag{22.25} \]

and our endpoint conditions become

\[ c_1\cosh\left(\frac{x_0}{c_1}\right) = 1. \tag{22.26} \]

If we take $x_0 = \tfrac{1}{2}$ we obtain the following transcendental equation for $c_1$:

\[ 1 = c_1\cosh\left(\frac{1}{2c_1}\right). \tag{22.27} \]

We find that this equation has two solutions: $c_1 = 0.2350$, leading to a “deep” curve, and
$c_1 = 0.8483$, leading to a “shallow” curve. Which curve is assumed by the soap film?
Before answering this question, consider the physical situation with the rings moved apart
so that $x_0 = 1$. Then Eq. (22.26) becomes

\[ 1 = c_1\cosh\left(\frac{1}{c_1}\right), \tag{22.28} \]

which has no real solutions. The physical significance is that as the unit-radius rings were
moved out from the origin, a point was reached at which the soap film could no longer
maintain the same horizontal force over each vertical section. Stable equilibrium was no
longer possible. The soap film broke (irreversible process) and formed a circular film over
each ring (with a total area of $2\pi = 6.2832\ldots$). This is known as the Goldschmidt
discontinuous solution to the soap-film problem.
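
The root counting described above can be reproduced with a short numerical sketch. It scans $0 < c_1 \le 1$ for sign changes of $c_1\cosh(x_0/c_1) - 1$ from Eq. (22.26) and refines each bracket by bisection. The scan range starting at 0.05, the grid density, and the function names are assumptions of this illustration, not anything prescribed by the text.

```python
import math

def endpoint_residual(c1, x0):
    """Left side minus right side of Eq. (22.26): c1*cosh(x0/c1) - 1."""
    return c1 * math.cosh(x0 / c1) - 1.0

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def catenary_roots(x0, samples=4000):
    """Scan 0.05 <= c1 <= 1 for sign changes and refine each one by bisection."""
    grid = [0.05 + 0.95 * k / samples for k in range(samples + 1)]
    roots = []
    for left, right in zip(grid, grid[1:]):
        if endpoint_residual(left, x0) * endpoint_residual(right, x0) < 0:
            roots.append(bisect(lambda c: endpoint_residual(c, x0), left, right))
    return roots

print("x0 = 1/2:", catenary_roots(0.5))   # deep and shallow roots
print("x0 = 1  :", catenary_roots(1.0))   # no sign change is found
```

With $x_0 = \tfrac{1}{2}$ the scan finds two roots that agree with the quoted 0.2350 and 0.8483 to about three decimal places; with $x_0 = 1$ it finds none, in line with Eq. (22.28) having no real solution.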

The next question is: How large may $x_0$ be and still give a real solution for Eq. (22.26)?
Solving Eq. (22.26) for $x_0$,

\[ x_0 = c_1\cosh^{-1}(1/c_1), \tag{22.29} \]

we find that $x_0$ will be real only for $c_1 \le 1$ and that its maximum value is attained when
$dx_0/dc_1 = 0$. A plot of $x_0$ vs. $c_1$ is shown in Fig. 22.5; it helps to explain the behavior we
observed at $x_0 = \tfrac{1}{2}$. We see from the plot (and more precisely from Exercise 22.1.6) that
the Euler equation has no solutions for $x_0 > x_{\max}$, where $x_{\max} \approx 0.6627$, and that this $x_0$
value occurs when $c_1 \approx 0.5524$. For values of $x_0$ smaller than $x_{\max}$, there are solutions for
two different values of $c_1$, corresponding to the “deep” and “shallow” curves found earlier
for $x_0 = \tfrac{1}{2}$.
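
The location of the maximum of Eq. (22.29) can be estimated with a few lines of code. The sketch below uses a golden-section search, which is a choice made for this illustration (the text itself points to Exercise 22.1.6 for a precise treatment); the search interval and function names are likewise assumptions.

```python
import math

def half_separation(c1):
    """x0 as a function of c1, Eq. (22.29)."""
    return c1 * math.acosh(1.0 / c1)

def golden_max(func, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b = d            # the maximum lies in [a, d]
        else:
            a = c            # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

c_crit = golden_max(half_separation, 1e-3, 1.0)
x_max = half_separation(c_crit)
print(f"c1 at maximum = {c_crit:.4f},  x_max = {x_max:.4f}")
```

The values obtained agree with the quoted $x_{\max} \approx 0.6627$ and $c_1 \approx 0.5524$ to the precision given there.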

FIGURE 22.5 Solutions of Eq. (22.26) for unit-radius rings at $x = \pm x_0$.

Returning to the question as to which solution of Eq. (22.26) describes the soap film,
let us calculate the area corresponding to each solution. Using Eq. (22.22) to reach the last
member of the first line below, we have


\[ \begin{aligned} A &= 4\pi\int_0^{x_0} y(1 + y_x^2)^{1/2}\,dx = \frac{4\pi}{c_1}\int_0^{x_0} y^2\,dx \\ &= 4\pi c_1\int_0^{x_0}\left(\cosh\frac{x}{c_1}\right)^2 dx = \pi c_1^2\left[\sinh\left(\frac{2x_0}{c_1}\right) + \frac{2x_0}{c_1}\right]. \end{aligned} \tag{22.30} \]

For $x_0 = \tfrac{1}{2}$, Eq. (22.30) leads to

\[ c_1 = 0.2350 \;\rightarrow\; A = 6.8456, \qquad c_1 = 0.8483 \;\rightarrow\; A = 5.9917, \]

showing that the former can at most be only a local minimum. A more detailed investigation
(compare Bliss, Additional Readings, chapter IV) shows that this surface is not even
a local minimum. For $x_0 = \tfrac{1}{2}$, the soap film will be described by the shallow curve

\[ y = 0.8483\cosh\left(\frac{x}{0.8483}\right). \]

This shallow catenoid (catenary of revolution) will be an absolute minimum for $0 < x_0 < 0.528$.
However, for $0.528 < x_0 < 0.6627$, its area is greater than that of the Goldschmidt
discontinuous solution (6.2832) and it is only a relative minimum. See Fig. 22.6.
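
The area comparison can be sketched numerically as follows. The code evaluates Eq. (22.30) for the two quoted values of $c_1$ and then locates, by bisection, the separation at which the shallow catenoid's area equals the Goldschmidt value $2\pi$. The bracketing intervals, iteration counts, and function names are assumptions of this illustration.

```python
import math

def catenoid_area(c1, x0):
    """Area from Eq. (22.30): pi * c1^2 * [sinh(2 x0 / c1) + 2 x0 / c1]."""
    u = 2.0 * x0 / c1
    return math.pi * c1**2 * (math.sinh(u) + u)

def shallow_c1(x0, c_crit=0.5525):
    """Shallow-branch root of c1*cosh(x0/c1) = 1, bracketed in [c_crit, 1]."""
    lo, hi = c_crit, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.cosh(x0 / mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def crossover_x0(lo=0.4, hi=0.65):
    """Separation at which the shallow catenoid area equals 2*pi."""
    goldschmidt = 2.0 * math.pi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if catenoid_area(shallow_c1(mid), mid) < goldschmidt:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"deep    (c1 = 0.2350): A = {catenoid_area(0.2350, 0.5):.4f}")
print(f"shallow (c1 = 0.8483): A = {catenoid_area(0.8483, 0.5):.4f}")
print(f"Goldschmidt:           A = {2.0 * math.pi:.4f}")
print(f"equal-area separation: x0 = {crossover_x0():.4f}")
```

Because the quoted $c_1$ values are rounded to four figures, the deep-curve area computed this way differs from 6.8456 in the third decimal place; the equal-area separation comes out close to the 0.528 quoted above.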


FIGURE 22.6 Catenoid area and that of the discontinuous solution of the soap-film
problem (unit-radius rings at $x = \pm x_0$). Curves shown: deep curve, shallow curve, and Goldschmidt discontinuous solution.

For an excellent discussion of both the mathematical problems and experiments with
soap films, we refer to Courant and Robbins in Additional Readings. The larger message
of this subsection is the extent to which one must use caution in accepting solutions of the
Euler equations.


Exercises 


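22.1.1 For $dy/dx \equiv y_x \neq 0$, show the equivalence of the two forms of Euler’s equation:

\[ \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y_x} = 0 \]

and

\[ \frac{\partial f}{\partial x} - \frac{d}{dx}\left(f - y_x\frac{\partial f}{\partial y_x}\right) = 0. \]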


22.1.2 Derive Euler’s equation by expanding the integrand of

\[ J(\alpha) = \int_{x_1}^{x_2} f\bigl(y(x, \alpha), y_x(x, \alpha), x\bigr)\,dx \]

in powers of $\alpha$.

Note. The stationary condition is $\partial J(\alpha)/\partial\alpha = 0$, evaluated at $\alpha = 0$. The terms
quadratic in $\alpha$ may be useful in establishing the nature of the stationary solution (maximum,
minimum, or saddle point).


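22.1.3 Find the Euler equation corresponding to Eq. (22.14) if $f = f(y_{xx}, y_x, y, x)$, assuming
that $y$ and $y_x$ have fixed values at the endpoints of their interval of definition.

ANS. \[ \frac{d^2}{dx^2}\left(\frac{\partial f}{\partial y_{xx}}\right) - \frac{d}{dx}\left(\frac{\partial f}{\partial y_x}\right) + \frac{\partial f}{\partial y} = 0. \]
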
22.1.4 The integrand $f(y, y_x, x)$ of Eq. (22.2) has the form

\[ f(y, y_x, x) = f_1(x, y) + f_2(x, y)\,y_x. \]

(a) Show that the Euler equation leads to

\[ \frac{\partial f_1}{\partial y} - \frac{\partial f_2}{\partial x} = 0. \]

(b) What does this imply for the dependence of the integral J on the choice of path?

22.1.5 Show that the condition that $J = \int f(x, y)\,dx$ has a stationary value

(a) leads to f(x, y) independent of y and
(b) yields no information about any x-dependence.

We get no (continuous, differentiable) solution. To be a meaningful variational problem,
dependence on $y_x$ or higher derivatives is essential.

Note. The situation will change when constraints are introduced (compare to Exercise 22.4.6).


22.1.6 A soap film stretched between two rings of unit radius centered at $\pm x_0$ will have its
closest approach to the x-axis at x = 0, with the distance from the axis given by $c_1$,
with $x_0$ and $c_1$ related by Eq. (22.26) or Eq. (22.29).

(a) Show that $dc_1/dx_0$ becomes infinite when $x_0\sinh(x_0/c_1) = 1$, indicating that the
soap film becomes unstable if $x_0$ is increased beyond the value satisfying this
condition.

(b) Show that the condition of part (a) is equivalent to

\[ \frac{x_0}{c_1} = \coth\frac{x_0}{c_1}. \]

(c) Solve the transcendental equation of part (b) to obtain the critical value of $x_0/c_1$
and show that the separate values of $x_0$ and $c_1$ are then approximately $x_0 \approx 0.6627$
and $c_1 \approx 0.5524$.


22.1.7 A soap film is stretched across the space between two rings of unit radius centered
at $\pm x_0$ on the x-axis and perpendicular to the x-axis. Using the solution developed in
Example 22.1.3, set up the transcendental equations for the condition that $x_0$ is such that
the area of the curved surface of rotation equals the area of the two rings (Goldschmidt
discontinuous solution). Solve for $x_0$.


22.1.8 In Example 22.1.1, expand $J[y(x, \alpha)] - J[y(x, 0)]$ in powers of $\alpha$. The term linear in
$\alpha$ leads to the Euler equation and to the straight-line solution, Eq. (22.16). Investigate
the $\alpha^2$ term and show that the stationary value of J, the straight-line distance, is a
minimum.


22.1.9 (a) Show that the integral

\[ J = \int_{x_1}^{x_2} f(y, x)\,dx, \quad \text{with} \quad f = y(x), \]

has no extreme values.

(b) If $f(y, y_x, x) = y^2(x)$, find a discontinuous solution similar to the Goldschmidt
solution for the soap-film problem.


22.1.10 Fermat’s principle of optics states that a light ray in a medium for which n is the
(position-dependent) index of refraction will follow the path y(x) for which

\[ \int_{x_1,y_1}^{x_2,y_2} n(y, x)\,ds \]

is a minimum. For $y_2 = y_1 = 1$, $-x_1 = x_2 = 1$, find the ray path if
(a) $n = e^y$, (b) $n = a(y - y_0)$, $y > y_0$.


22.1.11 A particle moves, starting at rest, from point A on the surface of the Earth to point B
(also on the surface) by sliding frictionlessly through a tunnel. Find the differential
equation satisfied by the path if the transit time is to be a minimum. Assume the Earth
to be a nonrotating sphere of uniform density.

Hint. The potential energy of a particle of mass m a distance r < R from the center of
the Earth, with R the Earth’s radius, is $-\tfrac{1}{2}mg(R^2 - r^2)/R$ (measured relative to its value at the surface), where g is the gravitational
acceleration at the Earth’s surface. It is convenient to describe the path of the particle (in
the plane through A, B, and the center of the Earth) by plane polar coordinates $(r, \theta)$,
with A at $(R, -\varphi)$ and B at $(R, \varphi)$.

ANS. Letting $r_0$ be the minimum value of r (reached at $\theta = 0$), Eq. (22.21) yields

\[ r_\theta^2 = \frac{r^2 R^2 (r^2 - r_0^2)}{r_0^2 (R^2 - r^2)} \]

(the constant in that equation has the value such that $r_\theta = 0$ at $\theta = 0$).

The solution for the path is a hypocycloid, generated by a circle of radius $\tfrac{1}{2}(R - r_0)$
rolling inside the circle of radius R. You might like to show that the transit time is

\[ t = \pi\,\frac{(R^2 - r_0^2)^{1/2}}{(Rg)^{1/2}}. \]

For details see P. W. Cooper, Am. J. Phys. 34: 68 (1966); G. Veneziano, et al., 34: 701
(1966).


22.1.12 A ray of light follows a straight-line path in a first homogeneous medium, is refracted
at an interface, and then follows a new straight-line path in the second medium. See
Fig. 22.7. Use Fermat’s principle of optics to derive Snell’s law of refraction:

\[ n_1\sin\theta_1 = n_2\sin\theta_2. \]

Hint. Keep the points $(x_1, y_1)$ and $(x_2, y_2)$ fixed and vary $x_0$ to satisfy Fermat’s
principle.

Note. This is not an Euler equation problem, because the light path is not differentiable
at $x_0$.








FIGURE 22.7 Snell’s law.


22.1.13 A second soap-film configuration for the unit-radius rings at $x = \pm x_0$ consists of a
circular disk, radius a, in the x = 0 plane and two catenoids of revolution, one joining
the disk and each ring. One catenoid may be described by

\[ y = c_1\cosh\left(\frac{x}{c_1} + c_3\right). \]

(a) Impose boundary conditions at x = 0 and $x = x_0$.

(b) Although not necessary, it is convenient to require that the catenoids form an angle
of 120° where they join the central disk. Express this third boundary condition in
mathematical terms.

(c) Show that the total area of catenoids plus central disk is then

\[ A = \pi c_1^2\left[\sinh\left(\frac{2x_0}{c_1} + 2c_3\right) + \frac{2x_0}{c_1}\right]. \]

Note. Although this soap-film configuration is physically realizable and stable, the area
is larger than that of the simple catenoid for all ring separations for which both films
exist.

ANS. (a) $1 = c_1\cosh\left(\dfrac{x_0}{c_1} + c_3\right)$, $a = c_1\cosh c_3$. (b) $\dfrac{dy}{dx} = \tan 30^\circ = \sinh c_3$.


22.1.14 For the soap film described in Exercise 22.1.13, find (numerically) the maximum value
of $x_0$.

Note. This calls for a calculator with hyperbolic functions or a table of hyperbolic
cotangents.

ANS. $x_{0\,\max} = 0.4078$.


22.1.15 Find the curve of quickest descent from (0, 0) to $(x_0, y_0)$ for a particle that, starting
from rest, slides under gravity and without friction. Show that the ratio of times taken
by the particle along a straight line joining the two points compared to along the curve
of quickest descent is $(1 + 4/\pi^2)^{1/2}$.

Hint. Take y to increase downwards. Apply Eq. (22.21) to obtain $y_x^2 = (1 - c^2 y)/c^2 y$,
where c is an integration constant. It is helpful to make the substitution $c^2 y = \sin^2(\varphi/2)$
and take $(x_0, y_0) = (\pi/2c^2, 1/c^2)$.


22.2 MORE GENERAL VARIATIONS 


Several Dependent Variables 


To apply variational methods to classical mechanics, we need to generalize the Euler equation
to situations in which there is more than one dependent variable in roles like y in
Eq. (22.2). The generalization corresponds to functionals J of the form

\[ J = \int_{x_1}^{x_2} f\bigl(u_1(x), u_2(x), \ldots, u_n(x), u_{1x}(x), u_{2x}(x), \ldots, u_{nx}(x), x\bigr)\,dx. \tag{22.31} \]

We are now calling the dependent variables $u_i$ to be consistent with notations we will
shortly introduce, and as before we use the subscript x to denote differentiation with respect
to x, so that $u_{ix} = du_i/dx$ and (later) $\eta_{ix} = d\eta_i/dx$. As in Section 22.1, we determine
stationary values of J by comparing neighboring paths for each $u_i$. Let


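\[ u_i(x, \alpha) = u_i(x, 0) + \alpha\,\eta_i(x), \quad i = 1, 2, \ldots, n, \tag{22.32} \]

with the $\eta_i$ independent of one another but subject to the continuity and endpoint restrictions
discussed in Section 22.1. By differentiating J from Eq. (22.31) with respect to $\alpha$ and
setting $\alpha = 0$ (the condition that J be stationary), we obtain

\[ \int_{x_1}^{x_2} \sum_i\left(\frac{\partial f}{\partial u_i}\eta_i + \frac{\partial f}{\partial u_{ix}}\eta_{ix}\right)dx = 0. \tag{22.33} \]

Again, each of the terms $(\partial f/\partial u_{ix})\eta_{ix}$ is integrated by parts. The integrated part vanishes
and Eq. (22.33) becomes

\[ \int_{x_1}^{x_2} \sum_i\left(\frac{\partial f}{\partial u_i} - \frac{d}{dx}\frac{\partial f}{\partial u_{ix}}\right)\eta_i\,dx = 0. \tag{22.34} \]

Since the $\eta_i$ are arbitrary and independent of one another,⁶ each of the terms in the sum
must vanish independently. We have

\[ \frac{\partial f}{\partial u_i} - \frac{d}{dx}\frac{\partial f}{\partial u_{ix}} = 0, \quad i = 1, 2, \ldots, n, \tag{22.35} \]

a whole set of Euler equations, each of which must be satisfied for a stationary value of J.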


Hamilton’s Principle 


The most important application of Eq. (22.31) occurs when the integrand f is taken to
be a Lagrangian L. The Lagrangian (for nonrelativistic systems; see Exercise 22.2.5 for
a relativistic particle) is defined as the difference of kinetic and potential energies of a
system:

\[ L = T - V. \tag{22.36} \]

Using time as an independent variable instead of x and $x_i(t)$ as the dependent variables,
our conversion of Eq. (22.31) involves the replacements

\[ x \rightarrow t, \qquad u_i \rightarrow x_i(t), \qquad u_{ix} \rightarrow \dot{x}_i(t); \]


⁶ For example, we could set $\eta_2 = \eta_3 = \eta_4 = \cdots = 0$, eliminating all but one term of the sum, and then treat $\eta_1$ exactly as in
Section 22.1.

$x_i(t)$ is the position and $\dot{x}_i = dx_i/dt$ is the velocity of particle i as a function of time.
The equation $\delta J = 0$ is then a mathematical statement of Hamilton’s principle of classical
mechanics,

\[ \delta\int_{t_1}^{t_2} L(x_1, x_2, \ldots, \dot{x}_1, \dot{x}_2, \ldots, t)\,dt = 0. \tag{22.37} \]

In words, Hamilton’s principle asserts that the motion of the system from time $t_1$ to $t_2$
is such that the time integral of the Lagrangian L, or action, has a stationary value. The
resulting Euler equations are usually called the Lagrangian equations of motion,

\[ \frac{d}{dt}\frac{\partial L}{\partial\dot{x}_i} - \frac{\partial L}{\partial x_i} = 0 \quad (\text{each } i). \tag{22.38} \]


These Lagrangian equations can be derived from Newton’s equations of motion, and Newton’s
equations can be derived from Lagrange’s. The two sets of equations are equally
“fundamental.”

The Lagrangian formulation has advantages over the conventional Newtonian laws.
Whereas Newton’s equations are vector equations, we see that Lagrange’s equations
involve only scalar quantities. The coordinates $x_1, x_2, \ldots$ need not be a standard set of
coordinates or lengths. They can be selected to match the conditions of the physical problem.
The Lagrange equations are invariant with respect to the choice of coordinate system.
Newton’s equations (in component form) are not manifestly invariant. For example,
Exercise 3.10.27 shows what happens when F = ma is resolved in spherical polar coordinates.

Exploiting the concept of energy, we may easily extend the Lagrangian formulation from
mechanics to diverse fields, such as electrical networks and acoustical systems. Extensions
to electromagnetism appear in the exercises. The result is a unification of otherwise separate
areas of physics. In the development of new areas, the quantization of Lagrangian
particle mechanics provided a model for the quantization of electromagnetic fields and led
to the gauge theory of quantum electrodynamics.

One of the most valuable advantages of Hamilton’s principle (the Lagrange equation
formulation) is the ease in seeing a relation between a symmetry and a conservation law. As
an example, let $x_i = \varphi$, an azimuthal angle. If our Lagrangian is independent of $\varphi$ (that is,
$\varphi$ is said to be an ignorable coordinate), there are two consequences: (1) the conservation
or invariance of the component of angular momentum associated with (conjugate to) $\varphi$,
and (2) from Eq. (22.38), $\partial L/\partial\dot\varphi$ = constant. Similarly, invariance under translation leads
to conservation of linear momentum.


Example 22.2.1 MOVING PARTICLE, CARTESIAN COORDINATES

A particle of mass m moves in one dimension with its position described by a Cartesian
coordinate x, subject to a potential V(x). Its kinetic energy is given by $T = m\dot{x}^2/2$, so its
Lagrangian L has the form

\[ L = \frac{1}{2}m\dot{x}^2 - V(x). \]

We will need

\[ \frac{\partial L}{\partial\dot{x}} = m\dot{x}, \qquad \frac{\partial L}{\partial x} = -\frac{dV(x)}{dx} = F(x). \tag{22.39} \]

We have identified the force F as the negative gradient of the potential. Inserting the results
from Eq. (22.39) into the Lagrangian equation of motion, Eq. (22.38), we get

\[ \frac{d}{dt}(m\dot{x}) - F(x) = 0, \]

which is Newton’s second law of motion. ■


Example 22.2.2 MOVING PARTICLE, CIRCULAR CYLINDRICAL COORDINATES


Now let us consider a particle of mass m moving in the xy-plane, that is, z = 0. We use
cylindrical coordinates $\rho$, $\varphi$. The kinetic energy is

\[ T = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2) = \frac{1}{2}m(\dot\rho^2 + \rho^2\dot\varphi^2), \tag{22.40} \]

and we take V = 0 for simplicity.

We could have converted $\dot{x}^2 + \dot{y}^2$ into circular cylindrical coordinates by taking
$x(\rho, \varphi) = \rho\cos\varphi$, $y(\rho, \varphi) = \rho\sin\varphi$, and then differentiating with respect to time and
squaring. What we actually did was to recognize that the cylindrical coordinates are an
orthogonal system with scale factors $h_\rho = 1$, $h_\varphi = \rho$, so the velocity v has in the cylindrical
system components $v_\rho = \dot\rho$ and $v_\varphi = \rho\dot\varphi$.

We now apply the Lagrangian equations of motion first to the $\rho$ coordinate and then
to $\varphi$:

\[ \frac{d}{dt}(m\dot\rho) - m\rho\dot\varphi^2 = 0, \qquad \frac{d}{dt}(m\rho^2\dot\varphi) = 0. \]

The second equation is a statement of conservation of angular momentum. The first may be
interpreted as radial acceleration⁷ equated to centrifugal force. In this sense the centrifugal
force is a real force. It is of some interest that this interpretation of centrifugal force as a
real force is supported by the general theory of relativity. ■


Hamilton’s Equations 


Hamilton was the first to show that Euler’s equation for the Lagrangian enabled the equations
of motion to be reduced to the set of coupled first-order PDEs called Hamilton’s
equations. A starting point for this analysis is the definition of the canonical momentum
$p_i$ conjugate to the coordinate $q_i$, defined as

\[ p_i = \frac{\partial L}{\partial\dot{q}_i}. \tag{22.41} \]


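⁷ Here is a second method of attacking Exercise 3.10.13.

This definition is consistent with the elementary definition of momentum in Cartesian
coordinates, where (in one dimension) $T = m\dot{q}^2/2$, $p = m\dot{q}$. From Eq. (22.41) and the
Lagrangian equations of motion, Eq. (22.38), we have by direct substitution

\[ \dot{p}_i = \frac{\partial L}{\partial q_i}, \tag{22.42} \]

and this permits us to write the variation of L in the form

\[ dL = \sum_i\left(\frac{\partial L}{\partial q_i}\,dq_i + \frac{\partial L}{\partial\dot{q}_i}\,d\dot{q}_i\right) + \frac{\partial L}{\partial t}\,dt = \sum_i\left(\dot{p}_i\,dq_i + p_i\,d\dot{q}_i\right) + \frac{\partial L}{\partial t}\,dt. \tag{22.43} \]

We now define the Hamiltonian as

\[ H = \sum_i p_i\dot{q}_i - L, \tag{22.44} \]

and compute

\[ dH = \sum_i\left(p_i\,d\dot{q}_i + \dot{q}_i\,dp_i\right) - \left[\sum_i\left(\dot{p}_i\,dq_i + p_i\,d\dot{q}_i\right) + \frac{\partial L}{\partial t}\,dt\right] = \sum_i\left(\dot{q}_i\,dp_i - \dot{p}_i\,dq_i\right) - \frac{\partial L}{\partial t}\,dt. \tag{22.45} \]

But from the chain rule for differentiation, we also have

\[ dH = \sum_i\left(\frac{\partial H}{\partial p_i}\,dp_i + \frac{\partial H}{\partial q_i}\,dq_i\right) + \frac{\partial H}{\partial t}\,dt. \tag{22.46} \]

Equating the coefficients of $dp_i$, $dq_i$, and $dt$ in Eqs. (22.45) and (22.46), we obtain Hamilton’s
equations:

\[ \frac{\partial H}{\partial p_i} = \dot{q}_i, \qquad \frac{\partial H}{\partial q_i} = -\dot{p}_i, \qquad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}. \tag{22.47} \]

In conservative systems, $\partial H/\partial t = 0$, and H has a constant value equal to the total energy
of the system.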
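
To see Hamilton's equations at work, the following sketch integrates them for a one-dimensional harmonic oscillator, $H = p^2/2m + kq^2/2$, and checks that H stays constant along the computed trajectory. The choice of system, the values m = k = 1, the initial condition, and the fourth-order Runge-Kutta integrator are all assumptions made for this illustration; none of them comes from the text.

```python
import math

def hamiltonian(q, p, m=1.0, k=1.0):
    """H = p^2 / (2m) + k q^2 / 2 for the assumed harmonic oscillator."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

def rhs(q, p, m=1.0, k=1.0):
    """Hamilton's equations, Eq. (22.47): qdot = dH/dp, pdot = -dH/dq."""
    return p / m, -k * q

def rk4_step(q, p, dt):
    """One classical fourth-order Runge-Kutta step for (q, p)."""
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = rhs(q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_new, p_new

q, p = 1.0, 0.0                 # start at rest with unit displacement
dt, steps = 0.01, 1000          # integrate to t = 10
e0 = hamiltonian(q, p)
for _ in range(steps):
    q, p = rk4_step(q, p, dt)
e1 = hamiltonian(q, p)

print(f"q(10) = {q:.6f}   exact cos(10) = {math.cos(10.0):.6f}")
print(f"change in H over the run = {e1 - e0:.2e}")
```

For this time-independent H the computed energy changes only at the level of the integrator's truncation error, in line with the statement that H is constant in conservative systems.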


Several Independent Variables 


Sometimes the integrand f in an equation analogous to Eq. (22.2) will contain an unknown
function, u, that is a function of several independent variables, u = u(x, y, z). In the
three-dimensional case, for example, that equation becomes

\[ J = \iiint f(u, u_x, u_y, u_z, x, y, z)\,dx\,dy\,dz, \tag{22.48} \]

where $u_x = \partial u/\partial x$, $u_y = \partial u/\partial y$, $u_z = \partial u/\partial z$, and u is assumed to have specified values
on the boundary of the region of integration.
Generalizing the analysis of Section 22.1, we represent the variation of u as

\[ u(x, y, z, \alpha) = u(x, y, z, 0) + \alpha\,\eta(x, y, z), \]

where $\eta$ is arbitrary except that it must vanish on the boundary. Our integral J is now, as
in Section 22.1, a function of $\alpha$, and our variational problem is to make J stationary with
respect to $\alpha$.

Differentiating the integral Eq. (22.48) with respect to the parameter $\alpha$ and then setting
$\alpha = 0$, we obtain

\[ \left.\frac{\partial J}{\partial\alpha}\right|_{\alpha=0} = \iiint\left(\frac{\partial f}{\partial u}\eta + \frac{\partial f}{\partial u_x}\eta_x + \frac{\partial f}{\partial u_y}\eta_y + \frac{\partial f}{\partial u_z}\eta_z\right)dx\,dy\,dz = 0. \]

We continue to use a notation similar to that used previously: $\eta_x$ is shorthand for
$\partial\eta/\partial x$, etc.

Again, we integrate each of the terms $(\partial f/\partial u_i)\eta_i$ by parts. The integrated part vanishes
at the boundary (because the deviation $\eta$ is required to go to zero there) and we get

\[ \iiint\left(\frac{\partial f}{\partial u} - \frac{\partial}{\partial x}\frac{\partial f}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial f}{\partial u_y} - \frac{\partial}{\partial z}\frac{\partial f}{\partial u_z}\right)\eta(x, y, z)\,dx\,dy\,dz = 0. \tag{22.49} \]

We must now digress to clarify the notation in Eq. (22.49). The derivative $\partial/\partial x$ enters
that equation as a result of the integration by parts, and it therefore must act on all the x
dependence of $\partial f/\partial u_x$, not just on the explicit appearance of x in f. The reader may recall
that this derivative was written d/dx when it arose in Section 22.1, but that notation is not
entirely appropriate here as the functions involved also depend on y and z.

We conclude our analysis with the now-familiar observation that since the variation
$\eta(x, y, z)$ is arbitrary, the term in large parentheses is set equal to zero. This yields the
Euler equation for (three) independent variables,

\[ \frac{\partial f}{\partial u} - \frac{\partial}{\partial x}\frac{\partial f}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial f}{\partial u_y} - \frac{\partial}{\partial z}\frac{\partial f}{\partial u_z} = 0. \tag{22.50} \]

Remember that the derivative $\partial/\partial x$ operates on both the explicit and implicit x dependence
of $\partial f/\partial u_x$; similar remarks apply to $\partial/\partial y$ and $\partial/\partial z$.


Example 22.2.3 LAPLACE’S EQUATION


A variational problem with several independent variables is provided by electrostatics. An
electrostatic field has

\[ \text{energy density} = \frac{1}{2}\varepsilon E^2, \]

where E is the electric field. In terms of the static potential $\varphi$,

\[ \text{energy density} = \frac{1}{2}\varepsilon(\nabla\varphi)^2. \]

Now let us impose the requirement that the electrostatic energy (associated with the field)
in a given charge-free volume be a minimum subject to specific conditions on $\varphi$ at the
boundary. The assumption that the volume is charge-free makes $\varphi$ continuous and
differentiable throughout the volume, and we therefore have a situation to which an Euler
equation applies. We have the volume integral

\[ J = \iiint (\nabla\varphi)^2\,dx\,dy\,dz = \iiint (\varphi_x^2 + \varphi_y^2 + \varphi_z^2)\,dx\,dy\,dz, \]

where $\varphi_x$ stands for $\partial\varphi/\partial x$. Thus,

\[ f(\varphi, \varphi_x, \varphi_y, \varphi_z, x, y, z) = \varphi_x^2 + \varphi_y^2 + \varphi_z^2, \]

so Euler’s equation, Eq. (22.50), yields (with u in that equation replaced by $\varphi$)

\[ -2(\varphi_{xx} + \varphi_{yy} + \varphi_{zz}) = 0, \]

which in the usual vector notation is equivalent to

\[ \nabla^2\varphi(x, y, z) = 0. \]

This is Laplace’s equation of electrostatics.
Closer investigation shows that this stationary value is indeed a minimum. Thus the
demand that the field energy be minimized leads to Laplace’s PDE. ■
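
A discrete, two-dimensional analog of this minimum property can be sketched in a few lines. On a square grid the sum of squared nearest-neighbor differences stands in for $\iint(\nabla\varphi)^2\,dx\,dy$; relaxing the interior values to the five-point Laplace condition and then perturbing them (with the boundary held fixed) should only raise that sum. The grid size, the boundary data $\varphi = xy$, the Gauss-Seidel relaxation, and the random perturbations are all assumptions chosen for this illustration.

```python
import random

N = 12  # grid points per side, boundary included (assumed size)

def boundary_value(i, j):
    """Assumed boundary data: phi = x * y, with x and y running from 0 to 1."""
    return (i / (N - 1)) * (j / (N - 1))

def energy(phi):
    """Discrete analog of the integral of (grad phi)^2: squared neighbor differences."""
    total = 0.0
    for i in range(N):
        for j in range(N):
            if i + 1 < N:
                total += (phi[i + 1][j] - phi[i][j]) ** 2
            if j + 1 < N:
                total += (phi[i][j + 1] - phi[i][j]) ** 2
    return total

def perturbed(phi, eps):
    """Copy of phi with random interior changes; boundary values are untouched."""
    out = [row[:] for row in phi]
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            out[i][j] += eps * random.uniform(-1.0, 1.0)
    return out

# Boundary values fixed, interior initially zero.
phi = [[boundary_value(i, j) if i in (0, N - 1) or j in (0, N - 1) else 0.0
        for j in range(N)] for i in range(N)]

# Gauss-Seidel relaxation of the five-point discrete Laplace equation.
for _ in range(2000):
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            phi[i][j] = 0.25 * (phi[i + 1][j] + phi[i - 1][j]
                                + phi[i][j + 1] + phi[i][j - 1])

random.seed(0)
e_min = energy(phi)
trials = [energy(perturbed(phi, 0.01)) for _ in range(5)]
print(f"relaxed energy: {e_min:.6f}")
print("perturbed energies:", [f"{e:.6f}" for e in trials])
```

Because $xy$ satisfies the five-point equation exactly, the relaxed grid reproduces $\varphi = xy$ in the interior, and each perturbed configuration has a larger discrete energy than the relaxed one.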


Several Dependent and Independent Variables 


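In some cases our integrand f contains more than one dependent variable and more than
one independent variable. Consider

\[ f = f\bigl(p(x, y, z), p_x, p_y, p_z, q(x, y, z), q_x, q_y, q_z, r(x, y, z), r_x, r_y, r_z, x, y, z\bigr). \tag{22.51} \]

We proceed as before with

\[ \begin{aligned} p(x, y, z, \alpha) &= p(x, y, z, 0) + \alpha\,\xi(x, y, z), \\ q(x, y, z, \alpha) &= q(x, y, z, 0) + \alpha\,\eta(x, y, z), \\ r(x, y, z, \alpha) &= r(x, y, z, 0) + \alpha\,\zeta(x, y, z), \quad \text{and so on.} \end{aligned} \]

Keeping in mind that $\xi$, $\eta$, and $\zeta$ are independent of one another, as were the $\eta_i$ in
Eq. (22.32), the same differentiation and then integration by parts will lead to

\[ \frac{\partial f}{\partial p} - \frac{\partial}{\partial x}\frac{\partial f}{\partial p_x} - \frac{\partial}{\partial y}\frac{\partial f}{\partial p_y} - \frac{\partial}{\partial z}\frac{\partial f}{\partial p_z} = 0, \tag{22.52} \]

with similar equations for functions q and r. Replacing p, q, r, ... with $y_i$ and x, y, z, ...
with $x_j$, we can put Eq. (22.52) in a more compact form:

\[ \frac{\partial f}{\partial y_i} - \sum_j \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial y_{ij}}\right) = 0, \quad i = 1, 2, \ldots, \tag{22.53} \]

in which

\[ y_{ij} \equiv \frac{\partial y_i}{\partial x_j}. \]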

An application of Eq. (22.53) appears in Exercise 22.2.10. 
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Geodesics 


Particularly in general relativity, it is of interest to identify the shortest path between two 
points in a “curved space,” i.e., a space characterized by a metric tensor more general than 
that of Euclidean or even Minkoswki space. A path that is a “local minimum” (calculated 
using the relevant metric), meaning that it is shorter than other paths that can be reached 
from it by small deformations, is referred to as a geodesic. This definition causes both the 
two great-circle paths of Fig. 22.2 to be identified as geodesics, because even the longer 
path is of minimum length relative to small deformations. In practice, it is usually easy to 
identify which of several geodesics in fact corresponds to the shortest path. 

The calculus of variations is the natural tool for identifying geodesics, and in fact it was 
used in Example 22.1.1 to verify that a straight line is the geodesic connecting given points 
in Euclidean space. To extend the analysis to more general metric spaces, we start by relating
the distance between two neighboring points, $ds$, with the changes in their coordinates,
$dq^i$, $i = 1, 2, \ldots$. Note that we distinguish between covariant and contravariant quantities,
using superscripts for the latter (coordinate displacements are contravariant; compare with
Section 4.3). The distance $ds$ is a scalar, given by

$$ds^2 = g_{ij}\, dq^i\, dq^j. \tag{22.54}$$

Here $g_{ij}$ is the metric tensor, which is symmetric but in many cases of interest not diagonal.
Note that we are using the Einstein summation convention, so $i$ and $j$ in Eq. (22.54) are
summed, causing $ds^2$ to be a scalar. This formula is an obvious generalization of that for
Euclidean space,

$$ds^2 = dx^2 + dy^2 + dz^2,$$

but differs therefrom in that the coordinates $q^i$ are not assumed to be mutually orthogonal,
so $ds^2$ contains cross terms $dq^i\,dq^j$ with $i \ne j$.

A path in our curved space can be described parametrically by giving the $q^i$ as functions
of an independent variable that we will call $u$, and the distance between two points $A$ and
$B$ can then be represented as

$$J = \int_A^B \frac{ds}{du}\,du = \int_A^B \frac{\sqrt{ds^2}}{du}\,du = \int_A^B \sqrt{g_{ij}\frac{dq^i}{du}\frac{dq^j}{du}}\;du = \int_A^B \sqrt{g_{ij}\,\dot q^i \dot q^j}\;du, \tag{22.55}$$

where we are borrowing the dot notation, $\dot q^i = dq^i/du$.

One could now proceed to find the $q^i(u)$ that minimize $J$, but this is a relatively difficult
problem. Instead we rely on the Lagrangian formulation of relativistic mechanics, where,
for a particle not subject to a potential (other than a gravitational force whose effect is
described by the metric), the Lagrangian reduces to

$$L = g_{ij}\,\dot q^i \dot q^j. \tag{22.56}$$

Here the dot notation refers to derivatives with respect to the proper time $\tau$ (or to any other
variable related thereto by an affine transformation, meaning that the new variable, e.g., $u$, is
related to $\tau$ by a transformation of the form $u = a\tau + b$). This means that we can replace
the minimization of $J$ by that of the action:

$$\delta \int_A^B g_{ij}\,\dot q^i \dot q^j\,du = 0, \tag{22.57}$$

in effect simplifying our problem by eliminating the radical that was present in Eq. (22.55).

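The minimization in Eq. (22.57) is a relatively simple standard problem in the calculus
of variations; for solving it we note that each $g_{ij}$ is in general a function of all the $q^k$ (but
not the derivatives $\dot q^k$). There will be an Euler equation for each $k$; before simplification
they take the form

$$\frac{\partial\,(g_{ij}\dot q^i \dot q^j)}{\partial q^k} - \frac{d}{du}\,\frac{\partial\,(g_{ij}\dot q^i \dot q^j)}{\partial \dot q^k} = 0. \tag{22.58}$$

Starting to evaluate Eq. (22.58), we get

$$\frac{\partial g_{ij}}{\partial q^k}\,\dot q^i \dot q^j - \frac{d}{du}\left(g_{kj}\,\dot q^j + g_{ik}\,\dot q^i\right) = 0. \tag{22.59}$$

Some simplification is achieved by using the relations

$$\frac{dg_{ik}}{du} = \frac{\partial g_{ik}}{\partial q^j}\,\dot q^j, \qquad \frac{dg_{kj}}{du} = \frac{\partial g_{kj}}{\partial q^i}\,\dot q^i$$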


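(remember that the Einstein summation convention is still in use). Equation (22.59)
reduces to

$$\frac{1}{2}\,\dot q^i \dot q^j\left[\frac{\partial g_{ij}}{\partial q^k} - \frac{\partial g_{kj}}{\partial q^i} - \frac{\partial g_{ik}}{\partial q^j}\right] - g_{ik}\,\ddot q^i = 0. \tag{22.60}$$

As a final simplification, we multiply Eq. (22.60) by $g^{kl}$ and use the identity $g^{kl}g_{ik} = \delta^l_i$,
reaching (in a more expanded notation) the geodesic equation

$$\frac{d^2q^l}{du^2} + \frac{dq^i}{du}\frac{dq^j}{du}\,\frac{1}{2}\,g^{kl}\left[\frac{\partial g_{kj}}{\partial q^i} + \frac{\partial g_{ik}}{\partial q^j} - \frac{\partial g_{ij}}{\partial q^k}\right] = 0. \tag{22.61}$$

Comparing with the formula for the Christoffel symbol, Eq. (4.63), we can rewrite
Eq. (22.61) as

$$\frac{d^2q^l}{du^2} + \frac{dq^i}{du}\frac{dq^j}{du}\,\Gamma^l_{ij} = 0. \tag{22.62}$$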


Note that although Eq. (22.62) gives the differential equation describing geodesics in 
curved space, it is a long way from that equation to its explicit solution for significant 
problems in general relativity. The exploration of such solutions is a topic of current
research and beyond the scope of the present text.
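
As a concrete illustration of Eq. (22.62), the following minimal sketch integrates the geodesic equation numerically on the unit sphere and checks that the resulting path stays on a great circle. The choice of metric ($ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$, with Christoffel symbols $\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi_{\theta\varphi} = \cot\theta$), the Runge-Kutta integrator, the initial conditions, and all function names are assumptions made for this example; they are not taken from the text.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of Eq. (22.62) on the unit sphere.

    state = (theta, phi, dtheta/du, dphi/du). Assumed metric:
    ds^2 = dtheta^2 + sin^2(theta) dphi^2.
    """
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, f):
        return tuple(s + f * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, h / 2))
    k3 = geodesic_rhs(shifted(k2, h / 2))
    k4 = geodesic_rhs(shifted(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def to_cartesian(theta, phi):
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def max_plane_deviation(state, steps=2000, h=1e-3):
    """Largest distance of the path from the plane through the origin
    spanned by the initial position and initial velocity."""
    theta, phi, dtheta, dphi = state
    r0 = to_cartesian(theta, phi)
    v0 = (math.cos(theta) * math.cos(phi) * dtheta - math.sin(theta) * math.sin(phi) * dphi,
          math.cos(theta) * math.sin(phi) * dtheta + math.sin(theta) * math.cos(phi) * dphi,
          -math.sin(theta) * dtheta)
    n = (r0[1] * v0[2] - r0[2] * v0[1],
         r0[2] * v0[0] - r0[0] * v0[2],
         r0[0] * v0[1] - r0[1] * v0[0])
    norm = math.sqrt(sum(c * c for c in n))
    n = tuple(c / norm for c in n)
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, h)
        r = to_cartesian(state[0], state[1])
        worst = max(worst, abs(sum(a * b for a, b in zip(n, r))))
    return worst

# Hypothetical starting point and direction, chosen away from the poles.
deviation = max_plane_deviation((1.0, 0.0, 0.3, 0.7))
print(deviation)
```

A great circle is the intersection of the sphere with a plane through its center, so a deviation at the level of integration error is what one would expect if the integrated path is a geodesic.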
Relation to Physics


The calculus of variations as developed so far provides an elegant description of a wide 
variety of physical phenomena. The physics includes classical mechanics, as in Ex- 
amples 22.2.1 and 22.2.2; relativistic mechanics, Exercise 22.2.5; electrostatics, Exam- 
ple 22.2.3; and electromagnetic theory in Exercise 22.2.10. The convenience should not 
be minimized, but at the same time we should be aware that in these cases the calculus 
of variations has only provided an alternate description of what was already known. The 
situation does change with incomplete theories. If the basic physics is not yet known, a
postulated variational principle can be a useful starting point.


Exercises


22.2.1 (a) Develop the equations of motion corresponding to $L = \frac{1}{2}m(\dot x^2 + \dot y^2)$.

(b) In what sense do your solutions minimize the integral $\int_{t_1}^{t_2} L\,dt$?
Compare the result for your solution with x = constant, y = constant.


22.2.2 From the Lagrangian equations of motion, Eq. (22.38), show that a system in stable
equilibrium has a minimum potential energy.


22.2.3 Write out the Lagrangian equations of motion of a particle in spherical coordinates for
potential V equal to a constant. Identify the terms corresponding to (a) centrifugal force
and (b) Coriolis force.


22.2.4 The spherical pendulum consists of a mass on a wire of length $l$, free to move in polar
angle $\theta$ and azimuth angle $\varphi$ (Fig. 22.8).

(a) Set up the Lagrangian for this physical system.

(b) Develop the Lagrangian equations of motion.




FIGURE 22.8 Spherical pendulum. 
22.2.5 Show that the Lagrangian

$$L = m_0c^2\left(1 - \sqrt{1 - \frac{v^2}{c^2}}\,\right) - V(\mathbf{r})$$

leads to a relativistic form of Newton’s second law of motion,

$$\frac{d}{dt}\left(\frac{m_0 v_i}{\sqrt{1 - v^2/c^2}}\right) = F_i,$$

in which the force components are $F_i = -\partial V/\partial x_i$.


22.2.6 The Lagrangian for a particle with charge $q$ in an electromagnetic field described by
scalar potential $\varphi$ and vector potential $\mathbf{A}$ is

$$L = \frac{1}{2}mv^2 - q\varphi + q\mathbf{A}\cdot\mathbf{v}.$$

Find the equation of motion of the charged particle.

Hint. $(d/dt)A_i = \partial A_i/\partial t + \sum_j (\partial A_i/\partial x_j)\,\dot x_j$. The dependence of the force fields $\mathbf{E}$ and
$\mathbf{B}$ on the potentials $\varphi$ and $\mathbf{A}$ is developed in Section 3.9; see in particular Eq. (3.108).

ANS. $m\ddot x_i = q[\mathbf{E} + \mathbf{v}\times\mathbf{B}]_i$.


22.2.7 Consider a system in which the Lagrangian is given by

$$L(q_i, \dot q_i) = T(q_i, \dot q_i) - V(q_i),$$

where $q_i$ and $\dot q_i$ represent sets of variables. The potential energy $V$ is independent of
velocity and neither $T$ nor $V$ has any explicit time dependence.

(a) Show that

$$\frac{d}{dt}\left(\sum_j \dot q_j\frac{\partial L}{\partial \dot q_j} - L\right) = 0.$$

(b) The constant quantity

$$\sum_j \dot q_j\frac{\partial L}{\partial \dot q_j} - L$$

defines the Hamiltonian $H$. Show that under the preceding assumed conditions, $H$
satisfies $H = T + V$, and is therefore the total energy.
Note. The kinetic energy $T$ is a quadratic function of the $\dot q_i$.


22.2.8 The Lagrangian for a vibrating string (small-amplitude vibrations) is

$$L = \int\left(\frac{1}{2}\rho u_t^2 - \frac{1}{2}\tau u_x^2\right)dx,$$

where $\rho$ is the (constant) linear mass density and $\tau$ is the (constant) tension. The
x-integration is over the length of the string. Show that application of Hamilton’s
principle to the Lagrangian density (the integrand), now with two independent variables,
leads to the classical wave equation,

$$\frac{\partial^2 u}{\partial x^2} = \frac{\rho}{\tau}\frac{\partial^2 u}{\partial t^2}.$$


22.2.9 Show that the stationary value of the total energy of the electrostatic field of
Example 22.2.3 is a minimum.

Hint. Investigate the $\alpha^2$ terms of $J$.


22.2.10 The Lagrangian (per unit volume) of an electromagnetic field with a charge density $\rho$
and current density $\mathbf{J}$ is given by

$$L = \frac{1}{2}\left(\varepsilon_0 E^2 - \frac{1}{\mu_0}B^2\right) - \rho\varphi + \mathbf{J}\cdot\mathbf{A}.$$

Show that Lagrange’s equations lead to two of Maxwell’s equations. (The remaining
two are a consequence of the definition of $\mathbf{E}$ and $\mathbf{B}$ in terms of $\mathbf{A}$ and $\varphi$.)

Hint. Take $\varphi$ and the components of $\mathbf{A}$ as dependent variables; and $x$, $y$, $z$, and $t$ as
independent variables. $\mathbf{E}$ and $\mathbf{B}$ are given in terms of $\mathbf{A}$ and $\varphi$ by Eq. (3.108).


CONSTRAINED MINIMA/MAXIMA 


In preparation for dealing with problems in the calculus of variations in which an inte- 
gral is to be minimized subject to constraints (which may either be algebraic equations or 
fixed values of other integrals), we look now at situations in which we seek a constrained 
extremum of an ordinary function. 

A typical constrained problem of the type now under consideration is the minimization 
of a function of several variables, here illustrated as f (x, y, z), subject to the constraint that 
g(x, y, z) be kept constant. Since the equation g(x, y, z) = C defines a surface, our con- 
strained problem is that of minimizing f(x, y, z) on a surface of constant g. The presence 
of the constraint means that only two of the three variables x, y, z are actually independent, 
and in principle one could solve the constraint equation to obtain z as a function of x and 
y: z= z(x,y), after which one could obtain the desired minimum by setting to zero the 
derivatives 


$$\frac{\partial}{\partial x}f\big(x, y, z(x, y)\big) \quad\text{and}\quad \frac{\partial}{\partial y}f\big(x, y, z(x, y)\big).$$


However, it may be cumbersome, or in some cases nearly impossible to solve the constraint 
equation, and in any case this approach does not treat the variables x, y, z on an explicitly 
equivalent basis. For these reasons it is useful to employ an alternate procedure, known as 
the method of Lagrangian multipliers. 
Lagrangian Multipliers


Continuing with our three-dimensional illustration in which we seek to minimize f (x, y, z) 
subject to the constraint g(x, y, z) = C, our starting point is that the constraint equation 


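implies

$$dg = \left(\frac{\partial g}{\partial x}\right)_{yz}dx + \left(\frac{\partial g}{\partial y}\right)_{xz}dy + \left(\frac{\partial g}{\partial z}\right)_{xy}dz = 0,$$

where (as indicated here explicitly) the partial derivatives of $g$ are taken viewing $x$, $y$, and
$z$ as independent. Proceeding as for the derivation of Eq. (1.144), we have

$$\left(\frac{\partial z}{\partial x}\right)_y = -\frac{(\partial g/\partial x)_{yz}}{(\partial g/\partial z)_{xy}} \quad\text{and}\quad \left(\frac{\partial z}{\partial y}\right)_x = -\frac{(\partial g/\partial y)_{xz}}{(\partial g/\partial z)_{xy}}. \tag{22.63}$$

Now setting $(\partial f/\partial x)_y$ to zero, we have (imposing the constraint $dg = 0$)

$$\left(\frac{\partial f}{\partial x}\right)_y = \left(\frac{\partial f}{\partial x}\right)_{yz} + \left(\frac{\partial f}{\partial z}\right)_{xy}\left(\frac{\partial z}{\partial x}\right)_y = \left(\frac{\partial f}{\partial x}\right)_{yz} - \frac{(\partial f/\partial z)_{xy}}{(\partial g/\partial z)_{xy}}\left(\frac{\partial g}{\partial x}\right)_{yz}$$

$$\phantom{\left(\frac{\partial f}{\partial x}\right)_y} = \left(\frac{\partial f}{\partial x}\right)_{yz} - \lambda\left(\frac{\partial g}{\partial x}\right)_{yz} = 0, \tag{22.64}$$

where

$$\lambda = \frac{(\partial f/\partial z)_{xy}}{(\partial g/\partial z)_{xy}}. \tag{22.65}$$

The quantity $\lambda$ is called a Lagrangian multiplier.
Now taking Eq. (22.64), its equivalent with $y$ replacing $x$, and a rearranged form of
Eq. (22.65), we have the symmetrical set of formulas

$$\left(\frac{\partial f}{\partial x}\right)_{yz} - \lambda\left(\frac{\partial g}{\partial x}\right)_{yz} = 0,$$

$$\left(\frac{\partial f}{\partial y}\right)_{xz} - \lambda\left(\frac{\partial g}{\partial y}\right)_{xz} = 0, \tag{22.66}$$

$$\left(\frac{\partial f}{\partial z}\right)_{xy} - \lambda\left(\frac{\partial g}{\partial z}\right)_{xy} = 0.$$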
The generalization of Eqs. (22.66) to $n$ variables and $k$ constraints is

$$\frac{\partial f}{\partial x_i} - \sum_{j=1}^{k}\lambda_j\frac{\partial g_j}{\partial x_i} = 0, \qquad i = 1, 2, \ldots, n. \tag{22.67}$$

The $n$ equations, Eqs. (22.67), contain $n + k$ unknowns (the $n$ $x_i$ and the $k$ $\lambda_j$), and they
are to be solved subject also to the $k$ constraint equations. In some problems it is never
necessary to evaluate explicitly the Lagrangian multipliers, and for this reason the method
is sometimes referred to as that of (Lagrange’s) undetermined multiplier(s).

Note that the formulation provided above does not only identify minima; the same equa- 
tions will locate maxima and saddle points. It is necessary to determine the nature of the 
stationary points from the specific problem at hand. 

While the derivation of Eq. (22.66) was asymmetric in that $\lambda$ was obtained considering
$z$ to be a dependent variable, we could have carried out the analysis with $x$ or $y$ in place
of $z$. This gives us an alternate route to the final formulas in the special case that $(\partial g/\partial z)$
vanishes, in which case Eq. (22.65) becomes undefined. The method only fails if all the
derivatives of a constraint function vanish at the stationary point. 


Example 22.3.1 MINIMIZING SURFACE-TO-VOLUME RATIO


Consider a right circular cylinder of radius $r$ and height $h$. We wish to find the ratio $h/r$
that will minimize the surface area for a fixed enclosed volume. The relevant formulas are:
surface area $S = 2\pi(rh + r^2)$, volume $V = \pi r^2 h$.

Applying Eqs. (22.67) for the case of one constraint and two independent variables,
we have

$$\frac{\partial S}{\partial r} - \lambda\frac{\partial V}{\partial r} = 2\pi(h + 2r) - \lambda(2\pi rh) = 0,$$

$$\frac{\partial S}{\partial h} - \lambda\frac{\partial V}{\partial h} = 2\pi r - \lambda\pi r^2 = 0.$$

Eliminating $\lambda$ from these equations, we find $h/r = 2$. Because we have not also used
the constraint equation, we get only the ratio of the two variables $h$ and $r$ (which is the
information that is relevant for the present problem). However, if we specify the volume
$V$ (i.e., use the constraint equation), we then get individual values of $h$ and $r$.

We close with two more observations: (1) Our solution obviously provides a minimum
$S/V$ ratio, but in principle this has to be determined by closer study of the problem. In
the present case, there is no maximum, as $S/V$ increases without limit as $h/r$ approaches
zero. (2) We note that minimizing $S$ for fixed $V$ is the same thing as maximizing $V$ for
fixed $S$, and leads to equivalent Lagrangian multiplier equations. ■
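
The result $h/r = 2$ can be cross-checked numerically without multipliers. The sketch below, offered only as an illustration, eliminates $h$ through the constraint $V = \pi r^2 h$ and minimizes $S$ over $r$ with a golden-section search. The fixed volume $V = 1$, the search bracket, and the function names are assumptions of the example.

```python
import math

def surface_area(r, volume):
    """S = 2*pi*(r*h + r^2), with h eliminated using V = pi*r^2*h."""
    h = volume / (math.pi * r * r)
    return 2.0 * math.pi * (r * h + r * r)

def minimize_golden(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

volume = 1.0  # assumed value; the optimal ratio does not depend on it
r_opt = minimize_golden(lambda r: surface_area(r, volume), 0.1, 2.0)
h_opt = volume / (math.pi * r_opt ** 2)
ratio = h_opt / r_opt

# Multiplier from the second stationarity equation, 2*pi*r - lambda*pi*r^2 = 0,
# and the residual of the first equation evaluated at the numerical optimum.
lam = 2.0 / r_opt
residual = 2.0 * math.pi * (h_opt + 2.0 * r_opt) - lam * 2.0 * math.pi * r_opt * h_opt
print(ratio, residual)
```

The direct search and the multiplier equations describe the same stationary point, so the residual of the first multiplier equation should vanish to within the accuracy of the search.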
Exercises


The following problems are to be solved by using Lagrangian multipliers.


22.3.1 The ground-state energy of a quantum particle of mass $m$ in a pillbox (right-circular
cylinder) is given by

$$E = \frac{\hbar^2}{2m}\left(\frac{(2.4048)^2}{R^2} + \frac{\pi^2}{H^2}\right),$$

in which $R$ is the radius and $H$ is the height of the pillbox. Find the ratio of $R$ to $H$ that
will minimize the energy for a fixed volume.


22.3.2 The U.S. Post Office limits first-class mail to Canada to a total of 36 inches, length plus
girth. Using Lagrange multipliers, find the dimensions of the rectangular parallelepiped
of maximum volume subject to this constraint.


22.3.3 A thermal nuclear reactor is subject to the constraint

$$\varphi(a, b, c) = \left(\frac{\pi}{a}\right)^2 + \left(\frac{\pi}{b}\right)^2 + \left(\frac{\pi}{c}\right)^2 = B^2, \text{ a constant,}$$

where the reactor is a rectangular parallelepiped of sides $a$, $b$, and $c$. Find the ratios of
$a$, $b$, and $c$ that maximize the reactor volume.

ANS. $a = b = c$, cube.


22.3.4 For a lens of focal length $f$, the object distance $p$ and the image distance $q$ are related
by $1/p + 1/q = 1/f$. Find the minimum object-image distance $(p + q)$ for fixed $f$.
Assume real object and image ($p$ and $q$ both positive).


22.3.5 You have an ellipse $(x/a)^2 + (y/b)^2 = 1$. Find the inscribed rectangle of maximum
area. Show that the ratio of the area of the maximum-area rectangle to the area of the
ellipse is $2/\pi = 0.6366$.


22.3.6 A rectangular parallelepiped is inscribed in an ellipsoid of semiaxes $a$, $b$, and $c$. Maximize
the volume of the inscribed rectangular parallelepiped. Show that the ratio of the
maximum volume to the volume of the ellipsoid is $2/(\pi\sqrt{3}) \approx 0.367$.


22.3.7 Find the maximum value of the directional derivative of $g(x, y, z)$,

$$\frac{dg}{ds} = \frac{\partial g}{\partial x}\cos\alpha + \frac{\partial g}{\partial y}\cos\beta + \frac{\partial g}{\partial z}\cos\gamma,$$

subject to the constraint,

$$\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1.$$
VARIATION WITH CONSTRAINTS


As in earlier sections, we seek the path that will make the integral

$$J = \int f\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right)dx_j \tag{22.68}$$

stationary. This is the general case in which $x_j$ represents a set of independent variables
and $y_i$ a set of dependent variables. Now, however, we introduce one or more constraints.
This means that the $y_i$ are no longer independent of each other. Then, if we vary the $y_i$
by writing $y_i(\alpha) = y_i(0) + \alpha\eta_i$, not all the $\eta_i$ may then be varied arbitrarily, and the Euler
equations would not apply.

Our approach will be to use Lagrange’s method of undetermined multipliers. We consider
first the possibility that the $k$th constraint takes the form of an equation:

$$\varphi_k\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right) = 0. \tag{22.69}$$

This will ordinarily not be meaningful unless there is more than one dependent or independent
variable, so that Eq. (22.69) restricts, but does not fully determine $y_i$. Remember that
$y_i$ and $x_j$ are here used to denote sets of variables. To introduce an undetermined multiplier
and remain in harmony with our study of the calculus of variations, we note that the
constraint, Eq. (22.69), can be stated in the form

$$\int \lambda_k(x_j)\,\varphi_k\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right)dx_j = 0, \tag{22.70}$$

with $\lambda_k(x_j)$ an arbitrary function of the $x_j$. Equation (22.70) is clearly satisfied if

$$\delta\int \lambda_k(x_j)\,\varphi_k\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right)dx_j = 0. \tag{22.71}$$

Alternatively, we may have a constraint in the form of an integral (now dependent on both
the $y_i$ and their derivatives throughout the interval on which the problem is defined):

$$\int \varphi_k\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right)dx_j = \text{constant}. \tag{22.72}$$

The effect of this constraint can be brought to a form consistent with Eq. (22.71) by writing

$$\delta\int \lambda_k\,\varphi_k\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right)dx_j = 0. \tag{22.73}$$

Note that in this equation $\lambda_k$ does not depend on the $x_j$ but is simply a constant, as it is
only the integral of $\varphi_k$ that is required to be stationary.

At this point, our constraints have been written as integrals that are dependent on the
undetermined multipliers $\lambda_k$, where $\lambda_k$ means either $\lambda_k(x_j)$ or just $\lambda_k$, depending on
whether the constraint was from Eq. (22.71) or (22.73). We therefore have our problem
in a form suitable for applying the method of Lagrangian multipliers as developed in
Section 22.3, and may use a formula analogous to Eq. (22.67). In our present notation,
we obtain

$$\delta\int\left[f\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right) + \sum_k \lambda_k\,\varphi_k\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right)\right]dx_j = 0. \tag{22.74}$$

Remember that the Lagrangian multiplier $\lambda_k$ may depend on the $x_j$ when $\varphi_k(y_i, x_j)$ is given
in the form of Eq. (22.69).

We now continue by treating the entire integrand as a new function whose integral is to
be made stationary:

$$g\left(y_i, \frac{\partial y_i}{\partial x_j}, x_j\right) = f + \sum_k \lambda_k\varphi_k. \tag{22.75}$$

If we have $N$ dependent variables $y_i$ $(i = 1, 2, \ldots, N)$ and $m$ constraints $(k = 1, 2, \ldots, m)$,
then $N - m$ of the $\eta_i$ may be taken as arbitrary. In place of arbitrary variation of the $m$
remaining $\eta_i$, we may instead set the $m$ multipliers $\lambda_k$ to the (presently unknown) values
that permit the Euler equations to be satisfied. The overall result is that we may require
satisfaction of an Euler equation for each of the dependent variables $y_i$, but the $m$ quantities
$\lambda_k$ that appear in the solution of the Euler equations must be assigned values consistent
with the constraints that have been imposed. In other words, it will be necessary to solve
simultaneously the Euler equations and the equations of constraint to find the function $g$
(and hence $f$) yielding a stationary value.


Lagrangian Formulation with Constraints 


In the absence of constraints, Lagrange’s equations of motion, Eq. (22.38), were found to be⁸

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_i} - \frac{\partial L}{\partial q_i} = 0,$$

with $t$ (time) the one independent variable and $q_i(t)$ (the particle positions) a set of dependent
variables. Usually the generalized coordinates $q_i$ are chosen to eliminate the forces
of constraint, but this is not necessary and not always desirable. In the presence of holonomic
constraints (those that can be expressed via mathematical expressions, e.g., $\varphi_k = 0$),
Hamilton’s principle is

$$\delta\int\left[L(q_i, \dot q_i, t) + \sum_k \lambda_k(t)\,\varphi_k(q_i, t)\right]dt = 0 \tag{22.76}$$

and the constrained Lagrangian equations of motion are

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_i} - \frac{\partial L}{\partial q_i} = \sum_k a_{ik}\lambda_k. \tag{22.77}$$

Usually the constraint is of the form $\varphi_k = \varphi_k(q_i, t)$, independent of the generalized velocities
$\dot q_i$. In this case the coefficient $a_{ik}$ is given by

$$a_{ik} = \frac{\partial \varphi_k}{\partial q_i}. \tag{22.78}$$

Then $a_{ik}\lambda_k$ (no summation) represents the force of the $k$th constraint in the $\hat q_i$-direction,
appearing in Eq. (22.77) in exactly the same way as $-\partial V/\partial q_i$.


⁸ The symbol $q$ is customary in classical mechanics. It serves to emphasize that the variable is not necessarily a Cartesian variable
(and not necessarily a length).


Example 22.4.1 SIMPLE PENDULUM


To illustrate, consider the simple pendulum, a mass $m$, constrained by a wire of length
$l$ to swing in an arc (Fig. 22.9) under a gravitational force characterized by a constant
acceleration $g$. In the absence of the one constraint,

$$\varphi_1 = r - l = 0, \tag{22.79}$$

there are two generalized coordinates $r$ and $\theta$ (assuming the motion to be restricted to a
vertical plane). The Lagrangian is

$$L = T - V = \frac{1}{2}m(\dot r^2 + r^2\dot\theta^2) + mgr\cos\theta, \tag{22.80}$$

taking the potential $V$ to be zero when the pendulum is horizontal, at $\theta = \pi/2$. Noting that

$$\frac{\partial\varphi_1}{\partial r} = a_{r1} = 1, \qquad \frac{\partial\varphi_1}{\partial\theta} = a_{\theta 1} = 0,$$

the equations of motion obtained from Eq. (22.77) are

$$\frac{d}{dt}\frac{\partial L}{\partial \dot r} - \frac{\partial L}{\partial r} = \lambda_1, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta} = 0, \tag{22.81}$$

or

$$\frac{d}{dt}(m\dot r) - mr\dot\theta^2 - mg\cos\theta = \lambda_1,$$

$$\frac{d}{dt}(mr^2\dot\theta) + mgr\sin\theta = 0.$$

FIGURE 22.9 Simple pendulum.

Substituting from the equation of constraint ($r = l$, $\dot r = 0$), these equations become

$$ml\dot\theta^2 + mg\cos\theta = -\lambda_1, \qquad ml^2\ddot\theta + mgl\sin\theta = 0. \tag{22.82}$$

The second equation may be solved for $\theta(t)$ to yield simple harmonic motion if the amplitude
is small ($\sin\theta \approx \theta$), whereas the first equation expresses the tension in the wire in
terms of $\theta$ and $\dot\theta$. Note that since the equation of constraint, Eq. (22.79), is in the form of
Eq. (22.69), the Lagrange multiplier $\lambda_1$ will be a function of $t$. Since the second equation
suffices to determine $\theta(t)$ (assuming a choice of initial conditions), the left-hand side of
the first equation can be evaluated if an explicit form for $\lambda_1$ is desired. ■
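
The last remark can be made concrete with a short numerical sketch: integrate the second of Eqs. (22.82) for $\theta(t)$ and then evaluate $\lambda_1$ from the first. The parameter values ($g = 9.81$, $l = m = 1$), the release from rest at $\theta_0 = 1$ rad, the Runge-Kutta integrator, and the function name are assumptions of the example. The comparison value $mg(3\cos\theta - 2\cos\theta_0)$ follows from energy conservation for release from rest and is used only as a consistency check.

```python
import math

def swing_to_bottom(theta0, g=9.81, l=1.0, m=1.0, h=1e-4):
    """Integrate m l^2 theta'' + m g l sin(theta) = 0 from rest at theta0
    until theta first drops to zero or below.

    Returns (theta, omega, lambda1), where lambda1 is obtained from
    m l omega^2 + m g cos(theta) = -lambda1, the first of Eqs. (22.82).
    """
    def accel(theta):
        return -(g / l) * math.sin(theta)

    theta, omega = theta0, 0.0
    while theta > 0.0:
        # Classical RK4 for theta' = omega, omega' = accel(theta).
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * h * k1w, accel(theta + 0.5 * h * k1t)
        k3t, k3w = omega + 0.5 * h * k2w, accel(theta + 0.5 * h * k2t)
        k4t, k4w = omega + h * k3w, accel(theta + h * k3t)
        theta += h / 6.0 * (k1t + 2 * k2t + 2 * k3t + k4t)
        omega += h / 6.0 * (k1w + 2 * k2w + 2 * k3w + k4w)
    lam1 = -(m * l * omega ** 2 + m * g * math.cos(theta))
    return theta, omega, lam1

theta0 = 1.0  # assumed release angle in radians, starting from rest
theta_end, omega_end, lam1 = swing_to_bottom(theta0)
tension = -lam1  # magnitude of the inward force exerted by the wire
expected = 9.81 * (3.0 * math.cos(theta_end) - 2.0 * math.cos(theta0))
print(tension, expected)
```

With this sign convention $\lambda_1$ is negative throughout the swing, reflecting that the wire pulls the mass toward the pivot.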


Example 22.4.2 SLIDING OFF A LOG


Another example from mechanics is the problem of a particle sliding on a cylindrical surface,
as shown in Fig. 22.10. The object is to find the critical angle $\theta_c$ at which the particle
flies off from the surface. This critical angle is the angle at which the radial force of constraint
goes to zero, and it will depend on the initial velocity with which the particle departs
from a position atop the cylinder. To make the problem well-defined, we seek the maximum
value that can be attained by $\theta_c$, corresponding to its limit at low initial velocity.

To illustrate the present constrained-minimization method, we take

$$L = T - V = \frac{1}{2}m(\dot r^2 + r^2\dot\theta^2) - mgr\cos\theta \tag{22.83}$$

and the one equation of constraint,

$$\varphi_1 = r - l = 0. \tag{22.84}$$

Proceeding as in Example 22.4.1, with

$$\frac{\partial\varphi_1}{\partial r} = a_{r1} = 1, \qquad \frac{\partial\varphi_1}{\partial\theta} = a_{\theta 1} = 0,$$

we reach

$$m\ddot r - mr\dot\theta^2 + mg\cos\theta = \lambda_1(\theta),$$

$$mr^2\ddot\theta + 2mr\dot r\dot\theta - mgr\sin\theta = 0.$$

FIGURE 22.10 A particle sliding on a cylindrical surface.

We have chosen to identify the constraining force $\lambda_1$ as a function of the angle $\theta$, a valid
choice since $\theta$ is a single-valued function of the independent variable $t$.
Inserting the constrained values $r = l$, $\ddot r = \dot r = 0$, these equations reduce to

$$-ml\dot\theta^2 + mg\cos\theta = \lambda_1(\theta), \tag{22.85}$$

$$ml^2\ddot\theta - mgl\sin\theta = 0. \tag{22.86}$$

Differentiating Eq. (22.85) with respect to time and remembering that

$$\frac{df(\theta)}{dt} = \frac{df(\theta)}{d\theta}\,\dot\theta,$$

we obtain

$$-2ml\ddot\theta - mg\sin\theta = \frac{d\lambda_1(\theta)}{d\theta}. \tag{22.87}$$

Combining Eqs. (22.86) and (22.87) to eliminate the $\ddot\theta$ term, we have

$$\frac{d\lambda_1}{d\theta} = -3mg\sin\theta,$$

which integrates to

$$\lambda_1(\theta) = 3mg\cos\theta + C. \tag{22.88}$$

To fix the constant $C$, we evaluate Eq. (22.88) for $\theta = 0$:

$$-ml\dot\theta^2\Big|_{\theta=0} + mg = 3mg + C,$$

which shows that $C \le -2mg$, with $C = -2mg$ when the initial velocity $\dot\theta(0)$ is zero. Using
this value of $C$ (which leads to the largest critical angle), we have

$$\lambda_1(\theta) = mg(3\cos\theta - 2). \tag{22.89}$$

The particle will stay on the surface as long as the force of constraint is nonnegative, that
is, as long as the surface has to push outward on the particle, corresponding to $\lambda_1(\theta) \ge 0$.
From Eq. (22.89) we find that the critical angle, at which $\lambda_1(\theta_c) = 0$, satisfies

$$\cos\theta_c = \frac{2}{3}, \quad\text{or}\quad \theta_c = 48^\circ 11'$$

from the vertical. At or before this angle (neglecting all friction) our particle takes off.

It must be admitted that this result can be obtained more easily by considering a varying
centripetal force furnished by the radial component of the gravitational force. The example
was chosen to illustrate the use of Lagrange’s undetermined multiplier without confusing
the reader with a complicated physical system. ■
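
The dependence of the critical angle on the initial velocity can be sketched in a few lines. Setting $\lambda_1(\theta_c) = 0$ in Eq. (22.88), with $C = -2mg - ml\dot\theta(0)^2$ as fixed above, gives $\cos\theta_c = 2/3 + l\dot\theta(0)^2/(3g)$. The function name and the default parameter values below are assumptions of the example.

```python
import math

def critical_angle(omega0=0.0, g=9.81, l=1.0):
    """Angle from the vertical (radians) at which lambda_1 vanishes.

    Uses lambda_1(theta) = 3 m g cos(theta) + C, Eq. (22.88), with
    C = -2 m g - m l omega0**2, where omega0 is the angular velocity
    at the top of the cylinder.
    """
    cos_c = 2.0 / 3.0 + l * omega0 ** 2 / (3.0 * g)
    if cos_c >= 1.0:
        # The constraint force is already non-positive at the top.
        return 0.0
    return math.acos(cos_c)

theta_c = critical_angle()
degrees = math.degrees(theta_c)
whole_degrees = int(degrees)
arc_minutes = (degrees - whole_degrees) * 60.0
print(whole_degrees, round(arc_minutes))
```

Any nonzero initial velocity raises $\cos\theta_c$ and therefore lowers the angle at which the particle leaves the surface, consistent with the statement that the zero-velocity limit gives the largest critical angle.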
Example 22.4.3 SCHRÖDINGER WAVE EQUATION


As a final illustration of a constrained minimum, let us find the Euler equations for the
quantum mechanical problem of a particle of mass $m$ subject to a potential $V$,

$$\delta J = \delta\int \psi^*(\mathbf{r})\,H\,\psi(\mathbf{r})\,d^3r = 0, \tag{22.90}$$

with the constraint that $\psi$ is the normalized wave function of a bound state:

$$\int \psi^*(\mathbf{r})\,\psi(\mathbf{r})\,d^3r = 1. \tag{22.91}$$

Equation (22.90) is a statement that the energy of the system is stationary, with $H$ its
Hamiltonian operator

$$H = -\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r}). \tag{22.92}$$

In Eq. (22.90) $\psi$ and $\psi^*$ are dependent variables; since they are in principle complex we
can treat each as a separate variable; this point was discussed in Chapter 5, footnote 3.

The integrand in Eq. (22.90) involves second derivatives, but it is convenient to convert
them to first derivatives using Green’s theorem, Eq. (3.86):

$$\int \psi^*\nabla^2\psi\,d^3r = \int_S \psi^*\nabla\psi\cdot d\boldsymbol{\sigma} - \int \nabla\psi^*\cdot\nabla\psi\,d^3r.$$

We now observe that the surface terms vanish due to the requirement that $\psi$ be continuous,
and our variational principle becomes

$$\delta\int\left[\frac{\hbar^2}{2m}\nabla\psi^*\cdot\nabla\psi + V\psi^*\psi\right]d^3r = 0. \tag{22.93}$$

The function $g$ for our constrained variation is therefore

$$g = \frac{\hbar^2}{2m}\nabla\psi^*\cdot\nabla\psi + V\psi^*\psi - \lambda\psi^*\psi$$

$$\phantom{g} = \frac{\hbar^2}{2m}\left(\psi^*_x\psi_x + \psi^*_y\psi_y + \psi^*_z\psi_z\right) + V\psi^*\psi - \lambda\psi^*\psi, \tag{22.94}$$

again using the subscript $x$ to denote $\partial/\partial x$. For $y_i = \psi^*$, our Euler equation becomes

$$\frac{\partial g}{\partial\psi^*} - \frac{\partial}{\partial x}\frac{\partial g}{\partial\psi^*_x} - \frac{\partial}{\partial y}\frac{\partial g}{\partial\psi^*_y} - \frac{\partial}{\partial z}\frac{\partial g}{\partial\psi^*_z} = 0.$$

This yields

$$V\psi - \lambda\psi - \frac{\hbar^2}{2m}\left(\psi_{xx} + \psi_{yy} + \psi_{zz}\right) = 0,$$

or

$$-\frac{\hbar^2}{2m}\nabla^2\psi + V\psi = \lambda\psi. \tag{22.95}$$

The Euler equation for $y_i = \psi$ gives the complex conjugate of Eq. (22.95), and therefore
provides no further information. Reference to Eq. (22.92) enables us to identify $\lambda$ physically
as the energy of the quantum mechanical system. With this interpretation, Eq. (22.95)
is the celebrated Schrödinger wave equation. ■


Rayleigh-Ritz Technique 
A number of physically important problems can be related to variational principles of the
general form

$$\delta J = \delta\int_a^b\left(p(x)\,y_x^2 + q(x)\,y^2\right)dx = 0, \tag{22.96}$$

where $y(a)$ and $y(b)$ have fixed values, and the variation is subject to the constraint

$$\int_a^b y^2\,w(x)\,dx = \text{constant}. \tag{22.97}$$

Treating Eqs. (22.96) and (22.97) as a constrained minimization, its Euler equation takes
the form

$$\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) - q(x)\,y + \lambda\,w(x)\,y = 0, \tag{22.98}$$

where $\lambda$ is a Lagrange multiplier. This situation usually arises in contexts such that $w(x)$
is a nonnegative weight function and $y(a)$ and $y(b)$ satisfy Sturm-Liouville boundary conditions,
meaning that

$$p(x)\,y_x\,y\,\Big|_a^b = 0. \tag{22.99}$$

From the above we conclude that although originally introduced as a Lagrange multiplier,
$\lambda$ must also be an eigenvalue of the Sturm-Liouville system described by Eqs. (22.98) and
(22.99). This identification was already noted in Example 22.4.3.

Often problems of the type now under discussion are presented as unconstrained minimizations
of the form

$$\delta J = \delta\left[\frac{\displaystyle\int_a^b\left(p(x)\,y_x^2 + q(x)\,y^2\right)dx}{\displaystyle\int_a^b y^2\,w(x)\,dx}\right] = 0. \tag{22.100}$$

Equation (22.100) is equivalent to the earlier formulation because $p\,y_x^2 + q\,y^2$ is homogeneous
in $y$ and the denominator normalizes $y$ without otherwise changing its functional
form. The $J$ satisfying Eq. (22.100) evaluates to the eigenvalue $\lambda$.
In the frequently occurring case that $p(x)$ is actually independent of $x$, we can manipulate
the $y_x^2$ term in the integrand of $J$, causing Eqs. (22.96), (22.98), and (22.100) to assume
the useful forms,

$$\delta J = \delta\int_a^b\left(-p\,y\,y_{xx} + q(x)\,y^2\right)dx = 0, \quad\text{(constrained minimum)}, \tag{22.101}$$

$$p\,\frac{d^2y}{dx^2} - q(x)\,y + \lambda\,w(x)\,y = 0, \tag{22.102}$$

$$\delta J = \delta\left[\frac{\displaystyle\int_a^b\left(-p\,y\,y_{xx} + q(x)\,y^2\right)dx}{\displaystyle\int_a^b y^2\,w(x)\,dx}\right] = 0, \quad\text{(unconstrained, } J = \lambda). \tag{22.103}$$

The Rayleigh-Ritz technique uses the direct evaluation of any one of the above forms for
$\delta J = 0$ as a means of obtaining solutions to the eigenvalue problem shown as Eq. (22.98)
or (22.102). Application of the technique can be as simple as guessing a form for $y$ and
evaluating $J$, but more accurate results are obtained by taking a form for $y(x)$ that contains
adjustable parameters, and then varying the parameters to minimize $J$ within the parameter
space. The quality of the results obtained obviously depends on whether the actual
minimum form for $y$ has been well approximated.


Ground State Eigenfunction 


Suppose that we seek to compute the ground-state eigenfunction $y_0$ and eigenvalue $\lambda_0$
of some complicated atomic or nuclear system.⁹ A classic example, for which no exact
analytical solution has been found, is the helium atom problem. The eigenfunction $y_0$ is
unknown, but we shall assume that we can make a pretty good guess at an approximation
to it, which we will call $y$. Although we do not know either $y_0$ or any other eigenfunctions
$y_i$ $(i = 1, 2, \ldots)$, or the corresponding eigenvalues $\lambda_i$, we do know, because the eigenfunctions
can be chosen to form a complete orthogonal set, that we can write the expansion

$$y = c_0y_0 + \sum_{i=1}^{\infty} c_iy_i. \tag{22.104}$$

⁹ This means that $\lambda_0$ is the smallest eigenvalue.

We shall assume that we picked $y$ sensibly enough that it is not orthogonal to the ground
state, so $c_0 \ne 0$. Invoking the orthogonality property, $E_y$, the expectation value of the
energy for wave function $y$, is

$$E_y = \frac{\langle y|H|y\rangle}{\langle y|y\rangle} = \frac{\displaystyle\sum_{i=0}^{\infty}|c_i|^2\lambda_i}{\displaystyle\sum_{i=0}^{\infty}|c_i|^2}, \tag{22.105}$$

where $H$, the operator defining the Schrödinger equation, typically has the form

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + V(x).$$

The Schrödinger equation and its approximate solution $E_y$ are then seen to correspond to
Eqs. (22.102) and (22.103). The final member of Eq. (22.105) results from the substitution
of Eq. (22.104). This substitution is similar to that carried out in Eq. (6.30), but note that
in that equation the function (there called $\psi$) was assumed normalized. As we already
observed in Section 6.4, the expression for $E_y$ is a weighted average of the eigenvalues
(with all the weights $|c_i|^2 \ge 0$), so $E_y$ must be at least as large as $\lambda_0$, and in fact must be
larger if $y$ contains any admixture of eigenfunctions whose $\lambda_i$ are larger than $\lambda_0$.
It is useful to scale $y$ so $c_0 = 1$ and rearrange Eq. (22.105) to

$$E_y = \lambda_0 + \frac{\displaystyle\sum_{i=1}^{\infty}|c_i|^2(\lambda_i - \lambda_0)}{1 + \displaystyle\sum_{i=1}^{\infty}|c_i|^2}, \tag{22.106}$$

a form that makes clear that the error in $E_y$ will be quadratic in the $c_i$, even though the
difference between $y$ and $y_0$ is linear in the $c_i$.
Our analysis therefore contains two important results.


(1) Whereas the error in the eigenfunction $y$ was $O(c_i)$, the error in $\lambda$ is only $O(c_i^2)$.
Even a poor approximation of the eigenfunctions may yield an accurate calculation
of the eigenvalue.

(2) If $\lambda_0$ is the lowest eigenvalue (ground state), then $E_y \ge \lambda_0$, so our approximation
is always on the high side, but converges to $\lambda_0$ as our approximate eigenfunction
$y$ improves ($c_i \to 0$).


In practical problems in quantum mechanics, $y$ often depends on parameters that may be
varied to minimize $E_y$ and thereby improve the estimate of the ground-state energy $\lambda_0$.
This is the “variational method” discussed in quantum mechanics texts. It was illustrated
in Example 8.4.1.
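
Both results can be seen in a small numerical sketch of Eq. (22.105). The three eigenvalues and the admixture coefficients used below are hypothetical numbers chosen only for illustration; the point is that halving the admixture roughly quarters the error in the energy, and that the estimate never falls below $\lambda_0$.

```python
def expectation_energy(coeffs, eigenvalues):
    """Weighted average of eigenvalues, the last member of Eq. (22.105)."""
    numerator = sum(abs(c) ** 2 * lam for c, lam in zip(coeffs, eigenvalues))
    denominator = sum(abs(c) ** 2 for c in coeffs)
    return numerator / denominator

# Hypothetical spectrum; eigenvalues[0] plays the role of lambda_0.
eigenvalues = [1.0, 3.0, 6.0]

errors = []
for eps in (0.1, 0.05, 0.025):
    # Trial function y = y_0 + eps*y_1 - eps*y_2 (scaled so that c_0 = 1).
    coeffs = [1.0, eps, -eps]
    errors.append(expectation_energy(coeffs, eigenvalues) - eigenvalues[0])

# Successive error ratios when the admixture is halved.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors, ratios)
```

A ratio close to 4 for each halving of the coefficients is the signature of an error that is quadratic in the $c_i$.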


Example 22.4.4 QUANTUM OSCILLATOR


The ground state of a quantum-mechanical particle of mass $m$ constrained to the region
$0 < x < \infty$ and subject also to a potential $V = kx^2/2$ is described (in a unit system with
$\hbar = 1$) by the lowest-eigenvalue eigenstate of the Schrödinger equation,

$$-\frac{1}{2m}\frac{d^2\psi}{dx^2} + \frac{kx^2}{2}\psi = E\psi, \tag{22.107}$$

subject to the boundary conditions $\psi(0) = \psi(\infty) = 0$. A guessed wave function consistent
with the boundary conditions is $y(x) = xe^{-ax}$. Let’s find the value of $a$ making the
approximate eigenvalue a minimum.

Our Schrödinger equation is of the type represented by Eq. (22.102), so we can use
Eq. (22.103) and find the unconstrained value of $J$ as given there with $p = 1/2m$, $q =
kx^2/2$, $w = 1$, and integration range $(0, \infty)$. Noting that $y_{xx} = a(ax - 2)e^{-ax}$, we have

$$J = \frac{\displaystyle\int_0^\infty\left(-\frac{ax}{2m}(ax - 2) + \frac{kx^4}{2}\right)e^{-2ax}\,dx}{\displaystyle\int_0^\infty x^2e^{-2ax}\,dx} = \frac{\dfrac{1}{8ma} + \dfrac{3k}{8a^5}}{\dfrac{1}{4a^3}} = \frac{a^2}{2m} + \frac{3k}{2a^2}. \tag{22.108}$$

Differentiating Eq. (22.108) with respect to $a^2$ and setting the result to zero, we get

$$\frac{1}{2m} - \frac{3k}{2a^4} = 0, \quad\text{or}\quad a = (3mk)^{1/4}.$$

Inserting this $a$ value into the expression for $J$, Eq. (22.108), we find

$$J = \frac{(3mk)^{1/2}}{2m} + \frac{3k}{2(3mk)^{1/2}} = \sqrt{\frac{3k}{m}} \approx 1.732\sqrt{\frac{k}{m}}. \tag{22.109}$$

This value of $J$ is an upper bound to the ground-state energy, the exact value of which is
$1.5\sqrt{k/m}$.

Taking a somewhat more complicated (and flexible) wave function, of the form $y = (x +
cx^2)e^{-ax}$, and optimizing both $a$ and $c$, the approximate energy improves to $1.542\sqrt{k/m}$.
The approximate wave functions of this example are compared with the exact wave function
in Fig. 22.11. Note that the second approximation yields an eigenvalue that is in error
by less than 3%, even though the approximate wave function exhibits considerably larger
relative errors. ■

FIGURE 22.11 Exact and approximate ground-state wave functions for quantum
oscillator, Example 22.4.4, plotted for $k/m = 1$. Left: Single-term approximation
$y = xe^{-ax}$. Right: Two-term approximation $y = (x + cx^2)e^{-ax}$.
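
The closed form in Eq. (22.108) can be checked by evaluating the two integrals of Eq. (22.103) numerically. The sketch below uses a composite Simpson rule on a truncated interval; the truncation point $40/a$, the number of subintervals, the choice $m = k = 1$, and the function names are assumptions made for this example.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def rayleigh_ratio(a, m=1.0, k=1.0):
    """J of Eq. (22.103) for the trial function y = x*exp(-a*x),
    with p = 1/(2m), q = k*x^2/2, w = 1."""
    upper = 40.0 / a  # exp(-2*a*x) is negligible beyond this point

    def numerator(x):
        return (-(a * x / (2.0 * m)) * (a * x - 2.0)
                + 0.5 * k * x ** 4) * math.exp(-2.0 * a * x)

    def denominator(x):
        return x * x * math.exp(-2.0 * a * x)

    return simpson(numerator, 0.0, upper) / simpson(denominator, 0.0, upper)

m = k = 1.0
a_opt = (3.0 * m * k) ** 0.25          # stationary point found in the text
j_opt = rayleigh_ratio(a_opt, m, k)    # should be close to sqrt(3k/m)
j_closed = a_opt ** 2 / (2.0 * m) + 3.0 * k / (2.0 * a_opt ** 2)
print(j_opt, j_closed)
```

Evaluating `rayleigh_ratio` slightly above and below `a_opt` gives larger values of $J$, as expected at a minimum, and every value stays above the exact energy $1.5\sqrt{k/m}$.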


Example 22.4.5 VARIATION OF LINEAR PARAMETERS 


A frequent use of the Rayleigh-Ritz technique is the approximation of an eigenfunction of
a Schrödinger equation,

$$H\psi(x) = E\psi(x),$$

as a truncated expansion in a fixed orthonormal set of functions. The advantage of this
procedure is that the parameters in the wave function all occur linearly, and the optimization
reduces to a matrix eigenvalue problem.

Given an approximate function (often called a trial function) of the form

$$\psi(x) = \sum_{i=1}^{N} c_i\,\varphi_i(x), \tag{22.110}$$

we seek to minimize

$$J = \langle\psi|H|\psi\rangle \quad\text{subject to}\quad \langle\psi|\psi\rangle = 1. \tag{22.111}$$


Again we emphasize that the φ_i have no specific relation to the eigenfunction we seek;
they are simply members of an orthonormal set that has two desirable features: (1) they are
such that a few of them can provide a reasonable representation of the eigenfunction, and
(2) they are tractable in the sense that it is convenient to evaluate the matrix elements we
are about to define.

Defining a matrix H of elements H_ij = ⟨φ_i|H|φ_j⟩ and a column vector c with components
c_i, we can restate Eq. (22.111) as the minimization of

$$J = \mathbf{c}^\dagger\mathsf{H}\mathbf{c} \quad\text{subject to}\quad \mathbf{c}^\dagger\mathbf{c} = 1. \tag{22.112}$$

This formulation, in turn, can be reduced using Lagrangian multipliers to the unconstrained
matrix eigenvalue problem

$$\mathsf{H}\mathbf{c} = \lambda\mathbf{c}, \tag{22.113}$$

which we can solve (using matrix methods) for λ (the approximate value of J). This
application of the Rayleigh-Ritz technique is therefore seen to be equivalent to the
approximation of an operator equation by a finite matrix equation. ■
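For a two-function basis, the matrix problem of Eq. (22.113) can be solved in closed form. The sketch below assumes a real symmetric 2 × 2 matrix with purely hypothetical matrix elements (H11, H12, H22 are illustrative numbers, not taken from any physical system) and checks that no normalized trial vector gives a J below the lowest eigenvalue.

```python
import math

def lowest_eigenpair_2x2(h11, h12, h22):
    """Lowest eigenvalue and normalized eigenvector of [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    lam = mean - math.hypot(half_gap, h12)
    if h12 == 0.0:
        # Already diagonal: pick the basis vector with the smaller diagonal element.
        c = (1.0, 0.0) if h11 <= h22 else (0.0, 1.0)
    else:
        # First row of (H - lam*I) c = 0 gives c proportional to (h12, lam - h11).
        norm = math.hypot(h12, lam - h11)
        c = (h12 / norm, (lam - h11) / norm)
    return lam, c

def rayleigh_j(c, h11, h12, h22):
    """J = c^T H c of Eq. (22.112) for a normalized real coefficient vector c."""
    c1, c2 = c
    return h11 * c1 * c1 + 2.0 * h12 * c1 * c2 + h22 * c2 * c2

# Hypothetical matrix elements, chosen only for illustration.
H11, H12, H22 = 1.0, 0.2, 2.0
lam, c = lowest_eigenpair_2x2(H11, H12, H22)

# Scan normalized trial vectors (cos t, sin t); none should fall below lam.
trial_values = [
    rayleigh_j((math.cos(t), math.sin(t)), H11, H12, H22)
    for t in (i * math.pi / 180.0 for i in range(180))
]

print("lowest eigenvalue:", lam)
print("coefficients:", c)
print("smallest J over scanned trial vectors:", min(trial_values))
```

Mixing in the second basis function lowers the estimate below the diagonal element H11, which is the value obtained from the first basis function alone.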


Exercises 


22.4.1 A particle of mass m is on a frictionless horizontal surface. In terms of plane polar
coordinates (r, θ), it is constrained to move so that θ = ωt (accomplished by pushing
it with a rotating radial arm against which it can slide frictionlessly). With the initial
conditions

t = 0, r = r_0, ṙ = 0:

(a) Find the radial position as a function of time.
ANS. r(t) = r_0 cosh ωt.
(b) Find the force exerted on the particle by the constraint.
ANS. F^(c) = 2mṙω = 2m r_0 ω² sinh ωt.


22.4.2 A point mass m is moving on a flat, horizontal, frictionless plane. The mass is
constrained by a string to move radially inward at a constant rate. Using plane polar
coordinates (r, θ), r = r_0 − kt:

(a) Set up the Lagrangian.

(b) Obtain the constrained Lagrange equations.

(c) Solve the θ-dependent Lagrange equation to obtain ω(t), the angular velocity.
What is the physical significance of the constant of integration that you get from
your “free” integration?

(d) Using the ω(t) from part (c), solve the r-dependent (constrained) Lagrange
equation to obtain λ(t). In other words, explain what is happening to the force of
constraint as r → 0.


22.4.3 A flexible cable is suspended from two fixed points. The length of the cable is fixed.
Find the curve that will minimize the total gravitational potential energy of the cable.
ANS. Hyperbolic cosine.


22.4.4 A fixed volume of water is rotating in a cylinder with constant angular velocity ω. Find
the curve of the water surface that will minimize the total potential energy of the water
in the combined gravitational-centrifugal force field.
ANS. Parabola.


22.4.5 (a) Show that for a fixed-length perimeter the plane figure with maximum area is a
circle.

(b) Show that for a fixed planar area the boundary with minimum perimeter is a circle.
Hint. The radius of curvature R is given by

$$R = \frac{(r^2 + r_\theta^2)^{3/2}}{r\,r_{\theta\theta} - 2r_\theta^2 - r^2}.$$

Note. The problems of this section, variation subject to constraints, are often called
isoperimetric. The term arose from problems of maximizing area subject to a fixed
perimeter, as in part (a) of this problem.


22.4.6 Show that requiring J, given by

$$J = \int_a^b\!\!\int_a^b K(x,t)\,\varphi(x)\,\varphi(t)\,dx\,dt,$$

to have a stationary value subject to the normalizing condition

$$\int_a^b \varphi^2(x)\,dx = 1$$

leads to a Hilbert-Schmidt integral equation of the form given in Eq. (20.64).
Note. The kernel K(x, t) is symmetric.


22.4.7 An unknown function satisfies the differential equation

$$y'' + \left(\frac{\pi}{2}\right)^2 y = 0$$

and the boundary conditions y(0) = 1, y(1) = 0.
(a) Calculate the approximation λ = F[y_trial] for y_trial = 1 − x².
(b) Compare with the exact eigenvalue.
ANS. (a) λ = 2.5. (b) λ/λ_exact = 1.013.


22.4.8 In Exercise 22.4.7 use a trial function y = 1 − x^n.
(a) Find the value of n that will minimize F[y_trial].
(b) Show that the optimum value of n drives the ratio λ/λ_exact down to 1.003.
ANS. (a) n = 1.7247.


22.4.9 A quantum mechanical particle in a sphere (Example 14.7.1) satisfies

$$\nabla^2\psi + k^2\psi = 0,$$

with k² = 2mE/ħ². The boundary condition is that ψ = 0 at r = a, where a is the
radius of the sphere. For the ground state [where ψ = ψ(r)], try an approximate wave
function,

$$\psi_a(r) = 1 - \left(\frac{r}{a}\right)^2,$$

and calculate an approximate eigenvalue k_a².

Hint. To determine p(r) and w(r), put your equation in self-adjoint form (in spherical
polar coordinates).

ANS. k_a² = 10.5/a², k²_exact = π²/a².


22.4.10 The wave equation for a quantum mechanical oscillator may be written as

$$\frac{d^2\psi(x)}{dx^2} + (\lambda - x^2)\,\psi(x) = 0,$$

with λ = 1 for the ground state; see Eq. (18.17). Take

$$\psi_{\text{trial}} = \begin{cases} 1 - \dfrac{x^2}{a^2}, & x^2 \le a^2, \\[1ex] 0, & x^2 > a^2 \end{cases}$$

for the ground-state wave function (with a² an adjustable parameter) and calculate the
corresponding ground-state energy. How much error do you have?


Note. Your parabola is really not a very good approximation to a Gaussian exponential. 
What improvements can you suggest? 


22.4.11 The Schrödinger equation for a central potential may be written as

$$\mathcal{L}u(r) + \frac{\hbar^2\,l(l+1)}{2Mr^2}\,u(r) = E\,u(r).$$

The l(l + 1) term, the angular momentum barrier, comes from splitting off the angular
dependence. Compare Eq. (9.80) (divide that equation through by −r²). Use the
Rayleigh-Ritz technique to show that E > E_0, where E_0 is the energy eigenvalue of
ℒu_0 = E_0 u_0 corresponding to l = 0. This means that the ground state will have l = 0,
zero angular momentum.

Hint. You can expand u(r) as u_0(r) + Σ_{i=1}^∞ c_i u_i, where ℒu_i = E_i u_i, E_i > E_0.


Additional Readings 


Bliss, G. A., Calculus of Variations. The Mathematical Association of America. LaSalle, IL: Open Court Pub- 
lishing Co. (1925). As one of the older texts, this is still a valuable reference for details of problems such as 
minimum-area problems. 


Courant, R., and H. Robbins, What Is Mathematics? 2nd ed. New York: Oxford University Press (1996). Chapter 
VII contains a fine discussion of the calculus of variations, including soap-film solutions to minimum-area 
problems. 


Ewing, G. M., Calculus of Variations with Applications. New York: Norton (1969). Includes a discussion of 
sufficiency conditions for solutions of variational problems. 


Lanczos, C., The Variational Principles of Mechanics, 4th ed. Toronto: University of Toronto Press (1970), 
reprinted, Dover (1986). This book is a very complete treatment of variational principles and their applications 
to the development of classical mechanics. 


Sagan, H., Boundary and Eigenvalue Problems in Mathematical Physics. New York: Wiley (1961), reprinted,
Dover (1989). This delightful text could also be listed as a reference for Sturm-Liouville theory, Legendre and
Bessel functions, and Fourier series. Chapter 1 is an introduction to the calculus of variations, with applications
to mechanics. Chapter 7 picks up the calculus of variations again and applies it to eigenvalue problems.


Sagan, H., Introduction to the Calculus of Variations. New York: McGraw-Hill (1969), reprinted, Dover (1983). 
This is an excellent introduction to the modern theory of the calculus of variations, which is more sophisticated 
and complete than his 1961 text. Sagan covers sufficiency conditions and relates the calculus of variations to 
problems of space technology. 


Weinstock, R., Calculus of Variations. New York: McGraw-Hill (1952), New York: Dover (1974). A detailed, 
systematic development of the calculus of variations and applications to Sturm-Liouville theory and physical 
problems in elasticity, electrostatics, and quantum mechanics. 


Yourgrau, W., and S. Mandelstam, Variational Principles in Dynamics and Quantum Theory, 3rd ed. Philadel- 
phia: Saunders (1968), New York: Dover (1979). This is a comprehensive, authoritative treatment of vari- 
ational principles. The discussions of the historical development and the many metaphysical pitfalls are of 
particular interest. 


CHAPTER 23 


PROBABILITY AND 
STATISTICS 


Probabilities arise in many problems dealing with random events or large numbers of parti- 
cles defining random variables. An event is called random if it is practically impossible to 
predict from the initial state. This includes cases where we have merely incomplete infor- 
mation about initial states and/or the dynamics. For example, in statistical mechanics we 
deal with systems containing large numbers of particles, but our knowledge is ordinarily 
limited to a few average or macroscopic quantities such as the total energy, the volume, 
the pressure, or the temperature. Because the values of these macroscopic variables are 
consistent with very large numbers of different microscopic configurations of our system, 
we are prevented from predicting the behavior of individual atoms or molecules. Often the 
average properties of many similar events are predictable, as in quantum theory. This is 
why probability theory can be and has been developed. 

Random variables are also involved when data depend on chance, such as weather re- 
ports and stock prices. The theory of probability describes mathematical models of chance 
processes in terms of probability distributions of random variables that describe how some 
“random events” are more likely than others. In this sense, probability is a measure of our 
ignorance, giving quantitative meaning to qualitative statements, such as “It will probably 
rain tomorrow” and “I’m unlikely to draw the queen of hearts.” Probabilities are of fun- 
damental importance in quantum mechanics and statistical mechanics and are applied in 
meteorology, economics, games, and many other areas of daily life. 

Because experiments in the sciences are always subject to measurement errors, theories 
of errors and their propagation involve probabilities. Statistics is the area of mathematics 
that connects observations on data samples to inferences about the probable content of 
the entire population from which the sample(s) came. It is an extensive and sophisticated 
branch of mathematics, and in the present text only a few of the most basic concepts can 
be presented. The material found here may be adequate to provide a conceptual basis for 
statistical mechanics, but can at best be an elementary introduction to the ideas needed to
gain maximum information from data-intensive experimental studies such as those arising
from the study of cosmic rays or the data from high-energy particle accelerators. A more
complete picture of the role of statistics in physics and engineering can be obtained from a
number of the texts in the Additional Readings.


23.1 PROBABILITY: DEFINITIONS, SIMPLE PROPERTIES


All possible mutually exclusive¹ outcomes of an experiment that is subject to chance
represent the events (or points) of a sample space S. Suppose we toss a coin, and record
that it lands either “heads” or “tails.” These are mutually exclusive events, so our sample
space for a single coin toss can be deemed to be spanned by a discrete random variable x,
with possible values x_i, which (based on our experiment, called a trial), will have one of
the two values x_1 (for heads) or x_2 (for tails). Now suppose, with the same sample space,
we carry out larger numbers of trials. Some will have the result x_1 (heads), others x_2 (tails).
It is of interest to define the probability of an outcome in our sample space by the ratio

$$P(x_i) = \frac{\text{number of times event } x_i \text{ occurs}}{\text{total number of trials}}, \tag{23.1}$$


where it is assumed that the number of trials is large enough that P(x_i) approaches a
constant limiting value. In the event that we are able to enumerate all the possible events
that produce outcomes in our sample space and can also assume that each event is equally
likely, we may then define the theoretical probability of an outcome x_i as

$$P(x_i) = \frac{\text{number of outcomes } x_i}{\text{total number of all events}}. \tag{23.2}$$


An example of the use of this theoretical probability can be illustrated using coin tosses.
For example, suppose that we toss a coin twice and take our random variable x to be the
number of heads obtained in a two-toss trial. Our sample S now contains three possible
values of x, which we designate x_0, x_1, x_2, where we are now letting x_i stand for the
occurrence of i heads in the two tosses. Obviously, the only possible values of x are 0, 1,
and 2. But we also know that the four possible results of two successive tosses are (heads,
then heads), (heads, then tails), (tails, then heads), (tails, then tails); these possibilities are
mutually exclusive and it is reasonable to assume that they are equally likely. Then, using
Eq. (23.2), we conclude that the probabilities of x_2 (two heads) and x_0 (no heads) will each
be 1/4, while the probability of x_1 (one heads) will be 1/2.
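The counting behind Eq. (23.2) can be carried out mechanically by listing every equally likely toss sequence. The following is a small sketch; the function name `heads_distribution` is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Return {number of heads: probability} from all equally likely sequences."""
    sequences = list(product("HT", repeat=n_tosses))
    counts = Counter(seq.count("H") for seq in sequences)
    total = len(sequences)
    return {heads: Fraction(c, total) for heads, c in sorted(counts.items())}

two_toss = heads_distribution(2)
for heads, prob in two_toss.items():
    print(f"P(x_{heads}) = {prob}")
```

Exact fractions are used so that the results can be compared directly with the values 1/4, 1/2, 1/4 found above.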

The experimental definition, Eq. (23.1), is the more appropriate when the total number
of events is not well defined (or is difficult to obtain) or we cannot identify equally likely
outcomes. A large, thoroughly mixed pile of black and white sand grains of the same size
and in equal proportions is a relevant example, because it is impractical to count them all.
But we can count the grains in a small sample volume that we pick. This way we can check
that white and black grains turn up with roughly equal probability 1/2, provided that we
put back each sample and mix the pile again. It is found that the larger the sample volume,
the smaller the spread in probability about 1/2 will be. Moreover, the more trials we run,
the closer the average of all the individual trial probabilities will be to 1/2. We could even
pick single grains and check if the probability 1/4 of picking two black grains in a row
equals that of two white grains, etc. There are lots of statistics questions we can pursue.
Thus, piles of colored sand provide for instructive experiments.

¹ This means that given that one particular event did occur, the others could not have occurred.
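The sand-pile experiment is easy to imitate with a pseudorandom number generator. The sketch below assumes an effectively infinite, perfectly mixed pile, so each grain is white with probability 1/2 independently of the others; the sample sizes, the number of trials, and the seed are arbitrary illustrative choices.

```python
import random
import statistics

def white_fraction(sample_size, rng):
    """Fraction of white grains in one sample drawn from a 50/50 pile."""
    whites = sum(1 for _ in range(sample_size) if rng.random() < 0.5)
    return whites / sample_size

def spread_of_estimates(sample_size, n_trials, rng):
    """Mean and standard deviation of the per-trial probability estimates."""
    estimates = [white_fraction(sample_size, rng) for _ in range(n_trials)]
    return statistics.mean(estimates), statistics.pstdev(estimates)

rng = random.Random(12345)  # fixed seed so that a run can be repeated
small_mean, small_spread = spread_of_estimates(10, 200, rng)
large_mean, large_spread = spread_of_estimates(1000, 200, rng)

print("sample of 10 grains:   mean", small_mean, "spread", small_spread)
print("sample of 1000 grains: mean", large_mean, "spread", large_spread)
```

The spread about 1/2 should come out markedly smaller for the larger sample volume, in line with the statement above.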

The following axioms are self-evident.


• Probabilities satisfy 0 ≤ P ≤ 1. Probability 1 means certainty; probability 0 means
impossibility.


• The entire sample has probability 1. For example, drawing an arbitrary card from a
deck of cards has probability 1.


• The probabilities for mutually exclusive events add. The probability for getting exactly
one head in two coin tosses is 1/4 + 1/4 = 1/2 because it is 1/4 for head first and then
tail, plus 1/4 for tail first and then head.


Example 23.1.1 PROBABILITY FOR A OR B 


Before proceeding with this example, we must clarify the definition of “or.” In probability 
theory, “A or B” means A, B, or both A and B. The specification “A or B but not both” 
is referred to as the exclusive or of A and B (sometimes abbreviated xor). 

What is the probability for drawing a club or a jack from a shuffled deck of cards?² To
answer this question we need to identify equally probable mutually exclusive events. We
note that because there are 52 cards in a deck, the drawing of each being equally likely
(with 13 cards for each suit and 4 jacks), there are 13 clubs including the club jack, and 3
other jacks; that is, there are 16 mutually exclusive draws that meet our specification out
of the total of 52, giving the probability (13 + 3)/52 = 16/52 = 4/13. ■
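The count in Example 23.1.1 can be reproduced by listing the deck explicitly. This is a minimal sketch; the representation of a card as a (rank, suit) pair and the helper name `probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely single-card draws

def probability(predicate, outcomes=DECK):
    """Eq. (23.2): favorable outcomes divided by all equally likely outcomes."""
    favorable = sum(1 for card in outcomes if predicate(card))
    return Fraction(favorable, len(outcomes))

# Inclusive "or": the club jack satisfies both conditions but is counted once.
p_club_or_jack = probability(lambda card: card[1] == "clubs" or card[0] == "J")
print(p_club_or_jack)
```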


Sets, Unions, and Intersections 


If we represent a sample space by a set S of points, then events meeting certain
specifications can be identified as subsets A, B, ... of S, denoted as A ⊂ S, etc. Two sets A, B are
equal if A is contained in B, denoted A ⊂ B, and B is contained in A, denoted B ⊂ A.
The union A ∪ B consists of all points (events) that are in A or B or both (see Fig. 23.1).
The intersection A ∩ B consists of all points that are in both A and B. If A and B have no
common points, their intersection is the empty set (which has no elements), and we write
A ∩ B = ∅. The set of points in A that are not in the intersection of A and B is denoted
by A − A ∩ B, thereby defining a subtraction of sets. If we take the club suit in
Example 23.1.1 as set A and the four jacks as set B, then their union comprises all clubs and
jacks, and their intersection is the club jack only.

² Note that these events are not mutually exclusive.

FIGURE 23.1 The shaded area gives the intersection A ∩ B, corresponding to the A and
B event sets; the dashed line encloses A ∪ B, corresponding to the event set A or B.

Each subset A has its probability P(A) ≥ 0. In terms of these set-theory concepts and
notations, the probability laws we just discussed become

$$0 \le P(A) \le 1.$$

The entire sample space has P(S) = 1. The probability of the union A ∪ B of mutually
exclusive events is the sum

$$P(A \cup B) = P(A) + P(B), \quad\text{where } A \cap B = \emptyset. \tag{23.3}$$

The addition rule for probabilities of arbitrary sets is given by the following theorem:

$$\text{Addition rule:}\quad P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{23.4}$$


To prove Eq. (23.4), we write the union as two mutually exclusive sets: A ∪ B = A ∪
(B − B ∩ A), where we have subtracted the intersection of A and B from B before joining
them. The respective probabilities of these mutually exclusive sets are P(A) and P(B) −
P(B ∩ A), which we add. We could also have written A ∪ B = (A − A ∩ B) ∪ B, from
which our theorem follows similarly by adding these probabilities: P(A ∪ B) = [P(A) −
P(A ∩ B)] + P(B). Note that A ∩ B = B ∩ A. The relationships among these sets can be
checked by referring to Fig. 23.1.
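The addition rule, Eq. (23.4), and the decomposition used in its proof can be verified on a small sample space with ordinary set operations. The sketch below uses two dice; the events A (even sum) and B (sum greater than 7) are chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

SAMPLE_SPACE = set(product(range(1, 7), repeat=2))  # 36 ordered rolls of two dice

def prob(event):
    """Probability of an event given as a subset of the sample space."""
    return Fraction(len(event), len(SAMPLE_SPACE))

A = {roll for roll in SAMPLE_SPACE if sum(roll) % 2 == 0}  # even sum
B = {roll for roll in SAMPLE_SPACE if sum(roll) > 7}       # sum greater than 7

lhs = prob(A | B)                          # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)      # right-hand side of Eq. (23.4)
# The proof writes A ∪ B as the disjoint union of A and B − B ∩ A.
disjoint_sum = prob(A) + prob(B - (A & B))

print(lhs, rhs, disjoint_sum)
```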

Sometimes the rules and definitions of probabilities that we have discussed so far are 
not sufficient, and we need to introduce the notion of conditional probability. Let A and B 
denote sets of events in our sample space. The conditional probability P(B|A) is defined 
to be the probability that an event which is a member of A is also a member of B. To 
understand the need for this somewhat formal definition, consider the following example. 


Example 23.1.2 CONDITIONAL PROBABILITY 


Consider a box of 10 identical red and 20 identical blue pens, from which we remove pens
successively in a random order without putting them back. Suppose we draw a red pen first,
event R, followed by the draw of a blue pen, event B. One way to compute P(R, B) is
to note that our sample space consists of 30 × 29 mutually exclusive and equally probable
points (each a two-event ordered sequence), of which 10 × 20 meet our specifications,
leading to the computation P(R, B) = (10 × 20)/(30 × 29) = 20/87. Note that in this
example, P(R, B) refers to ordered events.

Another way of making the same computation is to start by noting that the initial drawing
of a red pen will occur with probability P(R) = 10/30. The probability of drawing
a blue pen in the next round, event B, will however depend on the fact that we drew a red
pen in the first round, and is given by the conditional probability P(B|R). Since there are
now 29 pens of which 20 are blue, we easily compute P(B|R) = 20/29, and the probability
of the sequence “red, then blue” is

$$P(R, B) = \frac{10}{30}\cdot\frac{20}{29} = \frac{20}{87}, \tag{23.5}$$

equal to the result we obtained previously. ■
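Both routes to P(R, B) in Example 23.1.2 can be checked by brute-force enumeration of the 30 × 29 ordered draws. In this sketch the pens are labeled by list index only to make the draws distinguishable for counting; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

pens = ["R"] * 10 + ["B"] * 20
# Ordered draws of two different pens: 30 * 29 equally likely sequences.
draws = list(permutations(range(len(pens)), 2))

red_then_blue = sum(1 for i, j in draws if pens[i] == "R" and pens[j] == "B")
red_first = sum(1 for i, _ in draws if pens[i] == "R")

p_joint = Fraction(red_then_blue, len(draws))   # P(R, B) by direct counting
p_red = Fraction(red_first, len(draws))         # P(R)
p_blue_given_red = p_joint / p_red              # P(B|R) = P(R, B) / P(R)

print(p_joint, p_red, p_blue_given_red)
```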
The generalization of the result in Eq. (23.5) is the very useful formula

$$P(A, B) = P(A)\,P(B|A), \tag{23.6}$$

which has the obvious interpretation that the probability that A and B both occur can be
written as the probability of A, multiplied by the conditional probability P(B|A) that B
occurs, given the occurrence of A.

Two observations relative to Eq. (23.6) are in order. First, it can be rearranged to reach
an explicit formula for P(B|A):

$$P(B|A) = \frac{P(A, B)}{P(A)}. \tag{23.7}$$

Second, if the conditional probability P(B|A) = P(B) is independent of A, then the events
A and B are called independent, and the combined probability is simply the product of
both probabilities, or

$$P(A, B) = P(A)\,P(B), \quad (A \text{ and } B \text{ independent}). \tag{23.8}$$

If A and B are defined in a way that neither depends on the other (a condition not
satisfied in Example 23.1.2), we can rewrite Eq. (23.7) as

$$P(B|A) = \frac{P(A \cap B)}{P(A)}. \tag{23.9}$$

Example 23.1.3 SCHOLASTIC APTITUDE TESTS


Colleges and universities rely on the verbal and mathematics SAT scores, among others, as 
predictors of a student’s success in passing courses and graduating. A research university 
is known to admit mostly students with a combined verbal and mathematics score of 1400 
points or more. The graduation rate is 95%; that is, 5% drop out or transfer elsewhere. Of 
those who graduate, 97% have an SAT score of at least 1400 points, while 80% of those 
who drop out have an SAT score below 1400. Suppose a student has an SAT score below 
1400. What is his/her probability of graduating? 

Let A represent all students with an SAT test score below 1400, and let B represent
those with scores ≥ 1400. These are mutually exclusive events with P(A) + P(B) = 1.
Let C represent those students who graduate, and let C̄ represent those who do not. Our
problem here is to determine the conditional probabilities P(C|A) and P(C|B). To apply
Eq. (23.9) we need the four probabilities P(A), P(B), P(A ∩ C), and P(B ∩ C).

Among the 95% of students who graduate, 3% are in set A and 97% are in set B, so

$$P(A \cap C) = (0.95)(0.03) = 0.0285, \qquad P(B \cap C) = (0.95)(0.97) = 0.9215.$$

Among the 5% of students who do not graduate, 80% are in set A and 20% are in set B, so

$$P(A \cap \bar{C}) = (0.05)(0.80) = 0.0400, \qquad P(B \cap \bar{C}) = (0.05)(0.20) = 0.0100.$$

Since P(A) = P(A ∩ C) + P(A ∩ C̄), and likewise for P(B), we have

$$P(A) = 0.0285 + 0.0400 = 0.0685, \qquad P(B) = 0.9215 + 0.0100 = 0.9315.$$

Now, applying Eq. (23.9), we obtain the final results

$$P(C|A) = \frac{P(A \cap C)}{P(A)} = \frac{0.0285}{0.0685} \approx 41.6\%, \qquad P(C|B) = \frac{P(B \cap C)}{P(B)} = \frac{0.9215}{0.9315} \approx 98.9\%;$$

that is, a little less than 42% is the probability for a student with a score below 1400 to
graduate at this particular university. ■
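The arithmetic of Example 23.1.3 can be retraced step by step with exact fractions. The sketch below uses only the percentages quoted in the example; the variable names ("low" for a score below 1400, "high" otherwise) are illustrative.

```python
from fractions import Fraction

p_grad = Fraction(95, 100)             # P(C): graduation rate
p_drop = 1 - p_grad                    # P(not C)
p_low_given_grad = Fraction(3, 100)    # graduates with a score below 1400
p_low_given_drop = Fraction(80, 100)   # non-graduates with a score below 1400

# Joint probabilities, as in the text.
p_low_and_grad = p_grad * p_low_given_grad
p_low_and_drop = p_drop * p_low_given_drop
p_high_and_grad = p_grad * (1 - p_low_given_grad)
p_high_and_drop = p_drop * (1 - p_low_given_drop)

# P(A) and P(B) as sums over graduating and not graduating.
p_low = p_low_and_grad + p_low_and_drop
p_high = p_high_and_grad + p_high_and_drop

# Eq. (23.9) for the two conditional probabilities.
p_grad_given_low = p_low_and_grad / p_low
p_grad_given_high = p_high_and_grad / p_high

print(float(p_low), float(p_high))
print(float(p_grad_given_low), float(p_grad_given_high))
```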


As a corollary to the equation for conditional probability, Eq. (23.9), we now compare
P(A|B) = P(A ∩ B)/P(B) and P(B|A) = P(A ∩ B)/P(A), obtaining a result known as
Bayes' theorem:

$$P(A|B) = \frac{P(A)}{P(B)}\,P(B|A). \tag{23.10}$$


Bayes' theorem is a special case of the following more general theorem:


If the random events A_i with probabilities P(A_i) > 0 are mutually exclusive and their
union represents the entire sample S, then an arbitrary random event B ⊂ S has the
probability

$$P(B) = \sum_{i=1}^{n} P(A_i)\,P(B|A_i). \tag{23.11}$$

The decomposition law given by Eq. (23.11) resembles the expansion of a vector into a
basis of unit vectors defining its components. This relation follows from the obvious
decomposition B = ∪_i (B ∩ A_i) (this notation indicates the union of all the quantities B ∩ A_i, see
Fig. 23.2), which implies P(B) = Σ_i P(B ∩ A_i) because the components B ∩ A_i are
mutually exclusive. For each i, we know from Eq. (23.9) that P(B ∩ A_i) = P(A_i)P(B|A_i),
which proves the theorem.
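As an illustration of Eqs. (23.10) and (23.11), we may return to the pens of Example 23.1.2 and ask for the probability that the second pen drawn is blue, whatever the first one was. The sketch below partitions the sample space by the color of the first pen; the dictionary keys are illustrative labels.

```python
from fractions import Fraction

# Mutually exclusive events A_i: color of the first pen (10 red, 20 blue).
prior = {"first red": Fraction(10, 30), "first blue": Fraction(20, 30)}
# P(B|A_i): probability that the second pen is blue, given the first draw.
likelihood = {"first red": Fraction(20, 29), "first blue": Fraction(19, 29)}

# Eq. (23.11): decomposition of P(B) over the A_i.
p_second_blue = sum(prior[a] * likelihood[a] for a in prior)

# Eq. (23.10): P(A_i|B) = P(A_i) P(B|A_i) / P(B).
posterior = {a: prior[a] * likelihood[a] / p_second_blue for a in prior}

print(p_second_blue)
for a, p in posterior.items():
    print(a, p)
```

The result P(B) = 2/3 equals the fraction of blue pens in the box, as one expects when nothing is known about the first draw.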


Counting Permutations and Combinations 


Counting the events in samples can help us find probabilities; this procedure is found to be 
of great importance in statistical mechanics. 

If we have n different molecules, let us ask in how many ways we can arrange them in 
a row, that is, permute them. This number is defined as the number of their permutations. 
Thus, by definition, the order matters in permutations. There are n choices of picking 
the first molecule, n — 1 for the second, etc. Altogether there are n! permutations of n 
different molecules or objects.


FIGURE 23.2 The shaded area B is composed of mutually exclusive subsets of B
belonging also to A_1, A_2, A_3, where the A_i are mutually exclusive.


Generalizing this, suppose there are n people but only k < n chairs to seat them. In how
many ways can we seat k people in the chairs? Counting as before, we get

$$n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!} \tag{23.12}$$

for the number of permutations of k objects which can be formed by selection from a set
originally containing n objects.

We now consider the counting of combinations of objects, where the term combination 
is defined to refer to sets in which the object order is irrelevant. For example, three letters 
a, b,c can be combined, two letters at a time, in three ways: ab, ac, bc. If letters can be 
repeated, then we also have the pairs aa, bb, cc and have a total of six combinations. 
These examples illustrate the fact that a combination of different particles differs from a 
permutation in that the particles’ order does not matter. Combinations may occur with 
repetition or without; the essential point is that no two combinations contain the same 
particles. 

The number of different combinations of n numbered (and thereby distinguishable)
particles, k at a time and without repetitions, is given by the binomial coefficient

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!}. \tag{23.13}$$


To prove Eq. (23.13), we start from the number n!/(n — k)! of permutations in which k 
particles were chosen from n, and divide out the number k! of permutations of the group 
of k particles because their order does not matter in a combination. 
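Equations (23.12) and (23.13) can be confirmed for small n and k by generating the arrangements explicitly. The values n = 5, k = 3 in this sketch are arbitrary.

```python
import math
from itertools import combinations, permutations

n, k = 5, 3
items = range(n)

# Ordered selections of k objects out of n (order matters).
n_perm = sum(1 for _ in permutations(items, k))
# Unordered selections of k objects out of n (order irrelevant).
n_comb = sum(1 for _ in combinations(items, k))

perm_formula = math.factorial(n) // math.factorial(n - k)   # Eq. (23.12)
comb_formula = perm_formula // math.factorial(k)            # Eq. (23.13)

print(n_perm, perm_formula)
print(n_comb, comb_formula)
```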

A generalization of the above is a situation in which we have a total of n distinguishable
(numbered) objects, and we place n_1 of these into Box 1, n_2 into Box 2, etc. We wish to
know how many different ways this can be done (this is a combination problem because
the objects in each box do not form ordered sets). A simple way to solve this problem is to
identify each permutation of the n objects with an assignment into boxes; the first n_1 of the
permuted objects are placed in Box 1, the next n_2 in Box 2, etc. However, permutations
that differ only in the ordering of objects destined for the same box do not constitute
different distributions, so the total number of distributions will be n! (the overall number
of permutations) divided by n_1!, n_2!, etc. Thus, our overall formula is

$$B(n_1, n_2, \ldots) = \frac{n!}{n_1!\,n_2!\cdots}. \tag{23.14}$$

This quantity is sometimes referred to as a multinomial coefficient; if there are only two
boxes it reduces to the binomial coefficient.
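Equation (23.14) can be compared with a direct count for small occupation numbers. In the sketch below each object receives a box label, and only those labelings with the prescribed number of objects per box are kept; the function names are illustrative, and the brute-force count is practical only for small n.

```python
import math
from itertools import product

def multinomial(*occupations):
    """n! / (n_1! n_2! ...), Eq. (23.14), with n the total number of objects."""
    result = math.factorial(sum(occupations))
    for n_i in occupations:
        result //= math.factorial(n_i)
    return result

def count_assignments(*occupations):
    """Count box labelings of n distinguishable objects with the given box sizes."""
    n, boxes = sum(occupations), range(len(occupations))
    target = list(occupations)
    return sum(
        1
        for labels in product(boxes, repeat=n)
        if [labels.count(b) for b in boxes] == target
    )

print(multinomial(2, 1, 1), count_assignments(2, 1, 1))
print(multinomial(3, 2), math.comb(5, 2))  # two boxes: the binomial coefficient
```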

For a related problem with repetition, suppose that we have an unlimited supply of
particles bearing each number from 1 through k. Then the number of distinct ways in
which n particles can be chosen can be shown to be

$$\binom{n+k-1}{n} = \binom{n+k-1}{k-1}. \tag{23.15}$$


The following example provides a proof of Eq. (23.15). 


Example 23.1.4 COMBINATIONS WITH REPETITION


The physical relevance of the situation giving rise to Eq. (23.15) is that it is mathematically 
equivalent to the number of ways that n identical, indistinguishable particles can be placed 
in k boxes. To see that these problems are equivalent, note that the number on each particle 
of Eq. (23.15) can be used to identify the box in which that particle will be placed. 

A simple way to count the possible assignments is to consider the distinguishable ways
that n indistinguishable particles and k − 1 indistinguishable partitions can be placed in a
line containing n + k − 1 items. The particles (if any) that occur in the line earlier than
the first partition are assigned to Box 1; those between the first and second partitions are
assigned to Box 2, etc., and the particles (if any) occurring later than the (k − 1)th (the
last) partition are assigned to Box k. Each different placement of the partitions yields a
unique assignment of particles to boxes, and the number of different partition placements
is the number of combinations given by the binomial coefficient in Eq. (23.15). ■
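The partition argument of Example 23.1.4 translates directly into a short enumeration. The sketch below chooses the k − 1 partition positions among the n + k − 1 slots and reads off how many particles fall into each box; the function name and the values n = 3, k = 2 are illustrative.

```python
import math
from itertools import combinations

def occupations_from_partitions(n, k):
    """Yield (n_1, ..., n_k) for each placement of k - 1 partitions in n + k - 1 slots."""
    slots = n + k - 1
    for bars in combinations(range(slots), k - 1):
        # Add sentinel partitions just before the first and after the last slot.
        edges = (-1,) + bars + (slots,)
        # Box j holds the slots strictly between consecutive partitions.
        yield tuple(edges[j + 1] - edges[j] - 1 for j in range(k))

n, k = 3, 2
assignments = list(occupations_from_partitions(n, k))
print(assignments)
print(len(assignments), math.comb(n + k - 1, n))
```

Each placement of the partitions produces a different set of occupation numbers, so the length of the list reproduces the binomial coefficient of Eq. (23.15).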


In statistical mechanics, we frequently need to know the number of ways in which it 
is possible to put n particles in k boxes subject to various additional specifications. If we 
are working in classical theory, our more complete specification includes the notion that 
the particles are distinguishable, and we refer to the probability computation as that given 
by Maxwell-Boltzmann statistics. In the quantum domain, it is assumed that identical 
particles are inherently indistinguishable; in fact, we cannot even identify them by their 
trajectories, as the notion of path is blurred by the Heisenberg uncertainty principle. This 
indistinguishability leads to the requirement that many-particle states must have symmetry 
under the interchange of identical particles, and in nature we find two cases: Either the 
wave function is symmetric under interchange of the coordinates of a pair of identical 
particles (such particles are said to exhibit Bose-Einstein statistics), or the coordinate 
interchange causes a reversal in the sign of the wave function (the case called
Fermi-Dirac statistics). The symmetry (or antisymmetry) under particle interchange influences
the way in which particles can be assigned to states (boxes): In Bose-Einstein statistics
any number of indistinguishable particles may be placed in the same box; in Fermi-Dirac
statistics no box may contain more than one indistinguishable particle.

Application of the various kinds of statistics in general problems is outside the scope of
this text; however, the basic case in which we simply count the number of assignments that
are possible is easily approached. If we have n particles and k available states:


• In Maxwell-Boltzmann (classical) statistics, the number of possible assignments of
particles to states is k^n (each particle can independently be assigned to any state).


• In Bose-Einstein statistics, the number of possible assignments is given by Eq. (23.15).


• In Fermi-Dirac statistics, the number of possible assignments is the binomial coefficient
$\binom{k}{n}$. This formula gives the number of ways that n of the k states can be selected for
occupancy. Note that the number of assignments is zero if n > k, indicating that we cannot
make any assignment (with a maximum of one particle per state) unless there are at least
as many states as there are particles.
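The three counting rules listed above can be compared by direct enumeration. In the sketch below, distinguishable particles correspond to ordered tuples of state labels, bosons to unordered selections with repetition, and fermions to unordered selections without repetition; the values n = 3, k = 4 are arbitrary.

```python
import math
from itertools import combinations, combinations_with_replacement, product

def count_assignments(n, k):
    """Number of ways to place n particles in k states under each statistics."""
    states = range(k)
    maxwell_boltzmann = sum(1 for _ in product(states, repeat=n))
    bose_einstein = sum(1 for _ in combinations_with_replacement(states, n))
    fermi_dirac = sum(1 for _ in combinations(states, n))
    return maxwell_boltzmann, bose_einstein, fermi_dirac

n, k = 3, 4
mb, be, fd = count_assignments(n, k)
print("Maxwell-Boltzmann:", mb, " formula:", k ** n)
print("Bose-Einstein:    ", be, " formula:", math.comb(n + k - 1, n))
print("Fermi-Dirac:      ", fd, " formula:", math.comb(k, n))
```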


Exercises 

23.1.1 A card is drawn from a shuffled deck. (a) What is the probability that it is black, (b) a red
nine, (c) or a queen of spades?

23.1.2 Find the probability of drawing two kings from a shuffled deck of cards (a) if the first
card is put back before the second is drawn, and (b) if the first card is not put back after
being drawn.

23.1.3 When two fair dice are thrown, what is the probability of (a) observing a number less
than 4, or (b) a number greater than or equal to 4 but less than 6?

23.1.4 Rolling three fair dice, what is the probability of obtaining six points?

23.1.5 Determine the probability P(A ∩ B ∩ C) in terms of P(A), P(B), P(C), P(A ∪ B),
P(A ∪ C), P(B ∪ C), and P(A ∪ B ∪ C).

23.1.6 Determine directly or by mathematical induction (Section 1.4) the probability of a
distribution of N (Maxwell-Boltzmann) particles in k boxes with N_1 in Box 1, N_2 in
Box 2, ..., N_k in the kth box for any numbers N_i ≥ 1 with N_1 + N_2 + ··· + N_k =
N, k < N. Repeat this for Fermi-Dirac and Bose-Einstein particles.

23.1.7 Show that P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) −
P(B ∩ C) + P(A ∩ B ∩ C).

23.1.8 Determine the probability that a positive integer n ≤ 100 is divisible by a prime number
p ≤ 100. Verify your result for p = 3, 5, 7.

23.1.9 Put two particles obeying Maxwell-Boltzmann (Fermi-Dirac, or Bose-Einstein) statistics 
in three boxes. How many ways of doing so are there in each case? 


23.2 RANDOM VARIABLES 


In this section we define properties that characterize the probability distributions of random 
variables, by which we mean variables that will assume various numerical values with 
individual probabilities. Thus, the name of a color (e.g., “black” or “white”) cannot be 
the value assigned a random variable, but we can define a random variable to have one 
numerical value for “black” and another for “white”; the usefulness of our definition may 
depend on the problem we are attempting to solve. 

Having defined a random variable and given its distribution, we are interested in partic- 
ular in its mean or average value, and in measures of the width or spread of its values. 
The width is of particular importance when the random variable represents repeated mea- 
surements of the same quantity but subject to experimental error. In addition, we introduce 
properties that characterize the extent to which the value of one random variable depends 
on (i.e., is correlated with) those of another. 

Random variables can be discrete, as for example those introduced in the previous sec- 
tion to describe the outcomes of coin tosses, or they may be continuous, either inherently 
so (as, for example, the wave function in a quantum mechanical system) or because they 
consist of so many closely spaced discrete points that it is impractical to work with them 
individually. 


Example 23.2.1 DISCRETE RANDOM VARIABLE 


The possible outcomes of the tossing of a die define a random variable X with values 
$x_1, x_2, \ldots, x_6$, each with probability 1/6; we can denote this by writing $P(x_i) = 1/6$, 
$i = 1, \ldots, 6$. 

If we toss two dice and record the sum of the points shown in each trial, then this sum 
is also a discrete random variable, which takes on the value 2 when both dice show 1, with 
probability $(1/6)^2$; the value 3 in either of the two cases in which one die has 1 and the 
other 2, hence with probability $(1/6)^2 + (1/6)^2 = 1/18$. Continuing, the value 4 is reached 
in three equally probable ways: 2 + 2, 3 + 1, and 1 + 3, with total probability $3(1/6)^2 = 
1/12$; the values 5 and 6 are reached with the respective probabilities $4(1/6)^2 = 1/9$ and 
$5(1/6)^2 = 5/36$; and the value 7 occurs with the maximum probability, $6(1/6)^2 = 1/6$. 
The value 8 is reached in five ways (6 + 2, 5 + 3, 4 + 4, 3 + 5, 2 + 6), with probability 
$5(1/6)^2 = 5/36$, and further increases in x lead to smaller probabilities, finally at x = 12 
reaching probability $(1/6)^2 = 1/36$. This probability distribution is symmetric about 
x = 7, and can be represented graphically as in Fig. 23.3 or algebraically as 

$$P(x) = \frac{6 - |x - 7|}{36}, \qquad x = 2, 3, \ldots, 12.$$

FIGURE 23.3 Probability distribution P(x) of the sum of points when 
two dice are tossed. 
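
The two-dice distribution can also be tabulated by enumerating all 36 equally likely outcomes. The sketch below (the function name `two_dice_distribution` is hypothetical) compares each tabulated probability with the closed form $P(x) = (6 - |x - 7|)/36$, which follows from counting the pairs that give each sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Return {sum: probability} for the total shown by two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {x: Fraction(c, 36) for x, c in sorted(counts.items())}

dist = two_dice_distribution()
for x, p in dist.items():
    # Compare the enumeration with the closed form (6 - |x - 7|)/36.
    assert p == Fraction(6 - abs(x - 7), 36)
    print(x, p)
print(sum(dist.values()))  # 1
```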


In summary, then, 


• If a discrete random variable X can assume the values $x_i$, each value occurs by chance 
with a probability $P(X = x_i) = p_i \ge 0$ that is a discrete-valued function of the random 
variable X, and the probabilities satisfy $\sum_i p_i = 1$. 

• We define the probability density f(x) of a continuous random variable X as 
$$P(x \le X \le x + dx) = f(x)\,dx; \tag{23.16}$$
that is, $f(x)\,dx$ is the probability that X lies in the interval $x \le X \le x + dx$. For f(x) 
to be a probability density, it has to satisfy $f(x) \ge 0$ and $\int f(x)\,dx = 1$. 

• The generalization to probability distributions depending on several random variables 
is straightforward. Quantum physics abounds in examples. 


Example 23.2.2 CONTINUOUS RANDOM VARIABLE: HYDROGEN ATOM 


Quantum mechanics gives the probability $|\psi|^2 d^3r$ of finding a 1s electron in a hydrogen 
atom in volume³ $d^3r$, where $\psi = N e^{-r/a}$ is the 1s wave function, a is the Bohr radius, 
and $N = (\pi a^3)^{-1/2}$ is a normalization constant such that 

$$\int |\psi|^2 d^3r = 4\pi N^2 \int_0^\infty e^{-2r/a}\, r^2\, dr = \pi a^3 N^2 = 1.$$

³Note that $|\psi|^2\, 4\pi r^2 dr$ gives the probability for the electron to be found between r and r + dr, at any angle. 

The value of this integral can be checked by identifying it as a gamma function: 

$$\int_0^\infty e^{-2r/a}\, r^2\, dr = \left(\frac{a}{2}\right)^3 \int_0^\infty e^{-x} x^2\, dx = \frac{a^3}{8}\,\Gamma(3) = \frac{a^3}{4}.$$
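
As a numerical cross-check of the normalization, the radial integral can be estimated with the trapezoidal rule and compared with the gamma-function value $a^3/4$. This is a minimal sketch: the helper name `radial_integral`, the cutoff at 40 Bohr radii, and the choice a = 1 are illustrative assumptions.

```python
import math

def radial_integral(a, r_max_factor=40.0, steps=200_000):
    """Trapezoidal estimate of the integral of r^2 exp(-2r/a) from 0 to r_max."""
    r_max = r_max_factor * a      # the integrand is negligible beyond this cutoff
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += w * r * r * math.exp(-2.0 * r / a)
    return total * h

a = 1.0                                  # Bohr radius in arbitrary units (assumption)
numeric = radial_integral(a)
exact = (a / 2) ** 3 * math.gamma(3)     # (a/2)^3 Gamma(3) = a^3/4
N2 = 1.0 / (math.pi * a**3)              # N^2 = 1/(pi a^3)
print(numeric, exact)
print(4 * math.pi * N2 * numeric)        # close to 1
```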


Computing Discrete Probability Distributions 


In Example 23.2.1 the overall probability of a particular value of a discrete random variable 
was computed as a product in which one factor was the number of equally likely ways 
in which that value could be obtained, and the other factor was the probability of each 
mutually exclusive occurrence. This type of computation arises sufficiently frequently that 
we should learn how to deal with it in general. Therefore, consider a situation in which 
N independent events take place (examples of such events include tosses of an individual 
die, selection of a card from a deck, energy state occupied by a molecule, orientation of the 
magnetic moment of a particle), and that each such event has one of a set of m mutually 
exclusive outcomes (e.g., number showing on the die, identity of the card, energy state, or 
magnetic moment orientation). 

We assume that the outcomes $x_1, x_2, \ldots, x_m$ of an individual event will have the respective 
probabilities $p_1, p_2, \ldots, p_m$, with $p_1 + p_2 + \cdots + p_m = 1$ (so that we have included 
all the possible outcomes). Then, we compute the probability that any $n_1$ of the events have 
outcome $x_1$, any $n_2$ events have outcome $x_2$, etc.: 

$$P(n_1, n_2, \ldots, n_m) = B(n_1, n_2, \ldots, n_m)\,(p_1)^{n_1} (p_2)^{n_2} \cdots (p_m)^{n_m}, \tag{23.17}$$

where $n_1 + n_2 + \cdots + n_m = N$, and $B(n_1, n_2, \ldots, n_m)$ is the number of ways that, for each 
i, $n_i$ of the events have outcome $x_i$. 

Now $B(n_1, n_2, \ldots, n_m)$ is just the multinomial coefficient encountered earlier; in the 
present context the numbered objects correspond to events numbered from 1 to N and each 
box corresponds to an individual-event outcome. Thus, our final formula for the probability 
of a distribution defined by $n_1$, $n_2$, etc., is 

$$P(n_1, n_2, \ldots, n_m) = \frac{N!}{n_1!\, n_2! \cdots n_m!}\,(p_1)^{n_1} (p_2)^{n_2} \cdots (p_m)^{n_m}. \tag{23.18}$$
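
Equation (23.18) translates directly into a short routine. The following Python sketch uses exact rational arithmetic; the function name `multinomial_probability` is hypothetical, and the probabilities are assumed to be supplied as `Fraction` objects so that the normalization check is exact.

```python
from fractions import Fraction
from math import factorial

def multinomial_probability(counts, probs):
    """Eq. (23.18): probability that outcome i occurs counts[i] times
    in N = sum(counts) independent events with outcome probabilities probs."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1 (use Fractions)")
    coeff = factorial(sum(counts))
    for n_i in counts:
        coeff //= factorial(n_i)   # builds N!/(n_1! n_2! ... n_m!)
    p = Fraction(coeff)
    for n_i, p_i in zip(counts, probs):
        p *= Fraction(p_i) ** n_i
    return p

# Two tosses of a fair die: one toss shows 1 and the other shows 2 (sum = 3).
die = [Fraction(1, 6)] * 6
print(multinomial_probability([1, 1, 0, 0, 0, 0], die))  # 1/18
```

The result 1/18 reproduces the probability found in Example 23.2.1 for a two-dice sum of 3.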


Mean and Variance 


When we make n measurements of a quantity x, obtaining the values $x_j$, we define the 
average value 

$$\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j \tag{23.19}$$

of the trials, also called the mean or expectation value, where this formula assumes that 
every observed value $x_j$ is equally likely and occurs with probability 1/n. This connection 
is the key link of experimental data with probability theory. This observation and practical 
experience suggest defining the mean value for a discrete random variable X as 

$$\langle X \rangle = \sum_i x_i p_i, \tag{23.20}$$

while defining the mean value for a continuous random variable X characterized by probability 
density f(x) as 

$$\langle X \rangle = \int x f(x)\,dx. \tag{23.21}$$

Other notations for the mean in the literature are $\bar{X}$ and $E(X)$. 

The use of the arithmetic mean $\bar{x}$ of n measurements as the average value is suggested 
by simplicity and plain experience, again assuming equal probability for each $x_i$. But why 
do we not consider the geometric mean 

$$x_g = (x_1 x_2 \cdots x_n)^{1/n}$$

or the harmonic mean $x_h$ determined by the relation 

$$\frac{1}{x_h} = \frac{1}{n} \left( \frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} \right)$$

or the value x that minimizes the sum of absolute deviations $\sum_i |x_i - x|$? Here the $x_i$ are taken 
to increase monotonically. When we plot $O(x) = \sum_i |x_i - x|$, as in Fig. 23.4(a), for 
an odd number of points, we realize that it has a minimum at its central point, while 
for an even number of points $E(x) = \sum_i |x_i - x|$ is flat in its central region, as shown in 
Fig. 23.4(b). These properties make these functions unacceptable for determining average 
values. Instead, when we minimize (with respect to x) the sum of quadratic deviations, 

$$\sum_{i=1}^{n} (x - x_i)^2 = \text{minimum}, \tag{23.22}$$

setting the derivative equal to zero yields $2 \sum_i (x - x_i) = 0$, or 

$$x = \frac{1}{n} \sum_i x_i \equiv \bar{x},$$

that is, the arithmetic mean. The arithmetic mean has another important property: If we 
denote by $v_i = x_i - \bar{x}$ the deviations, then $\sum_i v_i = 0$, that is, the sum of positive deviations 
equals the sum of negative deviations. This principle of minimizing the quadratic sum of 
deviations, called the method of least squares, is due to C. F. Gauss, among others. 

FIGURE 23.4 (a) $\sum_i |x_i - x|$ for an odd number of points. (b) $\sum_i |x_i - x|$ for an 
even number of points. 

The ability of a mean value to represent a set of data points depends on the spread of the 
individual measurements from this mean. Again, we reject the average sum of deviations 
$\sum_{i=1}^{n} |x_i - \bar{x}|/n$ as a measure of the spread because it selects the central measurement as 
the best value for no good reason. A more appropriate definition of the spread is based on 
the average of the squares of the deviations from the mean. This quantity, known as the 
standard deviation, is defined as 

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \tag{23.23}$$

where the square root is motivated by dimensional analysis. 
If we square Eq. (23.23) and expand $(x_i - \bar{x})^2$, written as $(x_i - \langle x \rangle)^2$, we get 

$$n\sigma^2 = \sum_{i=1}^{n} x_i^2 - 2\langle x \rangle \sum_{i=1}^{n} x_i + n\langle x \rangle^2 = n\left( \langle x^2 \rangle - \langle x \rangle^2 \right).$$

Dividing through by n, we obtain the very useful formula, 

$$\sigma^2 = \langle x^2 \rangle - \langle x \rangle^2. \tag{23.24}$$

Note that these two expectation values are equal only if all the $x_i$ have the same value; 
for example, if we have two $x_i$, equal, respectively, to $\langle x \rangle + \delta$ and $\langle x \rangle - \delta$, then $\langle x^2 \rangle = 
\langle x \rangle^2 + \delta^2$, so the spread in the $x_i$ has caused $\langle x^2 \rangle$ to increase. 


Example 23.2.3 STANDARD DEVIATION OF MEASUREMENTS 





From the measurements $x_1 = 7$, $x_2 = 9$, $x_3 = 10$, $x_4 = 11$, $x_5 = 13$, we extract $\bar{x} = 10$ for 
the mean value and, using Eq. (23.23), 

$$\sigma = \sqrt{\frac{(-3)^2 + (-1)^2 + 0^2 + 1^2 + 3^2}{5}} = 2$$

for the standard deviation, or spread. 

There is yet another interpretation of the standard deviation, in terms of the sum of 
squares of measurement differences: 

$$\sum_{i<k} (x_i - x_k)^2 = \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} \left( x_i^2 + x_k^2 - 2 x_i x_k \right) 
= \frac{1}{2} \left[ 2n^2 \langle x^2 \rangle - 2n^2 \langle x \rangle^2 \right] = n^2 \sigma^2. \tag{23.25}$$

The last step in the above equation made use of Eq. (23.24). 

Now we are ready to generalize the spread in a set of n measurements with equal probability 
1/n to the variance of an arbitrary probability distribution. For a discrete random 
variable X with probabilities $p_j$ at $X = x_j$, we define the variance 

$$\sigma^2 = \sum_j (x_j - \langle X \rangle)^2 p_j; \tag{23.26}$$

for a continuous probability distribution the definition becomes 

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \langle X \rangle)^2 f(x)\,dx. \tag{23.27}$$
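
The numbers of Example 23.2.3 and the identities (23.24) and (23.25) can be verified with exact arithmetic. This is a small illustrative sketch; the helper names `mean` and `variance` are hypothetical, and the variance uses the 1/n convention of Eq. (23.23).

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

def mean(values):
    """Arithmetic mean, Eq. (23.19)."""
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Mean squared deviation: sigma^2 with the 1/n convention of Eq. (23.23)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

x = [Fraction(v) for v in (7, 9, 10, 11, 13)]
n = len(x)
var = variance(x)
print(mean(x), var, sqrt(var))   # 10 4 2.0

# Eq. (23.24): sigma^2 = <x^2> - <x>^2
assert var == mean([v * v for v in x]) - mean(x) ** 2
# Eq. (23.25): the sum over pairs i < k of (x_i - x_k)^2 equals n^2 sigma^2
assert sum((a - b) ** 2 for a, b in combinations(x, 2)) == n**2 * var
```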


We now develop some relationships satisfied by random variables: 


1. The variance $\sigma^2$ of a random variable X has the property 

$$\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2. \tag{23.28}$$

This formula, previously derived as Eq. (23.24) only for a discrete random variable 
with all $x_i$ equally probable, is true in general. The proof is left as Exercise 23.2.3. 

2. If random variables X and Y are related by the linear equation Y = aX + b, then Y 
has mean value $\langle Y \rangle = a\langle X \rangle + b$ and variance $\sigma^2(Y) = a^2 \sigma^2(X)$. 
We prove this theorem only for a continuous distribution, leaving the case of a 
discrete random variable as an exercise for the reader. Directly from the definitions, 
we have 

$$\langle Y \rangle = \int_{-\infty}^{\infty} (ax + b) f(x)\,dx = a\langle X \rangle + b,$$

where the integral multiplying b simplifies because $\int f(x)\,dx = 1$. For the variance we 
similarly obtain 

$$\sigma^2(Y) = \int_{-\infty}^{\infty} \left( ax + b - a\langle X \rangle - b \right)^2 f(x)\,dx = \int_{-\infty}^{\infty} a^2 (x - \langle X \rangle)^2 f(x)\,dx = a^2 \sigma^2(X).$$

3. Probabilities of random variables satisfy the Chebyshev inequality, 

$$P(|x - \langle X \rangle| \ge k\sigma) \le \frac{1}{k^2}, \tag{23.29}$$

which demonstrates why the standard deviation serves as a measure of the spread of an 
arbitrary probability distribution from its mean value $\langle X \rangle$. We first derive the simpler 
inequality 

$$P(Y \ge K) \le \frac{\langle Y \rangle}{K}$$

for a continuous random variable Y with values y restricted to $y \ge 0$. (The proof for a 
discrete random variable follows along similar lines.) This inequality follows from 

$$\langle Y \rangle = \int_0^\infty y f(y)\,dy = \int_0^K y f(y)\,dy + \int_K^\infty y f(y)\,dy 
\ge \int_K^\infty y f(y)\,dy \ge K \int_K^\infty f(y)\,dy = K\,P(Y \ge K).$$

Next we apply the same method to the positive variance integral, 

$$\sigma^2 = \int (x - \langle X \rangle)^2 f(x)\,dx \ge \int_{|x - \langle X \rangle| \ge k\sigma} (x - \langle X \rangle)^2 f(x)\,dx 
\ge k^2 \sigma^2 \int_{|x - \langle X \rangle| \ge k\sigma} f(x)\,dx = k^2 \sigma^2\, P(|x - \langle X \rangle| \ge k\sigma),$$

where we have first decreased the right-hand side by omitting the part of the positive 
integral with $|x - \langle X \rangle| < k\sigma$ and then decreased it further by replacing $(x - \langle X \rangle)^2$ 
in the remaining integral by its minimum value, $k^2 \sigma^2$. We now divide the first and 
last members of this sequence of inequalities by the positive quantity $k^2 \sigma^2$, thereby 
proving the Chebyshev inequality. For k = 3 we have the conventional three-standard-deviation 
estimate, 

$$P(|x - \langle X \rangle| \ge 3\sigma) \le \frac{1}{9}. \tag{23.30}$$
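
The Chebyshev bound is usually far from tight, as a check on a concrete distribution shows. The sketch below, offered only as an illustration, uses the two-dice sum of Example 23.2.1 (written in the closed form $P(x) = (6 - |x - 7|)/36$); the function name `tail_probability` and the chosen values of k are assumptions.

```python
from fractions import Fraction
from math import sqrt

# Distribution of the sum of two fair dice: P(x) = (6 - |x - 7|)/36.
dist = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mu = sum(x * p for x, p in dist.items())               # mean value <X>
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # variance, Eq. (23.26)
sigma = sqrt(var)

def tail_probability(k):
    """P(|X - <X>| >= k sigma) for the two-dice sum."""
    return sum(p for x, p in dist.items() if abs(x - mu) >= k * sigma)

for k in (1.5, 2.0, 3.0):
    tail = tail_probability(k)
    print(k, float(tail), 1 / k**2)
    assert tail <= 1 / k**2   # Chebyshev bound, Eq. (23.29)
```

For this distribution the actual tail probabilities lie well below $1/k^2$, which is typical: the inequality holds for every distribution with finite variance and therefore cannot be sharp for most of them.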
Moments of Probability Distributions 


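It is straightforward to generalize the mean value to higher moments of probability 
distributions relative to the mean value $\langle X \rangle$: 

$$\left\langle (X - \langle X \rangle)^k \right\rangle = \sum_j (x_j - \langle X \rangle)^k p_j, \quad \text{discrete distribution,}$$
$$\left\langle (X - \langle X \rangle)^k \right\rangle = \int_{-\infty}^{\infty} (x - \langle X \rangle)^k f(x)\,dx, \quad \text{continuous distribution.} \tag{23.31}$$

The moment-generating function 

$$\langle e^{tX} \rangle = \int e^{tx} f(x)\,dx = 1 + t\langle X \rangle + \frac{t^2}{2!} \langle X^2 \rangle + \cdots \tag{23.32}$$

is a weighted sum of the moments of the continuous random variable X, which is obtained 
by substituting the Taylor expansion of the exponential function. Therefore, 

$$\langle X \rangle = \left. \frac{d\langle e^{tX} \rangle}{dt} \right|_{t=0}, \qquad 
\langle X^2 \rangle = \left. \frac{d^2 \langle e^{tX} \rangle}{dt^2} \right|_{t=0}, \qquad 
\langle X^n \rangle = \left. \frac{d^n \langle e^{tX} \rangle}{dt^n} \right|_{t=0}. \tag{23.33}$$

Note that the moments here are relative to x = 0, not relative to the expectation value 
$\langle X \rangle$; moments taken about $\langle X \rangle$, as in Eq. (23.31), are called central moments. 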
Example 23.2.4 MOMENT-GENERATING FUNCTION 

Suppose we have four cards, numbered from 1 through 4, from which we draw two at 
random and add their numbers. Letting this sum of the drawn numbers be values of a 
random variable X, we find that X has the following values and respective probabilities 
P(x): 

$$P(3) = 1/6, \quad P(4) = 1/6, \quad P(5) = 1/3, \quad P(6) = 1/6, \quad P(7) = 1/6.$$

Verifying these probabilities is the topic of Exercise 23.2.1. 
The moment-generating function for this system has the form 

$$M = \frac{1}{6} \left( e^{3t} + e^{4t} + 2e^{5t} + e^{6t} + e^{7t} \right),$$

and its first two derivatives are 

$$M' = \frac{1}{6} \left( 3e^{3t} + 4e^{4t} + 10e^{5t} + 6e^{6t} + 7e^{7t} \right),$$

$$M'' = \frac{1}{6} \left( 9e^{3t} + 16e^{4t} + 50e^{5t} + 36e^{6t} + 49e^{7t} \right).$$

Setting t = 0, we get 

$$\langle X \rangle = M'(0) = 5, \qquad \langle X^2 \rangle = M''(0) = \frac{80}{3}.$$

Thus, the mean of X is found to be 5, and its variance is given by 

$$\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = \frac{80}{3} - 25 = \frac{5}{3}.$$

In this example we see that the moment-generating function does (in a systematic way) the 
same thing as direct formation of the moments; in a later example, Example 23.3.2, we see 
a situation in which the use of the moment-generating function provides an opportunity to 
compute moments with rather little computational work. 
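
The relation between derivatives of the moment-generating function and the moments can be checked numerically for the card example. In this sketch the derivatives at t = 0 are approximated by central finite differences (a substitute for the analytic differentiation used in the text; the step size `h` is an arbitrary choice) and compared with the moments formed directly.

```python
from fractions import Fraction
from itertools import combinations
from math import exp

# Sums of two cards drawn without replacement from cards numbered 1 to 4.
sums = [a + b for a, b in combinations(range(1, 5), 2)]
prob = {s: Fraction(sums.count(s), len(sums)) for s in sorted(set(sums))}

def M(t):
    """Moment-generating function <exp(tX)> of the card-sum distribution."""
    return sum(float(p) * exp(t * s) for s, p in prob.items())

h = 1e-4
first = (M(h) - M(-h)) / (2 * h)            # approximates M'(0) = <X>
second = (M(h) - 2 * M(0) + M(-h)) / h**2   # approximates M''(0) = <X^2>

exact_first = sum(s * p for s, p in prob.items())        # 5
exact_second = sum(s * s * p for s, p in prob.items())   # 80/3
print(first, exact_first)
print(second, float(exact_second))
print(exact_second - exact_first**2)        # variance, 5/3
```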


Mean values, central moments, and variance can be defined analogously for probability 
distributions that depend on several random variables. We illustrate for the case of two 
random variables X and Y, for which the mean values and the variance of each variable 
take the forms 


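$$\langle X \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\,dx\,dy, \qquad 
\langle Y \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(x, y)\,dx\,dy, \tag{23.34}$$

$$\sigma^2(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \langle X \rangle)^2 f(x, y)\,dx\,dy, \qquad 
\sigma^2(Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (y - \langle Y \rangle)^2 f(x, y)\,dx\,dy. \tag{23.35}$$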

Covariance and Correlation 
Two random variables are said to be independent if the probability density f(x, y) factorizes 
into a product f(x)g(y) of probability distributions of one random variable each. 
The covariance, defined as 

$$\mathrm{cov}(X, Y) = \left\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \right\rangle, \tag{23.36}$$

is a measure of how much the random variables X and Y are correlated (or related): It is 
zero for independent random variables because 

$$\mathrm{cov}(X, Y) = \iint (x - \langle X \rangle)(y - \langle Y \rangle) f(x, y)\,dx\,dy 
= \int (x - \langle X \rangle) f(x)\,dx \int (y - \langle Y \rangle) g(y)\,dy 
= (\langle X \rangle - \langle X \rangle)(\langle Y \rangle - \langle Y \rangle) = 0.$$

The normalized covariance $\mathrm{cov}(X, Y)/\sigma(X)\sigma(Y)$, which has values between −1 and +1, 
is often called correlation. 
In order to demonstrate that the correlation is bounded by 

$$-1 \le \frac{\mathrm{cov}(X, Y)}{\sigma(X)\sigma(Y)} \le 1,$$

we analyze the positive mean value 

$$Q = \left\langle [a(X - \langle X \rangle) + c(Y - \langle Y \rangle)]^2 \right\rangle 
= a^2 \left\langle [X - \langle X \rangle]^2 \right\rangle + 2ac \left\langle [X - \langle X \rangle][Y - \langle Y \rangle] \right\rangle + c^2 \left\langle [Y - \langle Y \rangle]^2 \right\rangle 
= a^2 \sigma(X)^2 + 2ac\,\mathrm{cov}(X, Y) + c^2 \sigma(Y)^2 \ge 0. \tag{23.37}$$

For this quadratic form to be nonnegative for all values of the constants a and c, its 
discriminant must satisfy $\mathrm{cov}(X, Y)^2 - \sigma(X)^2 \sigma(Y)^2 \le 0$, which proves the desired inequality. 


The usefulness of the correlation as a quantitative measure is emphasized by the follow- 
ing theorem: 


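The probability P(Y = aX + b) will be unity if, and only if, the correlation 
$\mathrm{cov}(X, Y)/\sigma(X)\sigma(Y)$ is equal to ±1. 

This theorem states that a ±100% correlation between X and Y implies not only some 
functional relation between both random variables but that the relation between them is 
linear. 

Our first step in proving this theorem is to show that P(Y = aX + b) = 1 (meaning 
that Y = aX + b) implies that $\mathrm{cov}(X, Y)/\sigma(X)\sigma(Y) = \pm 1$. For the mean $\langle Y \rangle$, we simply 
compute 

$$\langle Y \rangle = \langle aX + b \rangle = a\langle X \rangle + b.$$

For the variance, 

$$\sigma(Y)^2 = \langle Y^2 \rangle - \langle Y \rangle^2 = \langle (aX + b)^2 \rangle - (a\langle X \rangle + b)^2 
= a^2 \langle X^2 \rangle + 2ab\langle X \rangle + b^2 - \left( a^2 \langle X \rangle^2 + 2ab\langle X \rangle + b^2 \right) 
= a^2 \left( \langle X^2 \rangle - \langle X \rangle^2 \right) = a^2 \sigma(X)^2,$$

which is equivalent to $\sigma(Y) = \pm a\sigma(X)$. We also need cov(X, Y), which is 

$$\mathrm{cov}(X, Y) = \left\langle (X - \langle X \rangle)\left( aX + b - (a\langle X \rangle + b) \right) \right\rangle 
= a\left( \langle X^2 \rangle - \langle X \rangle^2 \right) = a\sigma^2(X) = \pm\sigma(X)\sigma(Y),$$

where the last equality was obtained by identifying $a\sigma(X)$ as $\pm\sigma(Y)$. This result completes 
the first step in our proof of the theorem. 

To complete the proof, we must establish the converse of the relation we have just 
proved, namely that $\mathrm{cov}(X, Y)/\sigma(X)\sigma(Y) = \pm 1$ implies P(Y = aX + b) = 1 for some set 
of values (a, b). We proceed by forming the quadratic expectation value 

$$\left\langle \left[ (\sigma(Y)X \mp \sigma(X)Y) - \langle \sigma(Y)X \mp \sigma(X)Y \rangle \right]^2 \right\rangle,$$

where the symbol ∓ indicates that we choose a sign opposite to that of the correlation 
$\mathrm{cov}(X, Y)/\sigma(X)\sigma(Y)$. Our plan is to show that this expectation value is zero. Since the 
expectation value is that of an inherently nonnegative quantity, we may then conclude that 
$\sigma(Y)X \mp \sigma(X)Y$ is (almost) everywhere equal to its expectation value, the value of which 
is some constant C. We therefore have 

$$\sigma(Y)X \mp \sigma(X)Y = C, \quad \text{equivalent to} \quad Y = \pm \frac{\sigma(Y)X - C}{\sigma(X)},$$

the linear relation we seek. 
It remains to confirm that the quadratic expectation value vanishes. Rearranging it first 
to the form 

$$\left\langle \left[ \sigma(Y)(X - \langle X \rangle) \mp \sigma(X)(Y - \langle Y \rangle) \right]^2 \right\rangle$$

and then expanding the square, we reach 

$$\sigma(Y)^2 \left\langle (X - \langle X \rangle)^2 \right\rangle + \sigma(X)^2 \left\langle (Y - \langle Y \rangle)^2 \right\rangle \mp 2\sigma(X)\sigma(Y) \left\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \right\rangle.$$

Making now the substitutions $\langle (X - \langle X \rangle)^2 \rangle = \sigma(X)^2$, $\langle (Y - \langle Y \rangle)^2 \rangle = \sigma(Y)^2$, and 
$\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle = \pm\sigma(X)\sigma(Y)$, our quadratic expectation value reduces to zero. 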
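
The theorem can be illustrated on a small discrete example. In this sketch (the function name `correlation` and the specific values of a and b are illustrative assumptions), X is the face shown by a fair die; the correlation comes out as +1 or −1 when Y = aX + b holds with certainty, and as zero for two independent dice.

```python
from fractions import Fraction
from math import sqrt

def correlation(joint):
    """Correlation cov(X,Y)/(sigma(X) sigma(Y)) for a joint table {(x, y): p}."""
    mx = sum(x * p for (x, y), p in joint.items())
    my = sum(y * p for (x, y), p in joint.items())
    vx = sum((x - mx) ** 2 * p for (x, y), p in joint.items())
    vy = sum((y - my) ** 2 * p for (x, y), p in joint.items())
    cov = sum((x - mx) * (y - my) * p for (x, y), p in joint.items())
    return float(cov) / sqrt(vx * vy)

sixth = Fraction(1, 6)
# Y = aX + b holds with probability one.
rising = {(x, 2 * x + 1): sixth for x in range(1, 7)}     # a = 2 > 0
falling = {(x, -3 * x + 5): sixth for x in range(1, 7)}   # a = -3 < 0
# Two independent dice, for contrast.
independent = {(x, y): sixth * sixth for x in range(1, 7) for y in range(1, 7)}

print(correlation(rising))       # close to +1
print(correlation(falling))      # close to -1
print(correlation(independent))  # 0.0
```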





Marginal Probability Distributions 


It is sometimes useful to integrate out (i.e., average over) one of the random variables in a 
multivariable distribution. When we do so, we are left with the probability distribution of 
the other random variables. For a two-variable distribution, we can eliminate either of the 
two variables: 


$$F(x) = \int f(x, y)\,dy, \qquad G(y) = \int f(x, y)\,dx, \tag{23.38}$$

and analogously for discrete probability distributions. When one or more random variables 
are integrated out, the remaining probability distribution is called marginal, the name 
motivated by the geometric aspects of projection. It is straightforward to show that these 
marginal distributions satisfy all the requirements of properly normalized probability dis- 
tributions. 

Here is a comprehensive example that illustrates the computation of probability distri- 
butions and their mean values, variances, covariance, and correlation. 


Example 23.2.5 REPEATED DRAWS OF CARDS 


This example deals with independent repeated draws from a deck of playing cards. To 
make sure that these events stay independent, we draw the first card at random from a 
bridge deck containing 52 cards and then put it back at a random place and reshuffle the 
deck. Now we repeat the process for a second card. Let’s define the random variables: 


• X = number of so-called honors, that is, tens, jacks, queens, kings, or aces; 

• Y = number of twos or threes. 

In a single draw the probability of Event a (drawing an honor) is $p_a = 20/52 = 5/13$, 
while the probability of Event b (drawing a two or three) is $p_b = 2(4/52) = 2/13$. The 
probability of Event c (drawing anything else) is $p_c = (13 - 5 - 2)/13 = 6/13$. Since that 
exhausts all the mutually exclusive possibilities, we have $p_a + p_b + p_c = 1$. 

In two drawings, it is possible to draw zero, one, or two honors (i.e., x = 0, 1, or 2). 
Likewise, we may draw zero, one, or two cards of value 2 or 3 (i.e., y = 0, 1, or 2). But 
because we are only drawing two cards, we have the additional condition $0 \le x + y \le 2$. 

The probability function P(X = x, Y = y), which we will write in the simpler form 
P(x, y), is given by a formula of the type presented in Eq. (23.18), with N (the number 
of events) equal to 2 and with the three individual-event probabilities $p_a$, $p_b$, and $p_c$. The 
number of events a is x, the number of events b is y, and therefore the number of events c 
is 2 − x − y, and, by Eq. (23.18), 

$$P(x, y) = \frac{2!}{x!\,y!\,(2 - x - y)!}\,(p_a)^x (p_b)^y (p_c)^{2-x-y} 
= \frac{2!}{x!\,y!\,(2 - x - y)!} \left( \frac{5}{13} \right)^x \left( \frac{2}{13} \right)^y \left( \frac{6}{13} \right)^{2-x-y}, \tag{23.39}$$

with $0 \le x + y \le 2$. More explicitly, P(x, y) has the following values: 





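$$P(0, 0) = \left( \frac{6}{13} \right)^2, \qquad P(1, 0) = 2 \cdot \frac{5}{13} \cdot \frac{6}{13} = \frac{60}{13^2},$$
$$P(2, 0) = \left( \frac{5}{13} \right)^2, \qquad P(0, 1) = 2 \cdot \frac{2}{13} \cdot \frac{6}{13} = \frac{24}{13^2},$$
$$P(0, 2) = \left( \frac{2}{13} \right)^2, \qquad P(1, 1) = 2 \cdot \frac{5}{13} \cdot \frac{2}{13} = \frac{20}{13^2}.$$

The probability distribution is properly normalized. Its expectation values are given by 

$$\langle X \rangle = \sum_{0 \le x+y \le 2} x P(x, y) = P(1, 0) + P(1, 1) + 2P(2, 0) 
= \frac{60}{13^2} + \frac{20}{13^2} + 2\left( \frac{5}{13} \right)^2 = \frac{130}{13^2} = \frac{10}{13} = 2p_a$$

and 

$$\langle Y \rangle = \sum_{0 \le x+y \le 2} y P(x, y) = P(0, 1) + P(1, 1) + 2P(0, 2) 
= \frac{24}{13^2} + \frac{20}{13^2} + 2\left( \frac{2}{13} \right)^2 = \frac{52}{13^2} = \frac{4}{13} = 2p_b.$$

The values $2p_a$ and $2p_b$ are expected because we are drawing a card two times. The variances are 

$$\sigma^2(X) = \sum_{0 \le x+y \le 2} \left( x - \frac{10}{13} \right)^2 P(x, y) 
= \left( -\frac{10}{13} \right)^2 [P(0, 0) + P(0, 1) + P(0, 2)] + \left( \frac{3}{13} \right)^2 [P(1, 0) + P(1, 1)] + \left( \frac{16}{13} \right)^2 P(2, 0)$$
$$= \frac{10^2 \cdot 64 + 3^2 \cdot 80 + 16^2 \cdot 5^2}{13^4} = \frac{4^2 \cdot 5 \cdot 169}{13^4} = \frac{80}{13^2},$$

$$\sigma^2(Y) = \sum_{0 \le x+y \le 2} \left( y - \frac{4}{13} \right)^2 P(x, y) 
= \left( -\frac{4}{13} \right)^2 [P(0, 0) + P(1, 0) + P(2, 0)] + \left( \frac{9}{13} \right)^2 [P(0, 1) + P(1, 1)] + \left( \frac{22}{13} \right)^2 P(0, 2)$$
$$= \frac{4^2 \cdot 11^2 + 9^2 \cdot 44 + 22^2 \cdot 2^2}{13^4} = \frac{11 \cdot 4 \cdot 169}{13^4} = \frac{44}{13^2}.$$

The covariance is given by 

$$\mathrm{cov}(X, Y) = \sum_{0 \le x+y \le 2} \left( x - \frac{10}{13} \right) \left( y - \frac{4}{13} \right) P(x, y) 
= \frac{10 \cdot 4}{13^2} \cdot \frac{6^2}{13^2} - \frac{10 \cdot 9}{13^2} \cdot \frac{24}{13^2} - \frac{10 \cdot 22}{13^2} \cdot \frac{2^2}{13^2}$$
$$\qquad - \frac{3 \cdot 4}{13^2} \cdot \frac{60}{13^2} + \frac{3 \cdot 9}{13^2} \cdot \frac{20}{13^2} - \frac{16 \cdot 4}{13^2} \cdot \frac{5^2}{13^2} = -\frac{20}{13^2}.$$

Therefore, the correlation of the random variables X, Y is given by 

$$\frac{\mathrm{cov}(X, Y)}{\sigma(X)\sigma(Y)} = -\frac{20}{8\sqrt{5 \cdot 11}} = -\frac{1}{2}\sqrt{\frac{5}{11}} = -0.3371,$$

which means that there is a small (negative) correlation between these random variables, 
because if an honor is drawn, that drawing is not available to yield a 2 or a 3, and vice 
versa. 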

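Finally, let us determine the marginal distribution, 

$$P(X = x) = \sum_{y=0}^{2} P(x, y)$$

(with P(x, y) = 0 when x + y > 2), or explicitly, 

$$P(X = 0) = P(0, 0) + P(0, 1) + P(0, 2) = \left( \frac{6}{13} \right)^2 + \frac{24}{13^2} + \left( \frac{2}{13} \right)^2 = \left( \frac{8}{13} \right)^2,$$
$$P(X = 1) = P(1, 0) + P(1, 1) = \frac{60}{13^2} + \frac{20}{13^2} = \frac{80}{13^2},$$
$$P(X = 2) = P(2, 0) = \left( \frac{5}{13} \right)^2,$$

which is properly normalized because 

$$P(X = 0) + P(X = 1) + P(X = 2) = \frac{64 + 80 + 25}{13^2} = \frac{169}{13^2} = 1.$$

The mean value and variance of X can be computed from the marginal probabilities: 

$$\langle X \rangle = \sum_{x=0}^{2} x P(X = x) = P(X = 1) + 2P(X = 2) = \frac{80}{13^2} + 2\left( \frac{5}{13} \right)^2 = \frac{130}{13^2} = \frac{10}{13},$$

$$\sigma^2(X) = \sum_{x=0}^{2} \left( x - \frac{10}{13} \right)^2 P(X = x) 
= \left( \frac{10}{13} \right)^2 \left( \frac{8}{13} \right)^2 + \left( \frac{3}{13} \right)^2 \frac{80}{13^2} + \left( \frac{16}{13} \right)^2 \left( \frac{5}{13} \right)^2 
= \frac{80 \cdot 169}{13^4} = \frac{80}{13^2}.$$

These data agree with our earlier computations of the same quantities. 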
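
The arithmetic of Example 23.2.5 is easy to reproduce with exact fractions. The following Python sketch builds the joint distribution from Eq. (23.39) and forms the means, variances, covariance, correlation, and the marginal distribution of X; the variable and function names are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import factorial, sqrt

pa, pb, pc = Fraction(5, 13), Fraction(2, 13), Fraction(6, 13)

def P(x, y, draws=2):
    """Joint probability of x honors and y twos-or-threes, Eq. (23.39)."""
    z = draws - x - y              # number of draws yielding anything else
    if min(x, y, z) < 0:
        return Fraction(0)
    coeff = factorial(draws) // (factorial(x) * factorial(y) * factorial(z))
    return coeff * pa**x * pb**y * pc**z

joint = {(x, y): P(x, y) for x in range(3) for y in range(3 - x)}
assert sum(joint.values()) == 1    # normalization

mean_x = sum(x * p for (x, y), p in joint.items())
mean_y = sum(y * p for (x, y), p in joint.items())
var_x = sum((x - mean_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mean_y) ** 2 * p for (x, y), p in joint.items())
cov = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in joint.items())
corr = float(cov) / sqrt(var_x * var_y)

# Marginal distribution of X: sum the joint table over y.
marginal_x = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in range(3)}

print(mean_x, mean_y)        # 10/13 4/13
print(var_x, var_y, cov)     # 80/169 44/169 -20/169
print(round(corr, 4))        # -0.3371
print(marginal_x)
```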


Conditional Probability Distributions 
If we are interested in the distribution of a random variable X for a definite value $y = y_0$ 
of another random variable, then we deal with a conditional probability distribution 
$P(X = x \mid Y = y_0)$. The corresponding continuous probability density is $f(x, y_0)$ divided 
by the marginal density of Y at $y_0$, so that it is normalized with respect to x. 


Exercises 


23.2.1 Verify the probabilities for the outcomes of the two-card draws in Example 23.2.4, and 
by direct computation of the mean and variance check the results given in that example. 

23.2.2 Show that adding a constant c to a random variable X changes the expectation value 
$\langle X \rangle$ by that same constant but not the variance. Show also that multiplying a random 
variable by a constant multiplies the mean by that constant and the variance by the 
square of that constant. Show that the random variable $X - \langle X \rangle$ has mean value zero. 

23.2.3 Using the definition given in Eq. (23.27) for the variance $\sigma^2$ of a continuous random 
variable, show that 

$$\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2.$$

23.2.4 A velocity $v_j = x_j/t_j$ is measured by recording the distances $x_j$ at the corresponding 
times $t_j$. Show that $\bar{x}/\bar{t}$ is a good approximation for the average velocity $\bar{v}$, provided 
that all the errors are small: $|x_j - \bar{x}| \ll |\bar{x}|$ and $|t_j - \bar{t}| \ll |\bar{t}|$. 

23.2.5 Redefine the random variable Y in Example 23.2.5 as the number of fours through nines. 
Then determine the correlation of the X and Y random variables for the drawing of two 
cards (with replacement, as in the example). 

23.2.6 The probability that a particle of an ideal gas travels a distance x between collisions 
is proportional to $e^{-x/f} dx$, where f is the constant mean free path. Verify that f is 
the average distance between collisions, and determine the probability of a free path of 
length $l \ge 3f$. 

23.2.7 Determine the probability density for a particle in simple harmonic motion in the 
interval $-A \le x \le A$. 

Hint. The probability that the particle is between x and x + dx is proportional to the 
time it takes to travel across the interval. 


23.3 BINOMIAL DISTRIBUTION 


In this and the next two sections, we explore specific random variable distributions that are 
of importance both in physics and in the mathematical theories of probability and statistics. 
The topic of the present section is the binomial distribution, which typically occurs in the 
study of repeated independent trials of random events. 


Example 23.3.1 REPEATED TOSSES OF DICE 


What is the probability of three sixes in four tosses, all trials being independent? Getting 
one six in a single toss of a fair die has probability a = 1/6, and getting anything else has 
probability b = 5/6 with a + b = 1. Let the random variable S = s be the number of sixes. 
In four tosses, $0 \le s \le 4$. The probability distribution P(S = s) is given by the product of 
the two probabilities, $a^s$ and $b^{4-s}$, times the number of ways that s sixes can be obtained 
from four tosses. This number is given by Eq. (23.18), and our probability is 

$$P(S = s) = \frac{4!}{s!\,(4 - s)!}\,a^s b^{4-s} = \binom{4}{s} a^s b^{4-s}. \tag{23.40}$$

We can now check that our probability is properly normalized by verifying that the sum of 
P(S = s) for all s adds to unity. From properties of the binomial coefficients, we find 

$$\sum_{s=0}^{4} \binom{4}{s} a^s b^{4-s} = (a + b)^4 = \left( \frac{1}{6} + \frac{5}{6} \right)^4 = 1. \tag{23.41}$$

Writing out the cases of Eq. (23.40) explicitly, we have 

$$f(0) = b^4, \quad f(1) = 4ab^3, \quad f(2) = 6a^2 b^2, \quad f(3) = 4a^3 b, \quad f(4) = a^4,$$

so we can answer our original question: The probability of three sixes in four tosses is 

$$4a^3 b = 4\left( \frac{1}{6} \right)^3 \frac{5}{6} = \frac{5}{4 \cdot 3^4},$$

which is fairly small. 


This case dealt with repeated independent trials, each with two possible outcomes of 
constant probability p for a hit and q = 1 − p for a miss, and it is typical of many 
applications, including practical issues such as the random instances of defective products. 
The generalization to S = s successes in n trials is given by the binomial probability 
distribution: 

$$P(S = s) = \frac{n!}{s!\,(n - s)!}\,p^s q^{n-s} = \binom{n}{s} p^s q^{n-s}. \tag{23.42}$$


Figure 23.5 shows histograms for cases with 20 trials and various hit probabilities p. 
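
Equation (23.42) can be evaluated directly. The short sketch below (the function name `binomial_pmf` is hypothetical) reproduces the result of Example 23.3.1 with exact fractions and confirms the normalization of Eq. (23.41).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(s, n, p):
    """Probability of exactly s successes in n independent trials, Eq. (23.42)."""
    q = 1 - p
    return comb(n, s) * p**s * q ** (n - s)

a = Fraction(1, 6)   # probability of a six in one toss of a fair die
pmf = [binomial_pmf(s, 4, a) for s in range(5)]
print(pmf[3])        # 5/324, three sixes in four tosses
print(sum(pmf))      # 1, the normalization of Eq. (23.41)
```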


Example 23.3.2 USE OF MOMENT-GENERATING FUNCTION 


If we view our probability distribution as the result of adding together n random variables 
$S_i$, each having the value $s_i = 1$ with probability p and the value $s_i = 0$ with probability 
q, we can use the moment-generating function of Eq. (23.32) to obtain more information 
about the binomial distribution. We write 

$$\langle e^{tS} \rangle = \langle e^{t(S_1 + S_2 + \cdots + S_n)} \rangle = \langle e^{tS_1} \rangle \langle e^{tS_2} \rangle \cdots \langle e^{tS_n} \rangle = \left[ \langle e^{tS_i} \rangle \right]^n, \tag{23.43}$$

where we have used the fact that the trials are independent to write $\langle e^{tS} \rangle$ as a product of 
single-trial expectation values, all of which are identical. 

We continue by evaluating $\langle e^{tS_i} \rangle$, which is an average for the two values $s_i = 1$, with 
probability p, and $s_i = 0$, with probability q. We get 

$$\langle e^{tS_i} \rangle = pe^t + qe^0 = pe^t + q, \tag{23.44}$$

so Eq. (23.43) reduces to 

$$\langle e^{tS} \rangle = (pe^t + q)^n. \tag{23.45}$$

FIGURE 23.5 Binomial probability distributions for n = 20 and p = 0.1, 0.3, 0.5. 

Note that the fact the trials were independent enabled us to obtain the moment-generating 
function without enumerating all the many-trial possibilities. 

Now that we have $\langle e^{tS} \rangle$ we can differentiate it, as in Eq. (23.33), to obtain moments of 
our distribution. Using 

$$\frac{\partial \langle e^{tS} \rangle}{\partial t} = npe^t (pe^t + q)^{n-1}, \qquad 
\langle S \rangle = \left. \frac{\partial \langle e^{tS} \rangle}{\partial t} \right|_{t=0} = np,$$

$$\frac{\partial^2 \langle e^{tS} \rangle}{\partial t^2} = npe^t (pe^t + q)^{n-1} + n(n - 1)p^2 e^{2t} (pe^t + q)^{n-2}, \qquad 
\langle S^2 \rangle = \left. \frac{\partial^2 \langle e^{tS} \rangle}{\partial t^2} \right|_{t=0} = np + n(n - 1)p^2,$$

we obtain, applying Eq. (23.28), 

$$\sigma^2(S) = \langle S^2 \rangle - \langle S \rangle^2 = np + n(n - 1)p^2 - n^2 p^2 = np(1 - p) = npq.$$

For a given n, we see that the variance is largest when p = q = 1/2. This behavior is 
apparent in Fig. 23.5, where we see that the distribution broadens as p is increased from 
0.1 to 0.3 to 0.5. 
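
The results $\langle S \rangle = np$ and $\sigma^2(S) = npq$ can be confirmed by summing over the binomial probabilities directly, without the moment-generating function. A minimal sketch with exact fractions follows; the function name `binomial_moments` is hypothetical, and n = 20 with p = 0.1, 0.3, 0.5 matches the cases plotted in Fig. 23.5.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Return (<S>, sigma^2) of the binomial distribution by direct summation."""
    q = 1 - p
    pmf = [comb(n, s) * p**s * q ** (n - s) for s in range(n + 1)]
    mean = sum(s * w for s, w in enumerate(pmf))
    second = sum(s * s * w for s, w in enumerate(pmf))
    return mean, second - mean**2

n = 20
for p in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 2)):
    mean, var = binomial_moments(n, p)
    assert mean == n * p             # <S> = np
    assert var == n * p * (1 - p)    # sigma^2 = npq
    print(p, mean, var)
```

The printed variances grow from 9/5 to 21/5 to 5 as p increases to 1/2, in line with the broadening described above.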
Exercises 


23.3.1 Show that the variable X = x, defined as the number of heads in n coin tosses, is a 
random variable and determine its probability distribution. Describe the sample space. 
What are its mean value, the variance, and the standard deviation? Plot the probability 
function $P(x) = [n!/x!(n - x)!]\,2^{-n}$ for n = 10, 20, and 30 using graphics software. 

23.3.2 Plot the binomial probability function for the probabilities p = 1/6, q = 5/6, and n = 6 
throws of a die. 

23.3.3 A hardware company knows that the probability of mass-producing nails includes a 
small probability p = 0.03 of defective nails (usually without a sharp tip). What is the 
probability of finding more than two defective nails in its commercial box of 100 nails? 

23.3.4 Four cards are drawn from a shuffled bridge deck. What is the probability that they 
are all red? That they are all hearts? That they are honors? Compare the probabilities 
when each card is put back at a random place before drawing the next card, with the 
probabilities when the cards are not replaced in the deck. 

23.3.5 Show that for the binomial distribution of Eq. (23.42), the most probable value of x 
is np. 


23.4 POISSON DISTRIBUTION 


The Poisson distribution is often used to describe situations in which an event occurs 
repeatedly at a constant rate of probability. Typical applications involve the decay of 
radioactive samples, but only in the approximation that the decay rate is slow enough that 
depletion in the population of the decaying species can be neglected. Other applications 
of interest include so-called Poisson noise, where fluctuations in a low rate of arrival of 
particles at a detector cause statistically predictable fluctuations in the detector signal. 

The Poisson distribution can be developed by considering the probabilities that varying 
numbers of events are detected over an interval during which events occur at a constant 
rate of probability. The essential features of the development are that it assumes that (1) the 
event rate is small enough that there will be observationally accessible intervals in which 
at most one event occurs (i.e., one can consider intervals containing either zero or one 
event), and (2) the total number of events is small enough that it is useful to model their 
occurrence by a discrete probability distribution. 

Let’s proceed by defining the probability $P_n(t)$ that exactly $n$ events occur in a time
$t$, and by taking the probability of one event occurring in a short time interval $dt$ to be
$\mu\,dt$, where $\mu$ is a constant such that $\mu\,dt \ll 1$. This time interval $dt$ is
therefore short enough that we can neglect the possibility that more than one event occurs
within it. Based on this hypothesis, we can set up a recursion relation for $P_n(t)$ by
considering the two following mutually exclusive possibilities for the occurrence of $n$
events in a time $t + dt$: (1) that $n$ events occur during a time $t$ and no events occur in
a subsequent time interval $dt$, and (2) that $n - 1$ events occur during the time $t$ and one
event occurs during the subsequent interval $dt$. We therefore write

\[ P_n(t + dt) = P_n(t)\,P_0(dt) + P_{n-1}(t)\,P_1(dt). \]

Then, inserting $P_1(dt) = \mu\,dt$ and $P_0(dt) = 1 - P_1(dt)$ and dividing through by $dt$,
we get, after minor rearrangement,

\[ \frac{dP_n(t)}{dt} = \frac{P_n(t + dt) - P_n(t)}{dt} = \mu P_{n-1}(t) - \mu P_n(t). \tag{23.46} \]

As a first step in solving this recursion relation, we note that for $n = 0$ it simplifies
(because the possibility involving $P_{n-1}$ does not exist) to

\[ \frac{dP_0(t)}{dt} = -\mu P_0(t). \tag{23.47} \]

This equation, with initial condition $P_0(0) = 1$ (meaning that it is certain that no events
are observed in an interval of zero length), has solution $P_0(t) = e^{-\mu t}$. Our solution
informs us that the probability that no events have occurred before time $t$ decays
exponentially with $t$, at a rate dependent on the magnitude of $\mu$. From this starting
point and the further initial conditions $P_n(0) = 0$ for $n \ge 1$ (again, no detection of
events occurs during an interval of zero length), the recursion relation can be solved to yield

\[ P_n(t) = \frac{(\mu t)^n}{n!}\, e^{-\mu t}. \tag{23.48} \]
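A numerical cross-check of this result can be sketched as follows. The code integrates the
recursion relation, Eq. (23.46), with a simple forward-Euler scheme and compares the outcome
with the closed form, Eq. (23.48). The function names, the step count, and the sample values
$\mu = 2$, $t = 1.5$ are illustrative assumptions, not part of the text.

```python
import math

def poisson_closed_form(n, mu, t):
    """Eq. (23.48): P_n(t) = (mu t)^n / n! * exp(-mu t)."""
    return (mu * t) ** n / math.factorial(n) * math.exp(-mu * t)

def integrate_recursion(mu, t_end, n_max, steps=20_000):
    """Forward-Euler integration of Eq. (23.46) for n = 0..n_max."""
    dt = t_end / steps
    p = [1.0] + [0.0] * n_max            # initial conditions P_n(0) = delta_{n0}
    for _ in range(steps):
        new = [p[0] - mu * p[0] * dt]    # n = 0 has no P_{n-1} term, Eq. (23.47)
        for n in range(1, n_max + 1):
            new.append(p[n] + (mu * p[n - 1] - mu * p[n]) * dt)
        p = new
    return p

numeric = integrate_recursion(mu=2.0, t_end=1.5, n_max=6)
exact = [poisson_closed_form(n, 2.0, 1.5) for n in range(7)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(f"largest deviation from Eq. (23.48): {max_error:.2e}")
```

The deviation shrinks roughly in proportion to the step size, as expected for a first-order
integrator.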





Equation (23.48) can be checked by substituting it into the recursion formula, Eq. (23.46),
and by verifying that it satisfies the initial conditions $P_n(0) = \delta_{n0}$.

Equation (23.48) is taken as the definition of the Poisson distribution, regarded as a
function of the quantity $\mu t$. Replacing $\mu t$ by $\mu$, we write the Poisson-distribution
probabilities given for a discrete random variable $X$ in the standard form,

\[ p(n) = \frac{\mu^n}{n!}\, e^{-\mu}, \qquad X = n = 0, 1, 2, \ldots. \tag{23.49} \]

We can check that the probabilities in Eq. (23.49) are properly normalized by noting that
$\sum_n \mu^n/n!$ evaluates to $e^{\mu}$. An example of a Poisson distribution is given in Fig. 23.6.
The mean value and variance of a Poisson distribution are easily calculated: 








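\[ \langle X \rangle = e^{-\mu} \sum_{n=1}^{\infty} n\,\frac{\mu^n}{n!}
   = \mu\, e^{-\mu} \sum_{n=0}^{\infty} \frac{\mu^n}{n!} = \mu, \tag{23.50} \]

\[ \langle X^2 \rangle = e^{-\mu} \sum_{n=1}^{\infty} n^2\,\frac{\mu^n}{n!}
   = e^{-\mu} \sum_{n=1}^{\infty} \left[ n(n-1) + n \right] \frac{\mu^n}{n!} = \mu^2 + \mu, \tag{23.51} \]

\[ \sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = \mu(\mu + 1) - \mu^2 = \mu. \tag{23.52} \]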
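The normalization, mean, and variance quoted in Eqs. (23.49) to (23.52) can be confirmed
numerically by truncated sums. The sketch below is only an illustration; the helper names and
the cutoff `n_max` are assumptions chosen so that the neglected tail is far below rounding
error for $\mu = 5$.

```python
import math

def poisson_pmf(n, mu):
    """Eq. (23.49): p(n) = mu^n / n! * exp(-mu)."""
    return mu ** n / math.factorial(n) * math.exp(-mu)

def poisson_moments(mu, n_max=100):
    """Truncated sums for the normalization, mean, and variance."""
    probs = [poisson_pmf(n, mu) for n in range(n_max + 1)]
    norm = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    return norm, mean, second - mean ** 2

norm, mean, variance = poisson_moments(mu=5.0)
print(norm, mean, variance)   # all three should agree with 1, mu, mu
```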


The moments can also be calculated from the moment-generating function

\[ \langle e^{tX} \rangle = \sum_{n=0}^{\infty} e^{tn}\, \frac{\mu^n}{n!}\, e^{-\mu}
   = e^{-\mu}\, e^{\mu e^{t}} = e^{\mu(e^{t} - 1)}. \]

Recall that the procedure for obtaining moments is to differentiate with respect to $t$ and
read out the derivatives evaluated at $t = 0$.

FIGURE 23.6 Poisson distribution, $\mu = 5$.


Relation to Binomial Distribution 


A Poisson distribution becomes a good approximation of the binomial distribution for a
large number $n$ of trials and small probability $p \approx \mu/n$, with $\mu$ held constant.


Theorem: In the limit $n \to \infty$ and $p \to 0$ so that the mean value $np \to \mu$ stays
finite, the binomial distribution becomes a Poisson distribution.


To prove this theorem, we need to find the large-$n$ limit of the binomial distribution
formula, Eq. (23.42). To do so, we apply Stirling’s formula, in the form
$n! \sim \sqrt{2\pi n}\,(n/e)^n$ for large $n$. See Eq. (12.110). For the quotient of the two
$n$-dependent factorials occurring in Eq. (23.42), we have (keeping $s$ finite while letting
$n \to \infty$):

\[ \frac{n!}{(n-s)!} \sim \left(\frac{n}{e}\right)^{n} \left(\frac{e}{n-s}\right)^{n-s}
   \sim \left(\frac{n}{e}\right)^{s} \left(\frac{n}{n-s}\right)^{n-s}
   \sim \left(\frac{n}{e}\right)^{s} \left(1 + \frac{s}{n-s}\right)^{n-s}. \]

The factor in the final expression raised to the power $n - s$ is, in the limit of large $n$,
an expression of value $e^{s}$ (in fact, it is, with $n - s$ changed to $n$, one of the
often-used definitions of the exponential). The final result is

\[ \frac{n!}{(n-s)!} \sim n^{s}. \tag{23.53} \]


We use a similar defining expression for the exponential to evaluate the factor $q^{n-s}$ in
Eq. (23.42). Writing $q^{n-s} = (1 - p)^{n-s}$ and replacing $p$ by its limiting value
$p = \mu/n$, we have

\[ q^{n-s} = (1 - p)^{n-s} \sim \left(1 - \frac{\mu}{n}\right)^{n} \left(1 - \frac{\mu}{n}\right)^{-s}
   \sim e^{-\mu}. \tag{23.54} \]

Inserting the large-$n$ limiting values from Eqs. (23.53) and (23.54) into the formula for the
binomial distribution, we reach

\[ P(S = s) = \frac{n!}{s!\,(n-s)!}\, p^{s} q^{n-s} \sim \frac{n^{s}}{s!}\, p^{s} e^{-\mu}
   = \frac{\mu^{s}}{s!}\, e^{-\mu}, \tag{23.55} \]

where in the last step we have combined $n^{s}$ and $p^{s}$ into $\mu^{s}$.

FIGURE 23.7 Comparison of binomial distribution ($N = 80$, $p = 0.1$), wide bars, and
Poisson distribution ($\mu = 8$), narrow bars.

Equation (23.55) establishes our theorem, and thereby completes the connection be- 
tween the Poisson and binomial distributions. This result, which becomes valid in the limit 
of a large number of trials, each of small probability, is sometimes referred to as an exam- 
ple of the laws of large numbers. A comparison of the binomial and Poisson distributions 
is presented as Fig. 23.7. 
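The quality of the Poisson approximation can also be examined numerically. The following
sketch compares the two distributions for the parameters of Fig. 23.7 and again for a much
larger number of trials with the same mean; the function names and the second parameter set
($n = 8000$) are illustrative assumptions.

```python
import math

def binomial_pmf(s, n, p):
    """Binomial probability of s successes in n trials, Eq. (23.42)."""
    return math.comb(n, s) * p ** s * (1.0 - p) ** (n - s)

def poisson_pmf(s, mu):
    """Poisson probability, Eq. (23.49)."""
    return mu ** s / math.factorial(s) * math.exp(-mu)

def max_difference(n, mu, s_max=30):
    """Largest pointwise gap between binomial(n, p = mu/n) and Poisson(mu)."""
    p = mu / n
    return max(abs(binomial_pmf(s, n, p) - poisson_pmf(s, mu))
               for s in range(s_max + 1))

gap_80 = max_difference(n=80, mu=8.0)       # parameters of Fig. 23.7
gap_8000 = max_difference(n=8000, mu=8.0)   # same mean, many more trials
print(f"n = 80: {gap_80:.5f}   n = 8000: {gap_8000:.7f}")
```

The gap decreases as $n$ grows at fixed $\mu = np$, in line with the theorem.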


Exercises 


23.4.1 Radioactive decays for long-lived isotopes are governed by the Poisson distribution. In
a Rutherford-Geiger experiment, the numbers of emitted $\alpha$ particles are counted in each
of $n = 2608$ time intervals of 7.5 seconds each. In Table 23.1, $n_i$ is the number of time
intervals in which $i$ particles were emitted. Determine the average number $\lambda$ of
particles emitted per time interval, and compare the $n_i$ of Table 23.1 with $n p_i$ computed
from the Poisson distribution with mean value $\lambda$.


Table 23.1 Data for Exercise 23.4.1

i    →    0    1    2    3    4    5    6    7    8    9   10
n_i  →   57  203  383  525  532  408  273  139   45   27   16


23.4.2 Derive the standard deviation of a Poisson distribution of mean value $\mu$.


23.4.3 The number of $\alpha$-particles emitted by the decay of a radium sample is counted per
minute for 40 hours. The total number is 5000. How many 1-minute intervals are expected
with (a) 2, and (b) 5 $\alpha$-particles?


23.4.4 For a radioactive sample, 10 decays are counted on average in 100 seconds. Use the
Poisson distribution to estimate the probability of counting 3 decays in 10 seconds.


23.4.5 $^{238}$U has a half-life of $4.51 \times 10^{9}$ years. Its decay series ends with the
stable lead isotope $^{206}$Pb. The ratio of the number of $^{206}$Pb to $^{238}$U atoms in a
rock sample is measured as 0.0058. Estimate the age of the rock assuming that all the lead in
the rock is from the initial decay of the $^{238}$U, which determines the rate of the entire
decay process, because the subsequent steps take place far more rapidly.


Hint. This is not a Poisson distribution problem, but is an application of the decay law
$N(t) = N e^{-\lambda t}$, where $\lambda$, the decay constant, is related to the half-life $T$
by $T = \ln 2/\lambda$.


ANS. $3.8 \times 10^{7}$ years.


23.4.6 The probability of hitting a target in one shot is known to be 20%. If five shots are
fired independently, what is the probability of striking the target at least once?


23.5 GAUSS’ NORMAL DISTRIBUTION


The bell-shaped Gauss distribution is defined by the probability density

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),
   \qquad -\infty < x < \infty, \tag{23.56} \]

with mean value $\mu$ and variance $\sigma^2$. In part because it represents continuous limits
of both the binomial and Poisson distributions, it is by far the most important continuous
probability distribution and is displayed in Fig. 23.8.

FIGURE 23.8 Gauss normal distribution for mean value zero and various standard
deviations (marked by circles). Curves are labeled by $h = 1/(\sigma\sqrt{2})$.


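It is properly normalized because, substituting $y = (x - \mu)/(\sigma\sqrt{2})$, we obtain

\[ \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\, dx
   = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\, dy
   = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} e^{-y^2}\, dy = 1. \]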


To check the mean value, we can make the substitution $y = x - \mu$, and find that

\[ \langle X \rangle - \mu = \int_{-\infty}^{\infty} \frac{x - \mu}{\sigma\sqrt{2\pi}}\,
   e^{-(x-\mu)^2/2\sigma^2}\, dx
   = \int_{-\infty}^{\infty} \frac{y}{\sigma\sqrt{2\pi}}\, e^{-y^2/2\sigma^2}\, dy = 0, \tag{23.57} \]

showing that $\langle X \rangle = \mu$. The zero result in Eq. (23.57) occurs because the
integrand is odd in $y$, so the integral over $y > 0$ cancels that over $y < 0$. A check that
the variance of this normal distribution is indeed $\sigma^2$ is the topic of Exercise 23.5.1.

We can compute conditional probabilities for the normal distribution. In particular,
making for convenience the substitution $y = (x - \langle X \rangle)/\sigma$,

\[ P(|X - \langle X \rangle| > k\sigma)
   = P\left( \frac{|X - \langle X \rangle|}{\sigma} > k \right) = P(|Y| > k)
   = \sqrt{\frac{2}{\pi}} \int_{k}^{\infty} e^{-y^2/2}\, dy
   = \frac{2}{\sqrt{\pi}} \int_{k/\sqrt{2}}^{\infty} e^{-z^2}\, dz
   = \operatorname{erfc}(k/\sqrt{2}), \]

we can evaluate the integral for $k = 1, 2, 3$, and thus extract the following numerical
relations for a normally distributed random variable:

\[ P(|X - \langle X \rangle| > \sigma) \approx 0.3173, \quad
   P(|X - \langle X \rangle| > 2\sigma) \approx 0.0455, \quad
   P(|X - \langle X \rangle| > 3\sigma) \approx 0.0027. \tag{23.58} \]

It is interesting to compare the last of these quantities with Chebyshev’s inequality, which
gives 1/9 for the probability that an event falls further than $3\sigma$ from the mean. The 1/9
applies to an arbitrary probability distribution, and is in strong contrast to the much
smaller 0.0027 given by the $3\sigma$-rule for the normal distribution.
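The numbers in Eq. (23.58) follow directly from the complementary error function, which is
available in most numerical libraries. A minimal sketch using Python’s standard `math.erfc`
is given below; the helper names are illustrative, and the Chebyshev bound is written in its
usual form $1/k^2$.

```python
import math

def normal_tail(k):
    """P(|X - <X>| > k*sigma) for a normal variable: erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))

def chebyshev_bound(k):
    """Chebyshev's upper bound 1/k^2, valid for an arbitrary distribution."""
    return 1.0 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: normal {normal_tail(k):.4f}, Chebyshev bound {chebyshev_bound(k):.4f}")
```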


Limits of Poisson and Binomial Distributions 


In a special limit, the discrete Poisson probability distribution is closely related to the 
continuous Gauss distribution. This limit theorem is another example of the laws of large 
numbers, which are often dominated by the bell-shaped normal distribution. 


Theorem: For large $n$ and mean value $\mu$, the Poisson distribution approaches a Gauss
distribution.


To prove this theorem, in the limit $n \to \infty$, we approximate for large $n$ the factorial
in the Poisson probability $p(n)$ by Stirling’s asymptotic formula,
$n! \sim \sqrt{2\pi n}\,(n/e)^n$, and choose the deviation $v = n - \mu$ from the mean value as
the new variable. We let the mean value $\mu$ approach $\infty$ and treat $v/\mu$ as small, but
assume $v^2/\mu$ to be finite. Substituting $n = \mu + v$, we obtain

\[ \ln p(n) = \ln\left( \frac{\mu^n e^{-\mu}}{n!} \right)
   \approx n \ln \mu - \mu - \ln\sqrt{2\pi n} - n \ln n + n \]
\[ \qquad = (\mu + v) \ln\left( \frac{\mu}{\mu + v} \right) + v - \ln\sqrt{2\pi(\mu + v)} \]
\[ \qquad = (\mu + v) \ln\left( 1 - \frac{v}{\mu + v} \right) + v - \ln\sqrt{2\pi(\mu + v)}. \]

We next expand the first logarithmic term in powers of $v/(\mu + v)$, reaching

\[ \ln p(n) = -\sum_{t=1}^{\infty} \frac{v^{t}}{t\,(\mu + v)^{t-1}} + v
   - \ln\sqrt{2\pi(\mu + v)}. \tag{23.59} \]

The first two terms of the $t$ summation yield nonvanishing contributions in the large-$\mu$
limit; further terms vanish because the power of $v$ in the numerator is less than twice that
of $\mu$ in the denominator. Replacing $\mu + v$ by $\mu$, Eq. (23.59) reduces to

\[ \ln p(n) \approx -\frac{v^2}{2\mu} - \ln\sqrt{2\pi\mu}, \quad \text{equivalent to} \quad
   p(n) \approx \frac{1}{\sqrt{2\pi\mu}}\, e^{-v^2/2\mu}. \tag{23.60} \]

This is a Gauss distribution of the continuous variable $v$ with mean value 0 and standard
deviation $\sigma = \sqrt{\mu}$.
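To see how quickly this limit is approached, one can compare the Poisson probabilities with
Eq. (23.60) for a moderately large mean. The sketch below evaluates the Poisson probability
through logarithms (using `math.lgamma`) to avoid overflow; the function names and the choice
$\mu = 400$ are illustrative assumptions.

```python
import math

def poisson_pmf(n, mu):
    """Poisson probability, evaluated via logarithms to avoid overflow."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def gauss_limit(n, mu):
    """Eq. (23.60): Gaussian in v = n - mu with variance mu."""
    v = n - mu
    return math.exp(-v * v / (2.0 * mu)) / math.sqrt(2.0 * math.pi * mu)

mu = 400.0
for n in (380, 400, 420):
    print(n, poisson_pmf(n, mu), gauss_limit(n, mu))
```

Near the mean the two expressions agree to a fraction of a percent; one standard deviation
away ($n = 380$ or $420$) the discrepancy is already a few percent, reflecting the terms
dropped from Eq. (23.59).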

In another special limit, the discrete binomial probability distribution is also closely
related to the continuous Gauss distribution. This limit theorem is yet another example of
the laws of large numbers.


Theorem: In the limit $n \to \infty$, with $p$ a finite trial probability such that the mean
value $np \to \infty$, the binomial distribution becomes a Gauss normal distribution. Recall
from Section 23.4 that, when $np \to \mu < \infty$, the binomial distribution becomes a Poisson
distribution.


Instead of the large number $s$ of successes in $n$ trials, we use the deviation
$v = s - pn$ from the (large) mean value $pn$ as our new continuous random variable, under the
condition that as $n \to \infty$, $|v| \ll pn$ (so $v/n \to 0$) but $v^2/n$ is finite. Thus, we
replace $s$ by $pn + v$ and $n - s$ by $qn - v$ in the factorials of the formula for the
binomial distribution, Eq. (23.42). Writing now $W(v)$ as our probability distribution in the
large-$pn$ limit, we apply Stirling’s formula as we have done several times before, obtaining
initially

\[ W(v) = \frac{p^{s} q^{n-s}\, n^{n+1/2}\, e^{-n+s+(n-s)}}
   {\sqrt{2\pi}\,(pn + v)^{s+1/2}\,(qn - v)^{n-s+1/2}}. \tag{23.61} \]

Next we factor out the dominant powers of $n$ and cancel powers of $p$ and $q$ to find

\[ W(v) = \frac{1}{\sqrt{2\pi pqn}} \left( 1 + \frac{v}{pn} \right)^{-(pn+v+1/2)}
   \left( 1 - \frac{v}{qn} \right)^{-(qn-v+1/2)}. \tag{23.62} \]

Taking the logarithm of $W(v)$ and expanding in powers of $v$, we retain only the terms
through $v^2$, yielding

\[ \ln W(v) = -\ln\sqrt{2\pi pqn} - \frac{v}{n}\left( \frac{1}{2p} - \frac{1}{2q} \right)
   + \frac{v^2}{n^2}\left( \frac{1}{4p^2} + \frac{1}{4q^2} \right)
   - \frac{v^2}{n}\left( \frac{1}{2p} + \frac{1}{2q} \right) + \cdots. \tag{23.63} \]

Setting $v/n$ to zero, noting that

\[ \left( \frac{1}{2p} + \frac{1}{2q} \right) = \frac{p + q}{2pq} = \frac{1}{2pq}, \]

and dropping all terms $v^{t}$ with $t > 2$, we obtain our large-$n$ limit

\[ W(v) = \frac{1}{\sqrt{2\pi pqn}}\, e^{-v^2/2pqn}, \tag{23.64} \]

which is a Gauss distribution in the deviations $s - pn$, with mean value 0 and standard
deviation $\sigma = \sqrt{npq}$. The large values assumed for both $pn$ and $qn$ (and the
discarded terms) restrict the validity of the theorem to the central part of the Gaussian bell
shape, and exclude the tails.
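The restriction to the central part of the bell shape is easy to observe numerically. In the
sketch below, the binomial probability is evaluated through `math.lgamma` and compared with
Eq. (23.64) at the center and far out in the tail; the names and the parameters $n = 1000$,
$p = 0.3$ are illustrative assumptions.

```python
import math

def binomial_pmf(s, n, p):
    """Binomial probability via log-gamma, stable for large n."""
    log_w = (math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
             + s * math.log(p) + (n - s) * math.log(1.0 - p))
    return math.exp(log_w)

def gauss_limit(v, n, p):
    """Eq. (23.64) with v = s - p*n and sigma^2 = n*p*q."""
    var = n * p * (1.0 - p)
    return math.exp(-v * v / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def relative_gap(s, n, p):
    """Relative difference between the exact and the limiting probability."""
    exact, approx = binomial_pmf(s, n, p), gauss_limit(s - p * n, n, p)
    return abs(exact - approx) / exact

n, p = 1000, 0.3
center_gap = relative_gap(300, n, p)   # at the mean value pn
tail_gap = relative_gap(370, n, p)     # nearly five standard deviations out
print(f"center: {center_gap:.2e}   tail: {tail_gap:.2e}")
```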


Exercises 
23.5.1 Show that the variance of the normal distribution given by Eq. (23.56) is $\sigma^2$,
the symbol in that equation.

23.5.2 Show that Eq. (23.62) can be obtained by manipulation of the formula Eq. (23.61) for
$W(v)$.

23.5.3 With $W(v)$ the expression in Eq. (23.62), show that the expansion of $\ln W(v)$ in
powers of $v$ leads to Eq. (23.63).

23.5.4 What is the probability for a normally distributed random variable to differ by more
than $4\sigma$ from its mean value? Compare your result with the corresponding one from
Chebyshev’s inequality. Explain the difference in your own words.

23.5.5 An instructor grades a final exam of a large undergraduate class, obtaining the mean
value of points $m$ and the variance $\sigma^2$. Assuming a normal distribution for the number
$M$ of points, he defines a grade F when $M < m - 3\sigma/2$, D when
$m - 3\sigma/2 < M < m - \sigma/2$, C when $m - \sigma/2 < M < m + \sigma/2$, B when
$m + \sigma/2 < M < m + 3\sigma/2$, and A when $M > m + 3\sigma/2$. What is the percentage of
As, Fs; Bs, Ds; and Cs? Redesign the cutoffs so that there are equal percentages of As and Fs
(5%), 25% Bs and Ds, and 40% Cs.


23.6 TRANSFORMATIONS OF RANDOM VARIABLES


We have already encountered some elementary transformations involving random variables:
In Section 23.2 we observed that a random variable $Y = aX + b$ will have mean
value $\langle Y \rangle = a\langle X \rangle + b$ and variance
$\sigma^2(Y) = a^2 \sigma^2(X)$. Here we consider more general
transformations, with particular focus on continuous probability distributions.

First, consider a simple change of random variable from $X$ to $Y$, where $y = y(x)$. If
the probability distribution of $X$ is $f(x)\,dx$, then the contribution at $x$ to some
quantity $M(y)$ is

\[ P\{M[y(x)]\}\,dx = M[y(x)]\, f(x)\,dx. \tag{23.65} \]

But we may wish to express the probability in terms of the distribution of $Y$, writing

\[ P[M(y)]\,dy = M(y)\, g(y)\,dy, \tag{23.66} \]

for $y$ evaluated at the point corresponding to $x$, i.e., $y = y(x)$. To make these equations
consistent, it is necessary that

\[ g(y)\,dy = f(x)\,dx, \quad \text{or} \quad g(y) = f[x(y)]\, \frac{dx}{dy}. \tag{23.67} \]

For example, if $y = x^2$, then $dx/dy = 1/(2x) = y^{-1/2}/2$ and
$g(y) = f(\sqrt{y})\, y^{-1/2}/2$.

Let’s now address the transformation of two random variables $X$, $Y$ into $U(X, Y)$,
$V(X, Y)$. Again we treat the continuous case. If

\[ u = u(x, y), \quad v = v(x, y), \quad x = x(u, v), \quad y = y(u, v) \tag{23.68} \]

describe the transformation and its inverse, integrals of the probability density will
transform by formulas that include the Jacobian of the transformation (see Section 4.4). The
transformed probability density becomes

\[ g(u, v) = f\big( x(u, v),\, y(u, v) \big)\, |J|, \tag{23.69} \]

where the Jacobian is

\[ J = \frac{\partial(x, y)}{\partial(u, v)} =
   \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm]
   \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix}. \tag{23.70} \]

This is a generalization of Eq. (23.67).


Addition of Random Variables 


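Let’s apply this analysis to a situation in which $Z$ is the sum of two random variables $X$
and $Y$, or $Z = X + Y$. We transform to new variables $X$ and $Z$, so the transformation is
$x = x$, $z = x + y$; or $x = x$, $y = z - x$. The Jacobian for this transformation is

\[ J = \begin{vmatrix} \dfrac{\partial x}{\partial x} & \dfrac{\partial x}{\partial z} \\[2mm]
   \dfrac{\partial (z - x)}{\partial x} & \dfrac{\partial (z - x)}{\partial z} \end{vmatrix}
   = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1. \]

If our original probability distribution was $f(x, y)$, it therefore transforms into
$g(x, z) = f(x, z - x)$. We are usually interested in the marginal distribution in $Z$,
obtained by integrating over $x$, and

\[ P(Z = z) = g(z) = \int_{-\infty}^{\infty} f(x, z - x)\,dx. \tag{23.71} \]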


In the oft-occurring case that $X$ and $Y$ are independent random variables, so
$f(x, y) = f_1(x) f_2(y)$, Eq. (23.71) assumes the form

\[ g(z) = \int_{-\infty}^{\infty} f_1(x)\, f_2(z - x)\,dx, \tag{23.72} \]

which we recognize as a Fourier convolution, see Eq. (20.68):

\[ g(z) = \int_{-\infty}^{\infty} f_1(x)\, f_2(z - x)\,dx = \sqrt{2\pi}\,(f_1 * f_2)(z). \tag{23.73} \]
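Equation (23.72) can be evaluated by direct numerical quadrature when no closed form is at
hand. The sketch below applies a midpoint rule to two unit-variance normal densities and
compares the result with a normal density of variance 2 (the addition theorem treated in
Example 23.6.1). The function names, integration limits, and step count are illustrative
assumptions.

```python
import math

def normal_pdf(x, mean=0.0, var=1.0):
    """Gauss normal density, Eq. (23.56)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def density_of_sum(f1, f2, z, lo=-12.0, hi=12.0, steps=4800):
    """Midpoint-rule estimate of Eq. (23.72): g(z) = integral of f1(x) f2(z - x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += f1(x) * f2(z - x)
    return h * total

g_at_1 = density_of_sum(normal_pdf, normal_pdf, z=1.0)
expected = normal_pdf(1.0, mean=0.0, var=2.0)   # normal density with twice the variance
print(g_at_1, expected)
```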


Equation (23.72) gives us a general formula whereby we can obtain the distribution of 
Z=X+Y from the distributions of independent variables X and Y, while Eq. (23.73) 
shows that it may be useful to consider the use of Fourier transforms for evaluating the 
integral. In fact, the moment-generating function, Eq. (23.32), is (if $t$ is replaced by $it$)
proportional to the Fourier transform of the probability density, and

\[ \langle e^{itX} \rangle = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = \sqrt{2\pi}\, f^{T}(t) \tag{23.74} \]

is known as the characteristic function in probability theory.

Applying the Fourier convolution theorem, Eq. (20.70), we therefore write

\[ [P(Z = z)]^{T} \equiv g^{T}(t) = \sqrt{2\pi}\, f_1^{T}(t)\, f_2^{T}(t), \tag{23.75} \]

showing that we can obtain $g(z)$ as the inverse Fourier transform

\[ g(z) = \int_{-\infty}^{\infty} e^{-itz}\, f_1^{T}(t)\, f_2^{T}(t)\,dt. \tag{23.76} \]

Connection with statistics texts will be improved by restating Eqs. (23.75) and (23.76)
using the characteristic function notation. Equation (23.75) is equivalent to

\[ \langle e^{itZ} \rangle = \langle e^{it(X+Y)} \rangle
   = \langle e^{itX} \rangle \langle e^{itY} \rangle. \tag{23.77} \]

Equation (23.76) states that $g(z)$ is the distribution that corresponds to
$\langle e^{itZ} \rangle$; since Fourier transforms have inverses, it is assured that such a
distribution exists.


Example 23.6.1 ADDITION THEOREM, NORMAL DISTRIBUTION


A good example of the analysis for a random variable $Z = X + Y$ is provided when $X$ and
$Y$ are taken to be Gauss normal distributions with the same mean value and the same
variance. This situation corresponds to a relationship known as the addition theorem for
normal distributions.


Theorem: If the independent random variables $X$, $Y$ have identical normal distributions,
that is, the same mean value and variance, then $Z = X + Y$ has normal distribution with
twice the mean value and twice the variance of $X$ and $Y$.


To prove this theorem, we assume without loss of generality that the normal distributions
each have variance $\sigma^2 = 1$, so, from Eq. (23.56), each has the form

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-\mu)^2/2}. \]

From Eq. (20.18) and the translation formula, Eq. (20.67), we find the Fourier transform
of $f(x)$ to be

\[ f^{T}(t) = \frac{1}{\sqrt{2\pi}}\, e^{it\mu}\, e^{-t^2/2}. \]

Now, applying Eq. (23.75), we have

\[ g^{T}(t) = \frac{1}{\sqrt{2\pi}}\, e^{2it\mu}\, e^{-t^2}. \]

Taking the inverse transform, noting that the complex exponential shifts the origin by an
amount $2\mu$, we get

\[ g(z) = \frac{1}{\sqrt{4\pi}}\, e^{-(z-2\mu)^2/4}, \]

which shows that the mean and variance of $Z$ are twice those of $X$ and $Y$, so the theorem
is satisfied.


Multiplication or Division of Random Variables


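Consider now the product $Z = XY$, taking $X$, $Z$ as the new variables. This corresponds to
the transformation $x = x$, $y = z/x$, with Jacobian

\[ J = \begin{vmatrix} \dfrac{\partial x}{\partial x} & \dfrac{\partial x}{\partial z} \\[2mm]
   \dfrac{\partial (z/x)}{\partial x} & \dfrac{\partial (z/x)}{\partial z} \end{vmatrix}
   = \begin{vmatrix} 1 & 0 \\ -z/x^2 & 1/x \end{vmatrix} = \frac{1}{x}, \]

so the marginal distribution of $Z$ is given by

\[ g(z) = \int_{-\infty}^{\infty} f\left( x, \frac{z}{x} \right) \frac{dx}{|x|}. \tag{23.78} \]

If the random variables $X$, $Y$ are independent with densities $f_1$, $f_2$, then

\[ g(z) = \int_{-\infty}^{\infty} f_1(x)\, f_2\left( \frac{z}{x} \right) \frac{dx}{|x|}. \tag{23.79} \]

Finally, let $Z = X/Y$, taking $Y$, $Z$ as the new variables, corresponding to $x = yz$,
$y = y$, with Jacobian

\[ J = \begin{vmatrix} \dfrac{\partial (yz)}{\partial y} & \dfrac{\partial (yz)}{\partial z} \\[2mm]
   \dfrac{\partial y}{\partial y} & \dfrac{\partial y}{\partial z} \end{vmatrix}
   = \begin{vmatrix} z & y \\ 1 & 0 \end{vmatrix} = -y, \]

and the probability distribution of $Z$ is given by

\[ g(z) = \int_{-\infty}^{\infty} f(yz, y)\, |y|\,dy. \tag{23.80} \]

If the random variables $X$, $Y$ are independent with densities $f_1$, $f_2$, then

\[ g(z) = \int_{-\infty}^{\infty} f_1(yz)\, f_2(y)\, |y|\,dy. \tag{23.81} \]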
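As an illustration of Eq. (23.81) that is not worked in the text, take $X$ and $Y$ to be
independent normal variables of zero mean and unit variance. The integral can then be done in
closed form and gives the Cauchy (Lorentz) density $1/[\pi(1 + z^2)]$, a standard result. The
sketch below checks this by midpoint-rule quadrature; the function names, limits, and step
count are assumptions made for the example.

```python
import math

def normal_pdf(x):
    """Gauss normal density with zero mean and unit variance."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def density_of_quotient(f1, f2, z, lo=-10.0, hi=10.0, steps=20_000):
    """Midpoint-rule estimate of Eq. (23.81): integral of f1(y z) f2(y) |y| dy."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += f1(y * z) * f2(y) * abs(y)
    return h * total

def cauchy_pdf(z):
    """Closed-form density of the quotient of two unit normal variables."""
    return 1.0 / (math.pi * (1.0 + z * z))

for z in (0.0, 0.5, 2.0):
    print(z, density_of_quotient(normal_pdf, normal_pdf, z), cauchy_pdf(z))
```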


Gamma Distribution 


Up to this point the only specific continuous probability distribution we have introduced
is the Gauss normal distribution. However, if we make a change in the random variable of
that distribution from $X$ to $Y = X^2$, there will result a different distribution of
significant utility, known as a gamma distribution. Let’s start the present discussion with
the now quite familiar normal distribution of mean value zero and variance $\sigma^2$. It has
probability distribution

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}. \]

As indicated just after Eq. (23.67), a transformation to write the distribution in terms of
$y = x^2$ leads us to

\[ g(y) = \frac{e^{-y/2\sigma^2}}{\sigma\sqrt{2\pi}}\, \frac{y^{-1/2}}{2}. \]

However, this equation does not take into account the fact that $y$ must be restricted to
nonnegative values, and that the same value of $y$ will be encountered for two different
values of $x$, namely $x = +\sqrt{y}$ and $x = -\sqrt{y}$. These considerations make a more
proper and complete formula for $g(y)$ the following:

\[ g(y) = \begin{cases} 0, & y < 0, \\[1mm]
   \dfrac{y^{-1/2}\, e^{-y/2\sigma^2}}{\sigma\sqrt{2\pi}}, & y > 0. \end{cases} \tag{23.82} \]

This expression for $g(y)$ is normalized (it must be, due to the way in which it was
obtained). However, it is instructive to check, which is best done by changing to a new
variable $z = y/2\sigma^2$, in terms of which we have

\[ \int_{0}^{\infty} g(y)\,dy = \frac{1}{\sqrt{\pi}} \int_{0}^{\infty} z^{-1/2} e^{-z}\,dz
   = \frac{\Gamma(\tfrac{1}{2})}{\sqrt{\pi}} = 1, \]

where we have identified the integral as $\Gamma(\tfrac{1}{2})$ and also noted that
$\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.
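The distribution of $Y = X^2$ in Eq. (23.82) can be tested against simulated data. Integrating
Eq. (23.82) from 0 to $y$ with the substitution used above gives the cumulative probability
$\operatorname{erf}\big(\sqrt{y/2\sigma^2}\big)$; the sketch below compares that value with the
fraction of squared normal samples falling below $y$. The function names, sample size, and
seed are illustrative assumptions.

```python
import math
import random

def gamma_half_cdf(y, sigma):
    """Integral of Eq. (23.82) from 0 to y: erf(sqrt(y / (2 sigma^2)))."""
    return math.erf(math.sqrt(y / (2.0 * sigma ** 2)))

def empirical_cdf_of_square(y, sigma, samples=200_000, seed=1):
    """Fraction of squared normal samples X^2 that fall at or below y."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if rng.gauss(0.0, sigma) ** 2 <= y)
    return hits / samples

sigma, y = 2.0, 3.0
print(gamma_half_cdf(y, sigma), empirical_cdf_of_square(y, sigma))
```

The two numbers agree to within the statistical scatter of the simulation, which is of order
$1/\sqrt{\text{samples}}$.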

Because the functional form of $g(y)$ is essentially that of the integrand of the integral
representation of the gamma function, the distribution given by $g(y)$ is called a gamma
distribution, and in particular, a gamma distribution with parameters $p = 1/2$ (the
argument of the gamma function) and $\sigma^2$ (the variance of the underlying normal
distribution). We generalize to gamma distributions of general $p$ and $\sigma$:

\[ g(p, \sigma; y) = \begin{cases} 0, & y < 0, \\[1mm]
   \dfrac{y^{p-1}\, e^{-y/2\sigma^2}}{(2\sigma^2)^{p}\, \Gamma(p)}, & y > 0. \end{cases} \tag{23.83} \]

The gamma distribution often appears in contexts where the random variables involved
need to be added together. It is therefore useful to take note of the Fourier transform of
$g(p, \sigma; y)$:

\[ [g(p, \sigma)]^{T}(t) = \frac{1}{\sqrt{2\pi}}\, \frac{1}{(1 - 2i\sigma^2 t)^{p}}. \tag{23.84} \]
Using the characteristic-function notation as introduced at Eq. (23.74), and defining $X$ to
be a $(p, \sigma)$ gamma-distributed random variable, Eq. (23.84) takes the alternative form

\[ \langle e^{itX} \rangle = \frac{1}{(1 - 2i\sigma^2 t)^{p}}. \tag{23.85} \]


Example 23.6.2 ADDITION OF GAMMA-DISTRIBUTION RANDOM VARIABLES


Let’s compute the distribution of a random variable $Y = X_1 + X_2$, where $X_1$ has gamma
distribution $g(p_1, \sigma; x_1)$ and $X_2$ has gamma distribution $g(p_2, \sigma; x_2)$.
Note that $X_1$ and $X_2$ share the same parameter $\sigma$.
Using Eq. (23.77) for the characteristic function of $X_1 + X_2$ and Eq. (23.85) to evaluate
$\langle e^{itX_i} \rangle$, we get

\[ \langle e^{itY} \rangle = \frac{1}{(1 - 2i\sigma^2 t)^{p_1 + p_2}}. \]

Recognizing this result as the characteristic function for a gamma distribution of parameter
$p = p_1 + p_2$, we see that

\[ g(y) = g(p_1 + p_2, \sigma; y). \]

Generalizing this result to an arbitrary number of $X_j$:


The probability distribution for a sum of gamma-distributed random variables $X_j$ of
parameters $p_j$ but all of the same $\sigma$ is a gamma distribution for that $\sigma$ and
with $p = \sum_j p_j$.


A corollary to the above is obtained if we consider the probability distribution of a sum of
the form

\[ Z = \sum_{j=1}^{n} X_j^2, \tag{23.86} \]

where the $X_j$ are Gauss normal distributions, all with the same variance $\sigma^2$. Because
the quantities being summed are squares of random variables, it is useful first to make the
substitutions $Y_j = X_j^2$, changing each distribution of $X_j$ to that of a $Y_j$ gamma
distribution with $p = 1/2$, and then finally combining the $n$ gamma distributions to form the
distribution of $Z$; the result will be a gamma distribution with $p = n/2$ and the common
value of $\sigma$. Summarizing,


The probability distribution for the sum of the squares of n Gauss normal random
variables with a common variance $\sigma^2$, as in Eq. (23.86), will be a gamma distribution
with parameters $p = n/2$ and the common value of $\sigma$.
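The addition rule for gamma distributions can be checked without Fourier transforms by
convolving two densities numerically, following Eq. (23.72), and comparing with Eq. (23.83)
for $p = p_1 + p_2$. The sketch below does this with a midpoint rule; the function names and
the sample values $p_1 = 1.5$, $p_2 = 2.5$, $\sigma = 1.2$, $y = 6$ are assumptions made for
illustration.

```python
import math

def gamma_density(p, sigma, y):
    """Eq. (23.83): gamma distribution with parameters p and sigma."""
    if y <= 0.0:
        return 0.0
    scale = 2.0 * sigma ** 2
    return y ** (p - 1.0) * math.exp(-y / scale) / (scale ** p * math.gamma(p))

def density_of_sum(p1, p2, sigma, y, steps=20_000):
    """Midpoint-rule convolution of two gamma densities on the interval (0, y)."""
    h = y / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += gamma_density(p1, sigma, x) * gamma_density(p2, sigma, y - x)
    return h * total

numeric = density_of_sum(1.5, 2.5, sigma=1.2, y=6.0)
closed = gamma_density(4.0, 1.2, 6.0)    # p = p1 + p2 with the same sigma
print(numeric, closed)
```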
Exercises 
23.6.1 Let $X_1, X_2, \ldots, X_n$ be independent normal random variables with the same mean
$\bar{x}$ and variance $\sigma^2$. Show that

\[ \frac{\sum_i X_i - n\bar{x}}{\sigma\sqrt{n}} \]

is normal with mean zero and variance 1.


23.6.2 If the random variable $X$ is normal with mean value 29 and standard deviation 3, what
can you say about the distributions of $2X - 1$ and $3X + 2$?


23.6.3 For a normal distribution of mean value $m$ and variance $\sigma^2$, find the distance
$r$ such that half the area under the bell shape is between $m - r$ and $m + r$.


23.6.4 If $\langle X \rangle$, $\langle Y \rangle$ are the average values of two independent
random variables $X$, $Y$, what is the expectation value of the product $XY$?


23.6.5 If $X$ and $Y$ are two independent random variables with different probability densities
and the function $f(x, y)$ has derivatives of any order, express $\langle f(X, Y) \rangle$ in
terms of $\langle X \rangle$ and $\langle Y \rangle$. Develop similarly the covariance and
correlation.


23.6.6 Let $f(x, y)$ be the joint probability density of two random variables $X$, $Y$. Find
the variance $\sigma^2(aX + bY)$, where $a$, $b$ are constants. What happens when $X$, $Y$ are
independent?


23.6.7 Obtain an addition theorem for the distribution of a random variable $Y = X_1 + X_2$
where $X_1$ and $X_2$ are Gauss normal distributions with different mean values $\mu_i$ and
variances $\sigma_i^2$.


ANS. $Y$ is normal with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.


23.6.8 Show that the Fourier transform of the gamma-distribution probability density,
Eq. (23.83), has the functional form given in Eq. (23.84).


23.7 STATISTICS


In statistics, probability theory is applied to the evaluation of data from random experi- 
ments or to samples to test some hypothesis because the data have random fluctuations 
due to lack of complete control over the experimental conditions. Typically one attempts 
to estimate the mean value and variance of the distributions from which the samples 
derive, and to generalize properties valid for a sample to the rest of the events at a pre- 
scribed confidence level. Any assumption about an unknown probability distribution is 
called a statistical hypothesis. The concepts of tests and confidence intervals are among 
the most important developments of statistics. 


Error Propagation 


When we measure a quantity $x$ repeatedly, obtaining the values $x_j$, or select a sample for
testing, we can compute

\[ \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad
   \sigma^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2, \]

where $\bar{x}$ is the mean value and $\sigma^2$ is the variance, a measure of the spread of
the points about the mean value. We can write $x_j = \bar{x} + e_j$, where $e_j$ is the
deviation from the mean value, and we know that $\sum_j e_j = 0$.

Now suppose we want to estimate the value of a known function $f(x)$ based on these
measurements $x_j$; that is, we want to assign a value of $f$ given the set $f_j = f(x_j)$.
Substituting $x_j = \bar{x} + e_j$ and forming the mean value

\[ \bar{f} = \frac{1}{n} \sum_j f(x_j) = \frac{1}{n} \sum_j f(\bar{x} + e_j) \]
\[ \quad = f(\bar{x}) + \frac{1}{n} f'(\bar{x}) \sum_j e_j
   + \frac{1}{2n} f''(\bar{x}) \sum_j e_j^2 + \cdots \]
\[ \quad = f(\bar{x}) + \frac{1}{2}\, \sigma^2 f''(\bar{x}) + \cdots, \tag{23.87} \]

we obtain the average value $\bar{f}$ as $f(\bar{x})$ in lowest order, as expected. But in
second order there is a correction given by the variance with a scale factor $f''(\bar{x})/2$.

It is also of interest to determine the spread predicted for the values of $f(x_j)$. To lowest
order, this is given by the average of the sum of squares of the deviations. Approximating
$f_j$ as $\bar{f} + f'(\bar{x})\, e_j$, we get

\[ \sigma^2(f) \equiv \frac{1}{n} \sum_j (f_j - \bar{f})^2
   \approx \frac{1}{n} \sum_j \left[ f'(\bar{x})\, e_j \right]^2
   = \left[ f'(\bar{x}) \right]^2 \sigma^2. \tag{23.88} \]

In summary, we may formulate somewhat symbolically

\[ f(\bar{x} \pm \sigma) = f(\bar{x}) \pm f'(\bar{x})\, \sigma \]

as the simplest form of error propagation for a function of one measured variable.
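A small numerical illustration of Eqs. (23.87) and (23.88) may be helpful. The sketch below
uses $f(x) = x^2$, for which the second-order formula for the mean is exact, and a short list
of made-up readings; the function names and the data are hypothetical.

```python
import math

def mean_and_variance(values):
    """Mean and variance with the 1/n normalization used in the text."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((x - mean) ** 2 for x in values) / n

def propagate(f, f_prime, f_second, values):
    """Mean of f to second order, Eq. (23.87), and spread to first order, Eq. (23.88)."""
    xbar, var = mean_and_variance(values)
    f_mean = f(xbar) + 0.5 * var * f_second(xbar)
    f_sigma = abs(f_prime(xbar)) * math.sqrt(var)
    return f_mean, f_sigma

readings = [9.8, 10.1, 10.0, 9.9, 10.2]            # hypothetical measurements of x
f_mean, f_sigma = propagate(lambda x: x * x, lambda x: 2.0 * x, lambda x: 2.0, readings)

# Direct evaluation from the individual values f(x_j), for comparison.
direct_mean, direct_var = mean_and_variance([x * x for x in readings])
print(f_mean, direct_mean, f_sigma, math.sqrt(direct_var))
```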

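For a function $f(x_j, y_k) = f_{jk}$ of two quantities $x_j = \bar{x} + u_j$,
$y_k = \bar{y} + v_k$, where the $x_j$ and $y_k$ are measured independently of each other and
we have $r$ values of $j$ and $s$ values of $k$, we obtain similarly

\[ \bar{f} = \frac{1}{rs} \sum_{j=1}^{r} \sum_{k=1}^{s} f_{jk}
   = f(\bar{x}, \bar{y}) + \frac{1}{r}\, f_x \sum_j u_j + \frac{1}{s}\, f_y \sum_k v_k + \cdots
   \approx f(\bar{x}, \bar{y}). \tag{23.89} \]

The error in $\bar{f}$ is seen to be second-order in the $u_j$ and $v_k$. In writing
Eq. (23.89) we have used the relations $\sum_j u_j = \sum_k v_k = 0$ and have introduced the
definitions

\[ f_x = \left. \frac{\partial f}{\partial x} \right|_{\bar{x}, \bar{y}}, \qquad
   f_y = \left. \frac{\partial f}{\partial y} \right|_{\bar{x}, \bar{y}}. \tag{23.90} \]

The variance of $f$ is (to first order)

\[ \sigma^2(f) = \frac{1}{rs} \sum_{j,k} (f_{jk} - \bar{f})^2
   = \frac{1}{rs} \sum_{j,k} (u_j f_x + v_k f_y)^2
   = \frac{f_x^2}{r} \sum_j u_j^2 + \frac{f_y^2}{s} \sum_k v_k^2, \]

where we have dropped the zero cross term $\sum_{j,k} u_j v_k = \sum_j u_j \sum_k v_k$. Noting
that $\sum_j u_j^2 = r\sigma_x^2$ and $\sum_k v_k^2 = s\sigma_y^2$, we reach the final result

\[ \sigma^2(f) = \frac{1}{rs} \sum_{j,k} (f_{jk} - \bar{f})^2
   = f_x^2 \sigma_x^2 + f_y^2 \sigma_y^2. \tag{23.91} \]

Symbolically, the error propagation for a function of two measured variables may be
summarized as

\[ f(\bar{x} \pm \sigma_x, \bar{y} \pm \sigma_y) = f(\bar{x}, \bar{y})
   \pm \sqrt{f_x^2 \sigma_x^2 + f_y^2 \sigma_y^2}. \]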
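Equation (23.91) can be compared with a direct evaluation over all pairs $(x_j, y_k)$. The
sketch below does so for the product $f(x, y) = xy$ (for instance, the area of a rectangle),
for which $f_x = \bar{y}$ and $f_y = \bar{x}$. The data values and function names are
hypothetical.

```python
import math
from itertools import product

def mean_var(values):
    """Mean and variance with the 1/n normalization used in the text."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / n

def propagated_sigma(fx, fy, var_x, var_y):
    """Eq. (23.91): sigma(f) = sqrt(fx^2 var_x + fy^2 var_y)."""
    return math.sqrt(fx ** 2 * var_x + fy ** 2 * var_y)

xs = [2.01, 1.98, 2.00, 2.02, 1.99]    # hypothetical width readings (r = 5)
ys = [5.03, 4.97, 5.00]                # hypothetical height readings (s = 3)
(xbar, var_x), (ybar, var_y) = mean_var(xs), mean_var(ys)

sigma_first_order = propagated_sigma(ybar, xbar, var_x, var_y)   # f = x*y
areas = [x * y for x, y in product(xs, ys)]                      # all r*s values f_jk
sigma_direct = math.sqrt(mean_var(areas)[1])
print(sigma_first_order, sigma_direct)
```

For this choice of $f$ the two values differ only by the second-order term
$\sigma_x^2 \sigma_y^2$ in the variance, which is negligible for small errors.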


Example 23.7.1 REPEATED MEASUREMENTS


As an application and generalization of the result given in Eq. (23.91), let’s consider what
happens when we regard the mean of $n$ measurements $x_j$ as a function

\[ \bar{x} = f(x_1, x_2, \ldots, x_n) = (x_1 + x_2 + \cdots + x_n)/n \]

of the variables $x_1, \ldots, x_n$, each with variance $\sigma^2$. Then we have
$f_{x_j} = 1/n$ for each $j$, and, according to Eq. (23.91),

\[ \sigma^2(\bar{x}) = \sum_{j=1}^{n} f_{x_j}^2\, \sigma^2
   = \sum_{j=1}^{n} \frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}. \tag{23.92} \]

This result indicates that the standard deviation of the mean value, $\sigma(\bar{x})$, will
decrease with the number of repeated measurements, approaching zero as $\sigma/\sqrt{n}$. It is
important to recognize the distinction between the variance of the mean value, denoted
$\sigma^2(\bar{x})$, and the corresponding quantity for the individual measurements (denoted
$\sigma^2$).

If we refer now to our earlier result that the sum of $n$ identically distributed, Gauss
normal random variables is also a normal random variable with a variance equal to $n$ times
that of each variable (see Example 23.6.1), and note also that division of the sum by $n$ (to
form the mean) causes division of the variance by $n^2$, as discussed following Eq. (23.28),
we find a result identical to that developed in the present example, but with the additional
feature that the mean is also normally distributed.


The arithmetic mean $\bar{x}$ will, because of the distribution in the $x_j$, differ from the
true (but unknown) value $\mu$, with $\mu$ and $\bar{x}$ differing by some amount $\alpha$, or
$\bar{x} = \mu + \alpha$. However, as the number $n$ of measurements increases, we expect that
the error $\alpha$ will tend to zero, and that, according to Example 23.7.1, we can estimate
$\alpha$ to fall in the range $(-\sigma/\sqrt{n} < \alpha < \sigma/\sqrt{n})$. We can refine
this estimate by considering the spread of the $x_j$ measured with respect to the true value
$\mu$, meaning that we compute the variance using the average of $v_j^2$, where
$v_j = x_j - \mu$, instead of that of $e_j^2$, where, as before, $e_j = x_j - \bar{x}$. Calling
this version of the variance $s^2$, we write

\[ s^2 = \frac{1}{n} \sum_{j=1}^{n} v_j^2 = \frac{1}{n} \sum_{j=1}^{n} (e_j + \alpha)^2
   = \frac{1}{n} \sum_{j=1}^{n} e_j^2 + \alpha^2, \tag{23.93} \]

where the term linear in $e_j$ vanishes because $\sum_j e_j = 0$. Inserting now an estimate of
$\alpha$, in the form $\alpha^2 \approx s^2/n$ (this approximation good to first order),
Eq. (23.93) rearranges to

\[ s^2 \left( 1 - \frac{1}{n} \right) = \frac{1}{n} \sum_{j=1}^{n} e_j^2, \tag{23.94} \]

equivalent to

\[ s = \sqrt{ \frac{\sum_j (x_j - \bar{x})^2}{n - 1} }. \tag{23.95} \]

The quantity $s$ is referred to as the sample standard deviation. Equation (23.95) is not
well defined when $n = 1$, but that is not an issue because a single data point is insufficient
to determine a spread. The presence of $n - 1$, in contrast to the factor $n$ in Eq. (23.23),
allows for the probable error in $\bar{x}$, and is known as Bessel’s correction to the standard
deviation formula.
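Equation (23.95) corresponds to what Python’s standard `statistics.stdev` computes, whereas
`statistics.pstdev` uses the factor $n$. The sketch below implements Eq. (23.95) directly and
compares it with both; the helper names and data are hypothetical, and the use of $s$ in place
of $\sigma$ when estimating the standard deviation of the mean is a common practice assumed
here rather than a statement from the text.

```python
import math
import statistics

def sample_std(values):
    """Eq. (23.95): sample standard deviation with Bessel's n - 1 denominator."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

def std_of_mean(values):
    """Estimated standard deviation of the mean, s / sqrt(n); compare Eq. (23.92)."""
    return sample_std(values) / math.sqrt(len(values))

readings = [9.8, 10.1, 10.0, 9.9, 10.2]    # hypothetical measurements
s = sample_std(readings)
print(s, statistics.stdev(readings), statistics.pstdev(readings), std_of_mean(readings))
```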


Fitting Curves to Data 


Suppose we have a sample of measurements $y_j$ taken at times $t_j$, where the time is known
precisely but the $y_j$ are subject to experimental error. An example would be snapshots
of the position of a particle in uniform motion at the times $t_j$. Our statistical hypothesis,
motivated by Newton’s first law and the initial condition that $y = 0$ when $t = 0$, is that
$y(t)$ satisfies an equation of the form $y = at$, where the constant $a$ is to be determined
from the measurements.

To fit our equation to the data, we first minimize the sum of the squares of deviations
$S = \sum_j (a t_j - y_j)^2$ to determine the slope parameter $a$, also called the regression
coefficient, using the method of least squares. Differentiating $S$ with respect to $a$ we
obtain

\[ 2 \sum_j (a t_j - y_j)\, t_j = 0, \]

which we can solve for $a$:

\[ a = \frac{\sum_j t_j y_j}{\sum_j t_j^2}. \tag{23.96} \]

Note that the numerator is built like a sample covariance, the scalar product of the variables
$t$, $y$ of the sample. As shown in Fig. 23.9, the measured values $y_j$ do not as a rule lie
on the line. They have the sample standard deviation, computed from Eq. (23.95),

\[ s = \sqrt{ \frac{\sum_j (y_j - a t_j)^2}{n - 1} }. \]

FIGURE 23.9 Straight line fit to data points $(t_j, y_j)$ with $t_j$ known, $y_j$ measured.

Alternatively, suppose that the $y_j$ values are known precisely while the $t_j$ are
measurements subject to experimental error. As suggested by Fig. 23.10, in this case we need to
interchange the roles of $t$ and $y$ and to fit the line $t = by$ to the data points. We
minimize $S = \sum_j (b y_j - t_j)^2$, setting $dS/db = 0$, and find similarly the slope
parameter

\[ b = \frac{\sum_j t_j y_j}{\sum_j y_j^2}. \tag{23.97} \]

FIGURE 23.10 Straight line fit to data points $(t_j, y_j)$ with $y_j$ known, $t_j$ measured.

In case both $t_j$ and $y_j$ have errors (we take $t$ and $y$ to have the same measurement
precision), we have to minimize the sum of squares of the deviations of both variables. It is
convenient to fit to a parameterization $t \sin\alpha - y \cos\alpha = 0$, so
$y/t = \sin\alpha/\cos\alpha = \tan\alpha$, meaning that $\alpha$ is the angle the fitting line
makes with the $t$-axis (see Fig. 23.11). Our task will therefore be to determine $\alpha$. We
also see from Fig. 23.11 that the fitting line has to be drawn so that the sum of the squares of
the distances $d_j$ of the points $(t_j, y_j)$ from the line becomes a minimum. To find $d_j$,
we rotate our coordinate system by the angle $\alpha$, which moves $(t_j, y_j)$ to
$(t_j', y_j')$ according to

\[ \begin{pmatrix} t_j' \\ y_j' \end{pmatrix} =
   \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}
   \begin{pmatrix} t_j \\ y_j \end{pmatrix}, \]

which yields $d_j = y_j' = -t_j \sin\alpha + y_j \cos\alpha$, the (signed) distance to the line
at angle $\alpha$.

FIGURE 23.11 (a) Straight line fit to data points $(t_j, y_j)$. (b) Geometry of
deviations $u_j$, $v_j$, $d_j$.

The minimum of the square of the distances to the line is found from

\[ \frac{d}{d\alpha} \sum_j d_j^2
   = 2 \sum_j (-t_j \sin\alpha + y_j \cos\alpha)(-t_j \cos\alpha - y_j \sin\alpha) \]
\[ \quad = 2 \left[ \sin\alpha \cos\alpha \sum_j (t_j^2 - y_j^2)
   - (\cos^2\alpha - \sin^2\alpha) \sum_j t_j y_j \right] = 0, \]

which can be reduced to

\[ \tan 2\alpha = \frac{2 \sum_j t_j y_j}{\sum_j (t_j^2 - y_j^2)}. \tag{23.98} \]

This least-squares fitting is appropriate when the measurement errors are unknown, as it 
gives equal weight to the deviation of each point from the fitting line. 
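
The three fits just described, Eqs. (23.96) to (23.98), can be compared on one data set. The following Python sketch is illustrative only: the function name `fit_slopes` is hypothetical, the three points are those used in Example 23.7.2, and the use of `atan2` to pick the root of Eq. (23.98) that gives a minimum (rather than a maximum) of the summed squared distances is a choice made here, not something stated in the text.

```python
import math

def fit_slopes(t, y):
    """Return the slope dy/dt obtained from each of the three least-squares fits."""
    s_ty = sum(tj * yj for tj, yj in zip(t, y))
    s_tt = sum(tj * tj for tj in t)
    s_yy = sum(yj * yj for yj in y)

    a = s_ty / s_tt            # Eq. (23.96): errors in y only, line y = a*t
    b = s_ty / s_yy            # Eq. (23.97): errors in t only, line t = b*y
    # Eq. (23.98): errors in both variables; atan2 selects the minimizing branch
    alpha = 0.5 * math.atan2(2.0 * s_ty, s_tt - s_yy)
    return a, 1.0 / b, math.tan(alpha)

t = [1.0, 2.0, 3.0]
y = [0.8, 1.5, 2.7]

slope_y_err, slope_t_err, slope_both = fit_slopes(t, y)
print(slope_y_err, slope_t_err, slope_both)
```

For these points the slope obtained when both variables carry errors falls between the two one-sided results, as one might expect from the geometry of Fig. 23.11.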

Finally, if we have information that permits the assignment of different probable errors 
to different points, we have the alternative of making a “weighted” least-squares fit called 
a chi square fit, which we discuss in the next subsection. 


The $\chi^2$ Distribution


Given a set of $u_j$ corresponding to values $t_j$ of an independent variable (which is not
necessarily a time), we seek to fit these data to a function $u(t, a_1, a_2, \ldots)$, where the $a_i$ are
parameters that are adjusted to optimize the fit. The optimization is carried out by minimizing
a weighted sum of the squares of the deviations, where the weights are controlled
by the assumed standard deviations $\sigma_j$ of the respective measurements $u_j$. The quantity to
be minimized is traditionally labeled $\chi^2$ and called chi-square, and its precise definition is

$$\chi^2 = \sum_{j=1}^{n} \left(\frac{u_j - u(t_j, a_1, a_2, \ldots)}{\sigma_j}\right)^2, \tag{23.99}$$

where $n$ is the number of data points. This quadratic merit function gives more weight to
points with small measurement uncertainties $\sigma_j$.

The key assumptions adopted to analyze the probability distribution corresponding to the
chi-square fit are (1) that each normalized deviation $X_j = [u_j - u(t_j, a_1, a_2, \ldots)]/\sigma_j$ is an
independent Gauss normal random variable with zero mean and unit variance, with the unit
variance assured by the presence of the $\sigma_j$ in each term, and (2) that the distribution $\chi^2$ is
related to the $X_j$ by

$$\chi^2 = \sum_{j=1}^{n} X_j^2. \tag{23.100}$$


Making a chi-square fit requires no knowledge of statistics; we simply apply standard
analytical or numerical methods to minimize $\chi^2$ for our set of data points. On the other
hand, a knowledge of the chi-square probability distribution will be needed to determine
whether we are getting the expected quality from our chi-square fit. In particular, if we
wish to determine the probability of the occurrence of our data set based on the chi-square
distribution (and possibly assess the adequacy of our assumptions regarding the
individual-point variances $\sigma_j^2$), we must undertake further analysis.
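
As an illustration of the numerical route to minimizing Eq. (23.99), the sketch below fits the one-parameter model $u = at$ by a golden-section search. Everything specific in it is an assumption made for the example: the data values are invented, the search bracket [0, 5] is chosen by hand, and golden-section search is just one of many standard minimizers that could be used.

```python
import math

def chi_square(a, t, u, sigma):
    """Chi-square of Eq. (23.99) for the one-parameter model u(t) = a * t."""
    return sum(((uj - a * tj) / sj) ** 2 for tj, uj, sj in zip(t, u, sigma))

def golden_section_min(f, lo, hi, tol=1e-8):
    """Locate the minimum of a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d      # minimum lies in [lo, d]
        else:
            lo = c      # minimum lies in [c, hi]
    return 0.5 * (lo + hi)

# Hypothetical data (t_j, u_j, sigma_j), invented for this illustration.
t = [1.0, 2.0, 3.0, 4.0]
u = [2.1, 3.9, 6.2, 7.8]
sigma = [0.1, 0.2, 0.1, 0.3]

a_best = golden_section_min(lambda a: chi_square(a, t, u, sigma), 0.0, 5.0)
print(a_best, chi_square(a_best, t, u, sigma))
```

Because $\chi^2$ is quadratic in $a$ for this model, the numerical minimum can be checked against the closed-form weighted average $\sum_j (t_j u_j/\sigma_j^2) / \sum_j (t_j^2/\sigma_j^2)$.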

Our earlier discussion of transformations of random variables included the analysis of
sums of normally distributed $X_j$ of the form given in Eq. (23.100), with the result
developed in Example 23.6.2. Specializing to the case at hand, we note that $\chi^2$ will have a
gamma probability distribution with parameters $p = n/2$ and $\sigma = 1$, so

$$g(\chi^2 = y) = \frac{y^{(n/2)-1}\, e^{-y/2}}{2^{n/2}\, \Gamma(n/2)}. \tag{23.101}$$

Plots of $g(y)$ for several values of $n$ are given in Fig. 23.12.
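
The density of Eq. (23.101) is easy to evaluate directly. The Python sketch below (function names hypothetical) codes the formula with `math.gamma` and uses a simple midpoint rule, chosen here only for brevity, to check numerically that the density is normalized and that its mean is close to $n$.

```python
import math

def chi2_pdf(y, n):
    """Chi-square probability density of Eq. (23.101) for n degrees of freedom."""
    if y <= 0.0:
        return 0.0  # the value at the single point y = 0 does not affect integrals
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def midpoint_integral(f, lo, hi, steps=80000):
    """Midpoint-rule estimate of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

n = 4
# The upper limit 80 stands in for infinity; the neglected tail is negligible for small n.
norm = midpoint_integral(lambda y: chi2_pdf(y, n), 0.0, 80.0)
mean = midpoint_integral(lambda y: y * chi2_pdf(y, n), 0.0, 80.0)
print(norm, mean)  # expected to be close to 1 and to n, respectively
```

For $n = 2$ the formula reduces to the pure exponential $e^{-y/2}/2$, which gives a convenient spot check.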


FIGURE 23.12 $\chi^2$ probability density $g_n(y)$ for $n$ = 2, 3, 4, and 5.


Table 23.2 $\chi^2$ Distribution

n    v=0.8    v=0.7    v=0.5    v=0.4    v=0.3    v=0.2    v=0.1
1    0.064    0.148    0.455    0.708    1.074    1.642     2.706
2    0.446    0.713    1.386    1.833    2.408    3.219     4.605
3    1.005    1.424    2.366    2.946    3.665    4.642     6.251
4    1.649    2.195    3.357    4.045    4.878    5.989     7.779
5    2.343    3.000    4.351    5.132    6.064    7.289     9.236
6    3.070    3.828    5.348    6.211    7.231    8.558    10.645

Note: A data set with $n$ degrees of freedom will have probability $v$ that its value of $\chi^2$ exceeds
the tabulated value.


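It is also useful to note the moment-generating function for this distribution:

$$\langle e^{t\chi^2} \rangle = \frac{1}{(1-2t)^{n/2}}, \tag{23.102}$$

a result that follows directly from Eq. (23.85). Differentiating Eq. (23.102), we find

$$\langle \chi^2 \rangle = n, \qquad \langle (\chi^2)^2 \rangle = \left.\frac{d^2}{dt^2}\, \frac{1}{(1-2t)^{n/2}}\right|_{t=0} = n(n+2), \tag{23.103}$$

and therefore

$$\sigma^2(\chi^2) = \langle (\chi^2)^2 \rangle - \langle \chi^2 \rangle^2 = n(n+2) - n^2 = 2n. \tag{23.104}$$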
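
For readers who want the intermediate steps, the two derivatives can be written out explicitly. The symbol $M(t)$ is shorthand introduced here for the moment-generating function of Eq. (23.102); it is not notation taken from the text.

```latex
\begin{align*}
M(t)   &= (1-2t)^{-n/2}, \\
M'(t)  &= n\,(1-2t)^{-n/2-1},
        & M'(0)  &= n = \langle \chi^2 \rangle, \\
M''(t) &= n(n+2)\,(1-2t)^{-n/2-2},
        & M''(0) &= n(n+2) = \langle (\chi^2)^2 \rangle .
\end{align*}
```

Each differentiation brings down a factor $-2$ from the inner function and a factor $-(n/2 + k)$ from the power, so the signs cancel and the moments are positive.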


These results suggest that typical data with realistically assigned individual-measurement
variances would yield a value of $\chi^2$ comparable to the number of data points. However,
by calculating

$$P(\chi^2 > y_0) = \int_{y_0}^{\infty} g(y)\, dy, \tag{23.105}$$

where $g(y)$ is the distribution in Eq. (23.101), we can obtain for any $y_0$ the probability
that a data set would have a larger spread than that corresponding to $\chi^2 = y_0$. Because
it is somewhat laborious to compute the integral in Eq. (23.105), its values are generally
obtained by table lookup. A short table of these chi-square data is given in Table 23.2.
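
With a computer the integral in Eq. (23.105) is no longer laborious. The Python sketch below is one possible approach, not the method of the text: for even $n$ it uses the elementary closed form $e^{-y_0/2}\sum_{k<n/2}(y_0/2)^k/k!$ of the tail integral, and for general $n \ge 2$ it falls back on a midpoint-rule integration of Eq. (23.101). All function names are hypothetical.

```python
import math

def chi2_pdf(y, n):
    """Chi-square density g(y) of Eq. (23.101); taken as zero for y <= 0."""
    if y <= 0.0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_tail_even(y0, n):
    """P(chi^2 > y0) for even n: exp(-y0/2) * sum over k < n/2 of (y0/2)^k / k!."""
    if n <= 0 or n % 2:
        raise ValueError("the closed form used here needs a positive even n")
    half = y0 / 2.0
    term, total = 1.0, 0.0
    for k in range(n // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

def chi2_tail_numeric(y0, n, steps=20000):
    """P(chi^2 > y0) as 1 minus a midpoint-rule integral of g(y) over [0, y0] (n >= 2)."""
    h = y0 / steps
    inside = h * sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps))
    return 1.0 - inside

# Inputs are the v = 0.1 entries of Table 23.2 for n = 2, 4, and 3.
print(chi2_tail_even(4.605, 2), chi2_tail_even(7.779, 4))
print(chi2_tail_numeric(6.251, 3))
```

Feeding a tabulated value back into these functions should return the column probability $v$ to within the rounding of the table.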

Before closing this subsection, we need to deal with the fact that our random variables
$X_j$ do not really have zero mean values if the function $u(t_j, \ldots)$ was chosen based on
the available data and therefore was not exact. By reasoning similar to that involved in
the discussion leading to Eq. (23.95), it can be shown that if a chi-square fit involves
$n$ data points and the determination of $r$ parameters, the effective number of degrees of
freedom is $n - r$, with the implication that the inexactness of the fit in $r$ degrees of freedom
corresponds to a chi-square distribution with $n$ replaced by $n - r$.

Finally, it is worth pointing out that the $\chi^2$ analysis does not really test the assumptions
that the data points are independent normal random variables. If these assumptions are not
approximately valid, it is unlikely that good chi-square fits can be achieved.


Example 23.7.2 CHI-SQUARE FIT


Let us apply the $\chi^2$ function to a straight-line fit of the type shown in Fig. 23.9, with the
three measured points and their individual standard deviations, written as $(t_j, u_j \pm \sigma_j)$,
having the values

(1, 0.8 ± 0.1), (2, 1.5 ± 0.05), (3, 2.7 ± 0.2).

Before proceeding to the chi-square fit, we first fit a line assuming the points to be equally
weighted, corresponding to using Eq. (23.96) for the slope $a$. We find

$$a = \frac{1(0.8) + 2(1.5) + 3(2.7)}{1^2 + 2^2 + 3^2} = \frac{11.9}{14} = 0.850.$$

The sample variance of the points from the line is

$$\sigma^2 = \frac{1}{2}\left([0.8 - 1(0.850)]^2 + [1.5 - 2(0.850)]^2 + [2.7 - 3(0.850)]^2\right) = 0.0325,$$

and the variance of $a$ is

$$\sigma^2(a) = \sum_j \left(\frac{\partial a}{\partial u_j}\right)^2 \sigma^2 = \sum_j \left(\frac{t_j}{\sum_k t_k^2}\right)^2 \sigma^2 = \frac{\sigma^2}{\sum_k t_k^2} = \frac{0.0325}{14} = 0.00232.$$

Thus, the unweighted fit yields $a = 0.850 \pm \sqrt{0.00232} = 0.850 \pm 0.048$.
Turning now to the chi-square fit, we next minimize

$$\chi^2 = \sum_j \left(\frac{u_j - a t_j}{\sigma_j}\right)^2$$

with respect to $a$. This process yields

$$\frac{\partial \chi^2}{\partial a} = -2\sum_j \frac{t_j (u_j - a t_j)}{\sigma_j^2} = 0,$$

or

$$a = \frac{\sum_j t_j u_j/\sigma_j^2}{\sum_j t_j^2/\sigma_j^2}.$$

In our case

$$a = \frac{\dfrac{1(0.8)}{0.1^2} + \dfrac{2(1.5)}{0.05^2} + \dfrac{3(2.7)}{0.2^2}}{\dfrac{1^2}{0.1^2} + \dfrac{2^2}{0.05^2} + \dfrac{3^2}{0.2^2}} = \frac{1482.5}{1925} = 0.770.$$

The value we obtained for $a$ is dominated by the middle point, which has the smallest $\sigma_j$; if that
point were the only one used, we would have gotten $a = 1.5/2 = 0.75$. The variance of $a$, $\sigma^2(a)$,
is now

$$\sigma^2(a) = \sum_j \left(\frac{\partial a}{\partial u_j}\right)^2 \sigma_j^2 = \sum_j \left(\frac{t_j/\sigma_j^2}{\sum_k t_k^2/\sigma_k^2}\right)^2 \sigma_j^2 = \frac{1}{\sum_k t_k^2/\sigma_k^2} = \frac{1}{1925} = 0.000519.$$

The chi-square estimate of $a$ is therefore $a = 0.770 \pm \sqrt{0.000519} = 0.770 \pm 0.023$.
Our fit has for $\chi^2$ the value

$$\chi^2 = \frac{[0.8 - 1(0.770)]^2}{0.1^2} + \frac{[1.5 - 2(0.770)]^2}{0.05^2} + \frac{[2.7 - 3(0.770)]^2}{0.2^2} = 4.533.$$

Our problem involves three points and one parameter, and therefore its chi-square
distribution has two degrees of freedom and, according to Eqs. (23.103) and (23.104), has a
mean value of 2 and a variance of 4. Our value of $\chi^2$, 4.533, is significantly larger than
the mean value of the distribution and therefore describes a data set with more spread than
would normally be expected for the stated values of $\sigma_j$. We can obtain a more quantitative
measure of the probability that $\chi^2$ would be at least as large as our value by comparing
with the entries in Table 23.2. Using the row of the table for $n = 2$, we see that the
probability of getting a spread larger than that of our present data is quite small, only slightly
above 0.1.
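
The arithmetic of this example can be reproduced in a few lines of Python. The sketch below uses only the formulas already quoted in the example; the variable names are arbitrary, and the closed form $e^{-\chi^2/2}$ for the tail probability is the elementary result of integrating Eq. (23.101) for two degrees of freedom.

```python
import math

t = [1.0, 2.0, 3.0]
u = [0.8, 1.5, 2.7]
sigma = [0.1, 0.05, 0.2]

# Unweighted fit, Eq. (23.96), with the scatter about the line estimated from the data.
s_tt = sum(tj * tj for tj in t)
a_unw = sum(tj * uj for tj, uj in zip(t, u)) / s_tt
var_pts = sum((uj - a_unw * tj) ** 2 for tj, uj in zip(t, u)) / (len(t) - 1)
err_unw = math.sqrt(var_pts / s_tt)

# Chi-square fit: each point enters with weight 1 / sigma_j^2.
w = [1.0 / sj ** 2 for sj in sigma]
sw_tu = sum(wj * tj * uj for wj, tj, uj in zip(w, t, u))
sw_tt = sum(wj * tj * tj for wj, tj in zip(w, t))
a_chi = sw_tu / sw_tt
err_chi = math.sqrt(1.0 / sw_tt)
chi2 = sum(wj * (uj - a_chi * tj) ** 2 for wj, tj, uj in zip(w, t, u))

# Three points and one parameter leave two degrees of freedom,
# for which the tail integral of Eq. (23.105) is exp(-chi2 / 2).
p_tail = math.exp(-chi2 / 2.0)

print(a_unw, err_unw)
print(a_chi, err_chi, chi2, p_tail)
```

The last number printed is the probability referred to at the end of the example as only slightly above 0.1.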





Student $t$ Distribution


The Student $t$ distribution (sometimes just called the $t$ distribution) is designed for use
with small data sets for which the variance is unknown. This distribution was first
described by W. S. Gosset, who published his work under the pen name “Student” because
his employer, the Guinness brewery, would not permit him to publish it under his own
name.

Gosset considered the probability distribution of a random variable $T$, of the form


$$T = \frac{Y\sqrt{n}}{\sqrt{S/n}}. \tag{23.106}$$

For the applications under consideration here,

$$Y = \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu) = \bar{X} - \mu, \tag{23.107}$$

$$S = \sum_{j=1}^{n} (X_j - \mu)^2. \tag{23.108}$$

Here the $X_j$ are a set of $n$ independent Gauss normal random variables, each of the same
unknown variance $\sigma^2$. The quantity $\mu$ is the (unknown) value of the mean of $X$. An
important feature of Gosset’s choice for $T$ is that (as we shall shortly show) its
probability distribution $f_n(t)$ is independent of the variance of the $X_j$.

The procedure for obtaining the probability distribution of $T$ depends on the fact that
$Y$ and $S$ are independent random variables. That is so, but proof is beyond the scope of
the present abbreviated discussion. We start by noting that $S$ has a gamma distribution,
with probability distribution $g(n, \sigma; s)$, as given in Eq. (23.85). Next, we proceed to the
distribution of $U = \sqrt{S/n}$, which we denote $h(u)$. Making a change of variable from $s$ to
$nu^2$, and observing that $ds = (ds/du)\,du$, we find

$$h(u) = g(n, \sigma; nu^2)\,(2nu). \tag{23.109}$$

This is the probability distribution of the denominator of $T$. To get the distribution of the
numerator, we note that $Y$ has a normal distribution with variance $\sigma^2/n$ (see Eq. (23.92))
and mean zero. It therefore (see Eq. (23.56)) has the distribution we denote $r(y)$, of the
form

$$r(y) = \sqrt{\frac{n}{2\pi\sigma^2}}\; e^{-ny^2/2\sigma^2}. \tag{23.110}$$

The numerator, $Z = Y\sqrt{n}$, will therefore have a distribution $k(z)$, where $z = y\sqrt{n}$, so

$$k(z) = r(z/\sqrt{n})\,(dy/dz) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-z^2/2\sigma^2}; \tag{23.111}$$

the presence of the factor $\sqrt{n}$ causes the numerator to have variance $\sigma^2$. Finally, we use
the formula for the ratio of two independent distributions, Eq. (23.81), to obtain

$$f_n(t) = \int_0^\infty k(ut)\, h(u)\, u\, du. \tag{23.112}$$

The integration only extends from zero to infinity because the gamma distribution in $h(u)$
is only nonzero for positive $u$. Inserting expressions for the quantities in Eq. (23.112),

$$\begin{aligned}
f_n(t) &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_0^\infty e^{-u^2t^2/2\sigma^2}\, g(n, \sigma; nu^2)\,(2nu^2)\, du \\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\, \frac{1}{2^{n/2}\sigma^n\, \Gamma(n/2)} \int_0^\infty e^{-u^2t^2/2\sigma^2}\, (nu^2)^{(n/2)-1}\, e^{-nu^2/2\sigma^2}\, (2nu^2)\, du \\
&= \frac{2}{\sigma^{n+1}} \left(\frac{n}{2}\right)^{(n+1)/2} \frac{1}{\sqrt{n\pi}\,\Gamma(n/2)} \int_0^\infty u^n\, e^{-u^2(t^2+n)/2\sigma^2}\, du.
\end{aligned} \tag{23.113}$$

To complete the evaluation, we change variables in the integral to $z = u^2(t^2 + n)/2\sigma^2$,
thereby making the integral identifiable as a gamma function, so

$$\int_0^\infty u^n\, e^{-u^2(t^2+n)/2\sigma^2}\, du = \frac{1}{2}\left(\frac{2\sigma^2}{t^2+n}\right)^{(n+1)/2} \int_0^\infty z^{(n-1)/2}\, e^{-z}\, dz = \frac{1}{2}\left(\frac{2\sigma^2}{t^2+n}\right)^{(n+1)/2} \Gamma\!\left(\frac{n+1}{2}\right). \tag{23.114}$$

Inserting this result into Eq. (23.113) and simplifying, we note that the instances of $\sigma$
entirely cancel, and we are left with

$$f_n(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\; \Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}. \tag{23.115}$$

FIGURE 23.13 Student $t$ probability density $f_n(t)$ for $n$ = 2, 10, 20, and 30.


Equation (23.115) is the probability density for the $t$ distribution with $n$ degrees of
freedom. This equation shows that we have achieved the desired result, namely that the
distribution of $T$ is independent of the variance of the input random variables $X_j$. Since it is
our intention to use the $t$ distribution for the reduction of experimental data of unknown
variance, we have achieved our current objective. Figure 23.13 shows densities $f_n(t)$ for
several $n$; an important feature of these curves is that they depend very weakly on $n$.
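
The weak dependence on $n$ can be seen by evaluating Eq. (23.115) directly. In the Python sketch below the function names are hypothetical, and the comparison with the unit-variance Gauss normal density is added here as a familiar reference point (the $t$ density is known to approach it as $n$ grows); it is not part of the derivation above.

```python
import math

def student_t_pdf(t, n):
    """Student t probability density of Eq. (23.115) for n degrees of freedom."""
    coeff = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coeff * (1.0 + t * t / n) ** (-(n + 1) / 2)

def normal_pdf(t):
    """Gauss normal density with zero mean and unit variance, for comparison."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

for n in (2, 10, 20, 30):
    print(n, student_t_pdf(0.0, n), student_t_pdf(2.0, n))
print("normal", normal_pdf(0.0), normal_pdf(2.0))
```

The differences between the curves show up mainly in the tails, which is why the tabulated limits $C$ used for confidence intervals still vary noticeably for small $n$.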


Confidence Intervals 


A confidence interval for a random variable $X$ is the range within which $x$ will fall, not
with certainty but with a high probability, the confidence level, which we can choose. If
$X$ has a probability distribution $f(x)$, the confidence interval for probability $p$ will be
the range of $x$, usually symmetrically centered about some value $x_0$, that contains the
fraction $p$ of its probability distribution. If this range is bounded by $x_0 - \delta x$ and $x_0 + \delta x$,
it is customary to write that $x = x_0 \pm \delta x$ with $(100p)\%$ confidence. If, for example, a
computed value of $x$ is 0.50 and 90% of its probability distribution falls between $x = 0.40$
and $x = 0.60$, we say that $x$ has the value $x = 0.50 \pm 0.10$ with 90% confidence.

Confidence intervals are usually found by what is called the pivotal method, which
involves relating the variable for which we desire a confidence interval to a known
probability distribution. The identification and selection of pivotal quantities is in general outside
the scope of this text, but for a Gauss normal random variable with zero mean, a suitable
pivot is its $t$ distribution. Referring to Eq. (23.106), this means we can estimate a
confidence interval for $Y$ (the deviation of the observed mean $\bar{X}$ from its true value) from the
equation








$$Y = T\, \frac{\sqrt{S/n}}{\sqrt{n}}, \tag{23.116}$$

where $T$ is the random variable corresponding to the $t$ distribution and $S$ is the single value
obtained by inserting the observed values of the $X_j$ into Eq. (23.108). We use Eq. (23.116)
by inserting into it the range of $T$ that corresponds to a total probability $p$, calculating
therefrom the corresponding range of $Y$. Note that we do not insert a probability distribution
for $S$; we use the value of $S$ arising from our data.

The distribution of $T$ is an even function of $t$ with a maximum at $t = 0$, as is obvious
from Eq. (23.115) and Fig. 23.13, and our confidence interval for $T$ will naturally be
centered about zero. Therefore, a confidence interval of probability $p$ will correspond to a
symmetric range of $t$, $(-C_p < t < +C_p)$, such that

$$P(-C_p < t < +C_p) = p.$$

Because $f_n(t)$ is even, we also have

$$P(-\infty < t < +C_p) = \frac{1}{2} + \frac{P(-C_p < t < C_p)}{2},$$

which is equivalent to

$$P(-\infty < t < +C_p) = \frac{1+p}{2}. \tag{23.117}$$

Because of the frequent need to use values of $C$ corresponding to various values of $p$ and
degrees of freedom $n$, these $C$ values have been tabulated and appear in many statistics
texts. A short table is included here (Table 23.3).
Given a confidence interval for $T$, we may insert it into Eq. (23.116), which when solved
for $\mu$ becomes

$$\mu = \bar{X} - T\, \frac{\sigma}{\sqrt{n}}. \tag{23.118}$$


From the limiting values for $T$, we get the corresponding range for $\mu$, which is valid with
the probability of the $T$ range. Note that except for the range of $T$, all the quantities on
the right-hand side of Eq. (23.118) are to be computed from our sample data. In particular,
we need the mean value $\bar{X}$ for our sample and the standard deviation of our data points,
$\sigma = \sqrt{S/n}$. Note further that, as with the chi-square distribution, when measured data
are used to generate the sample mean and sample standard deviation, the appropriate $t$
distribution to use for $n$ data points is that with $n - 1$ degrees of freedom, and in using
Eq. (23.118) it is customary to take $\sigma$ as the sample standard deviation, as defined in
Eq. (23.95). These points are explained more fully in several of the Additional Readings.


Table 23.3 Student $t$ Distribution

p        n=1      n=2      n=3      n=4      n=5
0.8      1.38     1.06     0.98     0.94     0.92
0.9      3.08     1.89     1.64     1.53     1.48
0.95     6.31     2.92     2.35     2.13     2.02
0.975    12.7     4.30     3.18     2.78     2.57
0.99     31.8     6.96     4.54     3.75     3.36
0.999    318.3    22.3     10.2     7.17     5.89

Note: Entries are the values $C$ in $\int_{-\infty}^{C} f_n(t)\,dt = p$, where $f_n(t)$ is
given in Eq. (23.115), with $n$ the number of degrees of freedom.
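
Values of $C$ such as those in Table 23.3 can also be generated numerically from Eq. (23.115). The Python sketch below is one simple way to do so and is not the method used to prepare the table: it integrates the density with Simpson's rule and solves for $C$ by bisection. The function names, the number of integration steps, and the upper search limit of 400 are all assumptions made for this illustration.

```python
import math

def student_t_pdf(t, n):
    """Student t density of Eq. (23.115) for n degrees of freedom."""
    coeff = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coeff * (1.0 + t * t / n) ** (-(n + 1) / 2)

def student_t_cdf(c, n, steps=4000):
    """P(-inf < t < c) for c >= 0: one half plus a Simpson-rule integral over [0, c]."""
    h = c / steps
    total = student_t_pdf(0.0, n) + student_t_pdf(c, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, n)
    return 0.5 + total * h / 3.0

def critical_value(p, n, hi=400.0):
    """Solve student_t_cdf(C, n) = p for C by bisection, assuming 0.5 < p < 1 and C < hi."""
    lo = 0.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if student_t_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (1, 5):
    print(n, [round(critical_value(p, n), 2) for p in (0.9, 0.95, 0.975)])
```

Here `p` plays the role of the left-hand column of the table, that is, the cumulative probability $(1 + p)/2$ of Eq. (23.117) rather than the confidence level itself.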


Example 23.7.3 CONFIDENCE INTERVAL


Suppose we have the following random data from a population that can be assumed to have
a Gauss normal distribution:

7.12   4.95   6.18   5.69   2.90   8.47,

and we wish to determine 90% and 95% confidence intervals for the population mean.

Since we have neither the population mean nor variance, but have assumed the
population distribution to be normal, we can use the $t$ distribution as just outlined. As a
preliminary to doing so, we need to calculate the sample mean and standard deviation. Since we
have six data points, the number of degrees of freedom will be $n = 5$. We have

$$\bar{X} = (7.12 + 4.95 + 6.18 + 5.69 + 2.90 + 8.47)/6 = 5.885,$$

$$\sigma = \left[\frac{1}{5}\left((7.12 - 5.885)^2 + \cdots + (8.47 - 5.885)^2\right)\right]^{1/2} = 1.9035.$$

Considering first the 90% confidence interval, which corresponds to the range
$(-C_{90} < t < C_{90})$ with cumulative probability $(1 + p)/2 = 0.95$, we read from Table 23.3
the value $C_{90} = 2.02$. Thus,

$$\mu = 5.885 \pm \frac{(2.02)(1.9035)}{\sqrt{5}} = 5.885 \pm 1.720 \quad (90\%\ \text{confidence}).$$

For 95% confidence, we need $C_{95}$, again for $n = 5$. This time, $(1 + p)/2 = 0.975$, so $C_{95} = 2.57$,
and

$$\mu = 5.885 \pm \frac{(2.57)(1.9035)}{\sqrt{5}} = 5.885 \pm 2.188 \quad (95\%\ \text{confidence}).$$

A few final observations are in order. First, we see that by demanding an increase in
the confidence level, the interval probably containing the true mean becomes wider. Note
that at high confidence levels the probable width can become much larger than the sample
standard deviation. Finally, note that even the confidence intervals are sample-dependent.
Other data from the same population could generate intervals of different widths. Perhaps
oversimplifying, these analyses show that there is no way of converting probability data
into significant statements that have complete certainty.
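
The numbers in this example can be checked with a short Python sketch. It follows the example as written, dividing by $\sqrt{5}$, the square root of the number of degrees of freedom. Many statistics references instead divide the sample standard deviation by the square root of the number of data points (here $\sqrt{6}$); that variant is computed below purely for comparison and is not taken from the text. The variable names are arbitrary.

```python
import math

data = [7.12, 4.95, 6.18, 5.69, 2.90, 8.47]
n_points = len(data)
dof = n_points - 1          # degrees of freedom, n = 5 in the example

mean = sum(data) / n_points
s = math.sqrt(sum((x - mean) ** 2 for x in data) / dof)   # Eq. (23.95)

c90, c95 = 2.02, 2.57       # Table 23.3, n = 5, rows p = 0.95 and p = 0.975

# Half-widths as computed in the example (divisor sqrt(5)).
half90_text = c90 * s / math.sqrt(dof)
half95_text = c95 * s / math.sqrt(dof)

# Comparison only: half-widths with the divisor sqrt(6), the number of data points.
half90_alt = c90 * s / math.sqrt(n_points)
half95_alt = c95 * s / math.sqrt(n_points)

print(mean, s)
print(half90_text, half95_text)
print(half90_alt, half95_alt)
```

Whichever divisor is adopted, the interval widens as the confidence level is raised, which is the point made in the closing remarks of the example.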


Exercises 


23.7.1  Let $\Delta A$ be the error of a measurement of $A$, etc. Use error propagation to show that

$$\left(\frac{\sigma(C)}{C}\right)^2 = \left(\frac{\sigma(A)}{A}\right)^2 + \left(\frac{\sigma(B)}{B}\right)^2$$

holds for the product $C = AB$ and the ratio $C = A/B$.

23.7.2  Find the mean value and standard deviation of the sample of measurements $x_1 = 6.0$,
$x_2 = 6.5$, $x_3 = 5.9$, $x_4 = 6.1$, $x_5 = 6.2$. If the point $x_6 = 6.1$ is added to the sample, how
does the change affect the mean value and standard deviation?

23.7.3  Carry out a $\chi^2$ analysis of the fit corresponding to Fig. 23.10 using the same points as
in Example 23.7.2, but with the errors now associated with the $t_j$ rather than the $y_j$.

23.7.4  Using the data from Exercise 23.7.2 (including the point $x_6$), find the 90% and 95%
confidence intervals for the mean of the $x_j$.


spinors, 792-793
vector model, 786-788 
angular momentum formulas, 781-782 
angular momentum operators, 774-776 
angular momentum formulas, 781—782 
exercises, 782—784 
ladder operators, 776-779 
spinor, 779-781 
annihilation operator, 882 
anti-Hermitian, 277 
anti-Hermitian matrices, 108, 319 
antiderivation, 239 
antisymmetric stretching mode, 323 
antisymmetric tensor, 208, 216 
arbitrary probability distribution, 1157 
arbitrary-vector technique, 167 
Argand diagram, 56, 56f, 57, 470, 492 
arithmetic mean, 1137 
associated Laguerre equation, 894 
associated Laguerre polynomials 
generating function, 892-895 
associated Legendre equation, 425, 716, 741-743 
exercises, 753-756 
magnetic field of current loop, 748-753 
orthogonality, 746-748 
parity and special values, 746 
associated Legendre functions, 425, 744-745, 
TASt 
associated Legendre polynomials, 743-744 
associative, 96, 97 
asymptotic expansions, 581 
asymptotic forms, 692-693 
of Hankel functions, 688-690 
properties of, 693-695 
exercises, 695-698 
of an integral representation, 690-692 
Stokes’ method, 688
asymptotic series, 577
Bessel functions, 691 
cosine and sine integrals, 580-582 
definition of, 582-583 
exercises, 583-584 
exponential integral, 578-580 
integral representation expansion, 690-692 
overview, 577 
asymptotic values, Bessel functions, 690, 703 
atomic interaction integral, 72 
average value, 1136 
axial Green’s function, 464 
axial vectors, 136, 215 


B 
Baker-Hausdorff formula, 114 
band-pass filter, 1002f
baryons, 852, 853t
multiplets, decomposition of, 858-861, 859f, 
860f 
basis expansion, adjoint, 281-282 
basis functions, 252, 253 
Bayes’ theorem, 1130 
Bernoulli equation, 330, 378 
Bernoulli numbers, 556, 562t 
contour of integration for, 563f 
exercises, 566-567 
generating-function, 560 
overview, 560-565 
polynomials, 565-566 
Riemann zeta function, 564 
Bernoulli polynomials, 565-566 
Euler-Maclaurin integration formula, 567 
Bessel functions, 67 
asymptotic expansions 
asymptotic forms, 692-695 
exercises, 695-698 
Hankel functions, asymptotic forms of, 
688-690 
of an integral representation, 690-692 
asymptotic values, 690 
of first kind 
Bessel’s differential equation, 646-647 


confluent hypergeometric representation, 919 


cylindrical resonant cavity, 650-653 

exercises, 654-661 

Fraunhofer diffraction, circular aperture, 
648-650 

Frobenius method, 643 

generating function for integral order, 
644-645 

integral representation, 647-648 

modified, 681 

orthogonality, 661 

recurrence relations, 645-646 


second kind, 644 
Wronskian, 670-671 
Hankel functions 
contour integral representation of, 676-678 
definitions, 674-675 
exercises, 678-680 
Helmholtz equation, 680, 698, 705 
hyperbolic, 683 
Laplace equation, 651 
modified, 428, 643, 678, 682f 
asymptotic expansion, 688 
contours, 696f 
exercises, 688 
Fourier transforms, 684 
Green’s function, 684-685 
Hankel function, 682 
hyperbolic Bessel functions, 683 
integral representation, 683-684, 690-692 
Laplace equations, 680 
recurrence relations, 681-682 
series expansion, 681 
Whittaker functions, 682 
Neumann functions, Bessel functions of second 
kind 
coaxial wave guides, 672 
definition and series form, 667-669 
exercises, 674 
integral representations, 669 
recurrence relations, 669-670 
uses of, 671 
Wronskian formulas, 670-671 
orthogonality 
Bessel series, 663 
electrostatic potential in a hollow cylinder, 
663-664 
exercises, 665-667 
Neumann boundary condition, 662 
normalization, 662 
Sturm-Liouville theory, 661 
PDEs, 643 
recurrence relations, 645-646 
of second kind, 667 
Schlaefli integral, 653-654 
spherical, 643 
asymptotic values, 703 
definitions, 702 
exercises, 709-712 
Helmholtz equation, 698 
limiting values, 703 
modified, 709 
orthogonality and zeros, 703 
particle in a sphere, 704-706 
recurrence relations, 702 
waves, 703 





of third kind, 675 
in wave guides, 671-672 
zeros, 648-653 
Bessel series, 663 
Bessel’s correction, 1168 
Bessel’s differential equation, 646-647 
Bessel’s equation, 344-345, 366-367, 1025-1027 
limitations of series approach, 351-353 
Bessel’s inequality, 262 
beta function, 617 
definite integrals, alternate forms, 618 
derivation of Legendre duplication formula, 
618-619 
exercises, 619-622 
binomial coefficients, 34 
binomial distribution, 1148-1151 
limits of, 1157-1158 
and Poisson distribution, 1153-1154, 1154f 
binomial expansion, application of, 41-42 
binomial probability distribution, 1149, 1150f 
binomial theorem, 33-36, 493-494, 581, 716 
exercise, 36—40 
Biot-Savart law, 750, 751, 751f 
black hole, optical path near event horizon of, 
1087-1088, 1088f 
Bohr radius, 897, 1135 
Born approximation, quantum mechanical 
scattering, 465-466 
Bose-Einstein statistics, 1132, 1133 
bosons, 840 
boundary conditions, 381, 405 
Cauchy, 412 
Dirichlet, 412, 985 
Green’s function, 452-454 
hollow cylinder, 664 
homogeneous, 448 
Neumann, 412 
ring of charge, 730 
specific, 438-439 
sphere in uniform electric field, 728 
sphere with, 428-430 
waveguide, coaxial cable, 671 
boundary curve, 406 
boundary value problem, 1052 
brachistochrone problem, 1082 
branch cut (cut line), 500, 508f
exploiting, 534-537 
using, 534-535 
branch points, 499-503, 499f, 500f, 502f, 503f, 
503t, 536-537 
avoidance of, 532-534 
of order 2, 500 
Bravais lattice, 869 
Bromwich integral, 1038-1040, 1040f 
brute-force approach, 32


C
calculus of residues 
Cauchy principal value, 512-515, 512f, 515f 
computing residues, 510-511 
counting poles and zeros, 518-519 
exercises, 520-522 
pole expansion of meromorphic functions, 
515-518 
product expansion of entire functions, 519-520 
residue theorem, 509-510, 509f 
calculus of variations 
Euler equation, 1081-1085 
alternate forms of, 1088 
exercises, 1093-1096 
optical path near event horizon of a black 
hole, 1087-1088, 1088f 
soap film, 1088-1090, 1089f 
soap film—minimum area, 1090-1093, 1092f 
straight line, 1086 
Lagrangian multipliers, 1107-1109 
Rayleigh-Ritz variational technique, 1117-1118 
ground state eigenfunction, 1118-1119 
several dependent variables, 1096-1097, 1102 
exercises, 1105-1107 
Hamilton’s principle, 1097-1098 
Laplace’s equation, 1101-1102 
moving particle—Cartesian coordinates, 
1098-1099 
moving particle—circular cylindrical 
coordinates, 1099 
several independent variables, 1100-1102 
variation with constraints, 1111-1112 
exercises, 1121-1124 
Lagrangian equations, 1112-1113 
Schrödinger wave equation, 1116-1117
simple pendulum, 1113-1114, 1113f 
sliding off a log, 1114-1115, 1114f 
canonical momentum, 1099 
Cartesian coordinate system, 47 
Cartesian coordinates, 415—420 
spherical harmonics using, 758 
Casimir operators, 849 
Catalan’s constant, 13, 572, 613 
catenoid, catenary of revolution, 1090 
Cauchy (Maclaurin) integral test, 5-8 
Cauchy boundary conditions, 412 
Cauchy criterion, 2 
Cauchy inequality, 490 
Cauchy principal value, 512-515, 512f, 515f, 
Cauchy ratio test, 1065 
Cauchy root test, 4 
Cauchy’s integral formula, 486-487, 554, 591 
applications of, 490 
derivatives, 488 
exercises, 491-492
Morera’s theorem, 489-490 
Cauchy’s integral theorem 
contour integrals, 477-478, 478f 
exercises, 485, 486f 
Goursat proof, 481-482, 481f 
Laurent expansion, 492-497 
multiply connected regions, 483-484, 483f 
statement of, 478-481, 480f 
Cauchy-Riemann conditions, 471-477 
analytic functions, 472-474 
derivatives of, 474-475 
exercises, 476-477 
overview, 471-472 
point at infinity, 475 
Cauchy-Riemann differential equations, 591 
causality, 591 
cavities, cylindrical, 650-653 
central field potential, Laplacian of, 154 
central force, 192 
central force problems, 426 
central moments, 1141 
chain rule, 63 
chaotic behaviour, 377 
character, 831, 832t 
characteristic curves, 405 
characteristic equation, 303 
characteristic function in probability theory, 1160 
characteristic polynomial, 303 
characteristics of PDEs, 404-406 
charge density, 739 
Chebyshev differential equation, 388 
Chebyshev inequality, 1140 
Chebyshev polynomials 
exercises, 907-911 
generating functions, 899 
hypergeometric representations, 914 
numerical analysis, 905—906 
orthogonality, 906-907 
recurrence relations, 901-903 
shifted, 908 
trigonometric form, 904—905 
type I, 900 
type II, 899 
ultraspherical polynomials, 899 
chi square fit, 1170, 1173-1174 
chi-square ( x2 ) distribution, 1170-1174 
Christoffel symbols, 222 
evaluating, 223-224 
circle of convergence, 493 
circular contour 
z” on, 479 
circular cylindrical coordinates, 187—190, 188f, 
421, 431 
cylindrical eigenvalue problem, 422-424 


circular disk, rotations of, 818 
circular functions, 58-59 
circular membrane, Bessel functions, 659 
circular optical path, 1088f
circular wave guide, 672 
circular wire loop, 931f 
classes, 830-835 
Clausen functions, 949 
Clebsch-Gordan coefficients, 789-791 
Clifford algebra, 112 
closed loop, 499-501, 499f, 500f 
closure relation, 264 
coaxial wave guides, 671-672 
coefficient vector, 261 
colatitude, 72 
collinear velocities, addition of, 864 
column vector, 95, 123, 125 
extraction of, 108 
combinations, counting of, 1130-1133 
commutation rules, 785—786 
commutative, 96, 816 
commutative operation, 47 
commutator, 97, 276 
comparison tests, 3—4 
completeness, 255, 262 
of Hilbert-Schmidt of integral equations, 1073 
complex conjugation, 54, 470 
complex exponentials, integrals with, 527-531, 
529f 
complex numbers 
and functions, 53 
Cartesian components, 53 
circular and hyperbolic functions, 58-59 
complex domain, 55—56 
exercises, 60-61 
imaginary numbers, 54 
logarithm, 60 
polar representation, 56-58 
powers and roots, 59 
multiplication of, 54 
of unit magnitude, 57 
complex plane, 56 
complex variable theory, 53 
complex variables, see also Cauchy-Riemann 
conditions; mapping; singularities 
algebra using, permanence of algebraic form, 55 
Cauchy’s integral formula, 591 
causality, 591 
dispersion relations 





exercises, 596-597 
optical dispersion, 594-595 
overview, 591-593 
Parseval relation, 595-596 
symmetry relations, 593 
functions of, 470 
computing residues, 510-511 
conditional convergence, 13 
conditional probability, 1128 
conditional probability distributions, 1147 
Condon-Shortley phase, 758, 760t
confidence interval, 1176-1178 
confluent hypergeometric functions, 912 
asymptotic expansions, 919 
Bessel and modified Bessel functions, 918-919 
exercises, 920-922 
Hermite functions, 919 
Laguerre functions, 919 
Whittaker function, 919 
Wronskian, 922 
conformal mapping, 549 
conjugate subgroup, 820 
conjugation, complex, 56, 105 
connected, simply, 164 
conservation laws, 815 
conservative force, 171, 244 
constant B field, vector potentials of, 172 
constant coefficients, with ODEs, 342-343 
constrained minima/maxima, 1107—1109 
exercises, 1110 
contiguous function relations, 913 
continuous deformation, 484 
continuous groups, 816, 845-846 
exercises, 861 
homomorphism SU(2)—SO(3), 851-852 
Lie groups and their generators, 846-849 
of representation, 824-825 
SO(2) and SO(3), 849-851 
SU(3), 852 
continuous random variable, 1135-1137 
contour integral, 477-478, 478f 
contour integral representation, 676-678 
contour integration, 967, 967f 
singularity on, 530-531 
methods, 572, 603 
contraction, 209-210 
contravariant basis vectors, 220-221 
contravariant metric tensor, 219 
contravariant tensors, 206-207 
contravariant vectors, 206, 219 
convergence 
infinite products, 575 
infinite series, partial sum approximation, 579 
of Neumann series, 1066 
convergence in the mean, 262 




convergence of infinite series 
absolute, 13, 23 
of power series, 29 
rate, 16 
tests, see also Cauchy (Maclaurin) integral test 
comparison, 3—4 
Gauss’, 9 
improvement of, 16-17 
Kummer’s, 8-10 
uniform and nonuniform, 21-22 
convergence, rate of, 16 
convolution (Faltungs) theorem 
driven oscillator with damping, 1035—1037 
Parseval relation, 987-990 
coordinate transformations 
exercises, 138-139 
of orthogonal, 135 
of reflections, 136-137, 137f 
of rotations, 133-135 
of successive operations, 137-138 
coordinates, see also Cartesian coordinates; 
circular cylindrical coordinates; orthogonal 
coordinates; spherical polar coordinates 
curvilinear, 182 
correlation, 1142-1144 
cosines 
asymptotic expansion, 581, 582 
confluent hypergeometric representation, 920 
infinite products, 575 
integral of in denominator, 523-524 
integrals cos in asymptotic series, 580-582 
Coulomb’s law, 447, 730 
counting poles and zeros, 518-519 
coupling, angular momentum, see angular 
momentum 
covariance, 1142-1144 
covariance of Maxwell’s equations, Lorentz, see 
Lorentz covariance of Maxwell’s equations 
covariant, 862 
covariant basis vectors, 218, 220-221 
covariant derivatives, 222-223
covariant metric tensor, 219 
covariant tensors, 206-207 
covariant vector, 206 
Cramer’s rule, 84 
creation operator, 882 
criterion, Leibniz, 11-12
cross derivatives, 62 
cross product, 126-128, 126f, 127f, see also triple 
vector product 
crossing conditions, 593 
crystallographic point groups, 869 
curl, ∇×, 149-153
circular cylindrical coordinates, 193 
in curvilinear coordinates, 186-187, 186f 
curvilinear coordinates, 182
differential operators in, 185-187, 185f, 186f
exercises, 196-203 
integrals in, 184-185 
cut line (branch cut), 500, 508f 
exploiting, 534 
using, 534-535 
cylindrical symmetry, 443 
cylindrical traveling waves, 694-695 


D 
d’Alembert ratio test, 4-5, 55, 578 
d’Alembert’s solution, of wave equation, 436 
damped oscillator, 1021 
de Moivre’s Theorem, 59 
decuplet, 859, 859f 
defective matrices, 324 
definite integral (Euler), 600-601 
definite integrals, 580 
evaluation of, 522 
exercises, 538-544 
degeneracy, 307-308 
degenerate, 1057, 1072 
delta function, Dirac, 263-265, 1010-1011
δ-sequence function, 76f
Dirichlet kernel, 77 
exercise, 80-81 
Fourier series, 77 
Kronecker delta, 79-80 
properties of, 78-79 
sequence, 76 
spherical polar coordinates, 79 
denominator, integral of cos in, 523-524 
dependent variable, 329 
derivative operators, tensor, see tensor derivative 
operators 
derivatives, see also exterior derivatives 
chain rule, 63 
cross derivatives, 62 
exercises, 64 
mixed derivatives, 401 
partial derivatives, 62, 401 
stationary points, 63-64 
determinants, 295 
and linear dependence, 89-90 
derivatives of, 102 
exercises, 93-94 
homogeneous linear equations, 83-84 
inhomogeneous linear equations, 84 
product theorem, 103-104 
properties of, 87 
deuteron, 391-393 
diagonal matrices, 99 
eigenvalues, 313 
eigenvector, 312 
diagonalization 
matrices, 311-314 
simultaneous, 314-315 
differentiable manifolds, 233 
differential equations 
first-order differential equations, 331-342 
exact differential equations, 333 
exercises, 339-342 
homogeneous equations, 334-335 
isobaric equations, 335 
linear first-order ODEs, 336-339 
nonseparable ODEs, 333-334 
parachutist, 331-332 
RL circuit, 338-339 
separable equations, 331 
Fuchs’ theorem, 355 
linear independence of solutions, 358-360 
second solution, 362-363 
series form of the second solution, 364-366 
nonlinear, 377-380 
number of solutions, 361 
partial differential equations, 329 
particular solution, 337 
second solution 
exercises, 370-374 
finding, 362-363 
for linear oscillator equation, 363 
logarithmic term, 668 
Neumann functions, 368-369 
of Bessel’s equation, 366-367 
series solutions, Frobenius method, 346-350, 
350f 
exercises, 355-358 
expansion about, 350 
Fuchs’ theorem, 355 
limitations of series approach, Bessel’s 
equation, 351-353 
regular and irregular singularities, 353-354 
symmetry of solutions, 350-351 
singular points, 343-345, 345t
differential forms, 232 
0-forms, 233 
1-forms, 233 
2-forms, 233 
3-forms, 233 
complementary, 235-236 
exercises, 238, 243, 248 
exterior algebra, 233-235 
exterior derivatives, 238-243 
Hodge operator, 235 
in Minkowski space, 236-237 
integration of, 243-248 
Maxwell’s equations, 241-243 
miscellaneous, 237-238 
simplifying, 234 
Stokes’ theorem on, 245 
three-dimensional (3-D), 407 
differential operators, 275 
differential vector operators, 143 
gradient, 143 
properties, 153-157 
exercises, 157-159 
differentiate parameter, 67 
differentiation 
of forms, 238-243 
power series, 30 
diffraction, 648-650 
diffusion partial differential equations, 437-444 
digamma and polygamma functions, 610 
digamma functions, 610-611 
exercises, 614-616 
Maclaurin expansion, computation, 613 
polygamma function, 612 
series summation, 613 
dihedral, 818 
dilogarithm 
exercises, 926 
expansion and analytic properties, 923-924 
properties and special values, 924-926 
dimensionality theorem, 831 
dipole moment, 738 
Dirac braket notation, 265 
Dirac delta distribution, 972 
Dirac delta function, see delta function, Dirac 
Dirac gamma matrices, 112 
Dirac half-braket notation, 265 
Dirac matrices, 111 
Dirac notation, 265-266 
Dirac’s relativistic theory, 38 
direct product, 108-112, 837-839 
exercises, 837f, 840, 840t
generators for, 857-858 
of tensors, 210-211 
direct space, 964 
Dirichlet boundary conditions, 385, 412, 704 
Dirichlet conditions, 936, 985 
Dirichlet kernel, 77 
Dirichlet series 
exercises, 573-574 
overview, 571-572 
discontinuous functions, 937-939 
expansions in, 262-263 
discrete Fourier transform, 1002-1007 
aliasing, 1006 
exercises, 1007 
fast Fourier transform, 1006-1007 
limitations, 1005 
orthogonality over discrete points, 1002-1004 
discrete groups, 815 
classes, 830-835 
exercises, 835-837, 836t, 837f
other, 835
discrete probability distributions, computing, 1136 
discrete random variables, 1134-1135 
discrete spectrum, 420 
dispersion integral contour for, 592f 
dispersion relations
causality, 591
crossing conditions, 593
exercises, 596-597
Hilbert transforms, 593, 595
optical dispersion, 594-595
overview, 591-593
Parseval relation, 595-596
sum rules, 596
symmetry relations, 593-594
divergence, ∇·, 146-149, 149f
curvilinear coordinates, 185-186, 185f
divergent series, 4 
division, of random variables, 1162 
Doppler shift, 37 
dot products, 49-50
gradient of, 143, 157
double factorial notation, 35 
double series, rearrangement of, 18-19 
driven oscillator with damping, 1035-1037 
dual tensors, 216-217 


E 

Earth’s gravitational field, 727 
Earth’s nutation, 1018-1019, 1018f 
eigenfunction, 299
eigenfunction completeness of Hilbert-Schmidt
of integral equations, 1073
orthogonal, 1069-1073
eigenfunction expansion of Green’s function,
460-461
eigenvalue problem, 422-424
eigenvalues
equations, 299-300
basic expansions, 300
equivalence of operator and matrix form, 300
of Hermitian matrices, 310
of Hilbert-Schmidt theory, 1073
eigenvectors, 300
normalizing, 304
of Hermitian matrices, 310-311
Einstein convention, 207 
electric dipole, 738, 737f 
electric multipoles, 737-739 
electric quadrupole, 738 
electromagnetic field tensor, 866 
electromagnetic wave equation, 156 
electromagnetic waves, 1023-1024 
electromagnetism, potentials in, 174 
electron spin, 846
electrostatic potential 
for ring of charge, 729-730 
in hollow cylinder, 663-664 
elementary functions, 1008-1010 
elliptic integrals 
definitions of, 928-929 
exercises, 931-932 
of first kind, 928 
limiting values, 930 
period of simple pendulum, 927-928 
of second kind, 928 
series expansion, 929-930 
elliptic partial differential equations (PDEs), 410 
empty set, 1127 
energy, relativistic, 35-36 
entire function, 519-520 
equality of matrices, 96 
equation of continuity, 148 
equations, see also Maxwell’s equations 
motion and field, 213 
equilateral triangle, symmetry of, 817, 817, 818f 
error function, 637 
error propagation, 1165-1168 
essential (irregular) singular point, 344 
essential singularities, 344, 498 
Euclidean space, 237 
Euler angles, 140, 140f 
Euler equation, 1081-1085 
alternate forms of, 1088 
exercises, 1093-1096 
soap film, 1088-1090, 1089f 
soap film—minimum area, 1090-1093, 1092f 
straight line, 1086 
Euler identity, 113 
Euler transformation, 43, 44 
Euler-Maclaurin integration formula, 566 
Bernoulli polynomials, 567 
example, 569-570 
exercises, 570-571 
overview, 567-569 
Euler-Mascheroni constant, 7, 33, 367, 675 
event horizon, 1087 
evolution operator, 1067 
exact ODEs, 333-334 
expansion, 736, see also Taylor’s expansion 
Laplace expansion, 760-762, 799-801 
pole, of meromorphic functions, 498, 515-518
product, of entire function, 519-520 
spherical harmonic, 761 
spherical wave, 798-799 
expectation value, 283, 285, 297, 1136 
in transformation basis, 295 
exponential function, of Maclaurin theorem, 
27-28 
exponential integral, 578-580, 634-637
exterior algebra, 233
exterior derivatives, 238-243
exterior products, 233
extrema, 62-64


F 


factorial function, asymptotic form of, 588 
factorial notation, 606 
faithful group, 822 
Faraday’s law, 168 
fast Fourier transform (FFT), 1006-1007 
Feldheim’s formula, 885 
Fermi-Dirac statistics, 1132, 1133 
fermions, 840 
FFT, see fast Fourier transform 
field equations, 213 
finite wave train, 971-973, 972f 
first-order Born approximation, 1067 
first-order differential equations, 331-342 
exact differential equations, 333 
exercises, 339-342 
homogeneous equations, 334-335 
isobaric equations, 335 
linear first-order ODEs, 336-339 
nonseparable ODEs, 333-334 
parachutist, 331-332 
RL circuit, 338-339 
separable equations, 331 
first-order partial differential equations, 403 
characteristics of, 404-406
exercises, 408-409 
general, 406-407 
fixed and movable singularities, special solutions, 
378-379 
flux, 148 
Fourier convolution theorem, 1055 
exercises, 994-997 
multiple convolutions, 990-992 
Fourier cosine series, 941 
Fourier cosine, sine transforms, 966 
Fourier expansions, characteristic of, 951 
Fourier integral representation, 969-970 
Fourier series, 77 
applications of, 949-957 
exercises, 952-957 
full-wave rectifier, 950-951, 951f, 952t 
square wave, high frequencies, 949-950, 
949f 
definition of, 935 
general properties, 935-949 
discontinuous functions, 937-939 
exercises, 945-949 
periodic functions, 939-940 
sawtooth wave, 937-939, 938f 
Sturm-Liouville theory, 936-937 
summation of a Fourier series, 944 
symmetry, 940-941, 942f 
Gibbs phenomenon 
calculation of overshoot, 959-961 
exercises, 961-962 
square wave, 958-959 
summation of series, 957-958 
operations on, 942-944 
Fourier sine series, 941 
Fourier transform, 965-968 
aliasing, 1006 
convolution theorem, 985-987
of derivatives 
heat flow PDE, 983 
wave equation, 981-982 
discrete, see discrete Fourier transforms 
exercises, 975-980, 985 
fast, 1006-1007 
of Gaussian, 969, 968f 
inverse, 970-973 
limitations on transfer functions, 1000-1001 
momentum space representation, 993-994 
of product, 992 
properties of, 980-984 
solution, 1054-1055 
successes and limitations of, 984-985 
in 3-D space, 973-975 
unitary operator, 988 
Fourier transforms—inversion theorem, finite wave 
train, 971-973, 972f 
Fourier-Mellin integral, 1039 
Fraunhofer diffraction, Bessel function, 648-650 
Fredholm equation, 1047, 1052, 1054, 1058, 1064 
homogeneous, 1059-1060, 1069 
inhomogeneous, 1076-1077 
Fresnel integrals, 712f 
Frobenius method, 643, 645 
series solutions, 346-350, 350f 
Fuchs’ theorem, 355, 692 
full-wave rectifier, 950-951, 951f, 952t 
functions, 143 
Chebyshev polynomials 
exercises, 907-911 
generating functions, 899 
numerical analysis, 905-906
orthogonality, 906-907 
recurrence relations, 901-903 
trigonometric form, 904-905
type I, 900 
type II, 899 
ultraspherical polynomials, 899 
confluent hypergeometric functions 
Bessel and modified Bessel functions, 
918-919 
exercises, 920-922
Hermite functions, 919 
Laguerre functions, 919 
Whittaker function, 919 
dilogarithm 
exercises, 926 
expansion and analytic properties, 923-924 
properties and special values, 924-926 
Dirac delta, 263-265 
discontinuous, 262-263 
entire, 498 
exponential, of Maclaurin theorem, 27-28
Hermite functions 
applications of the product formulas, 
885-887 
direct expansion of products of Hermite 
polynomials, 884-887 
exercises, 876-878, 887-888 
Hermite product formula, 884-887 
molecular vibration, 882-883 
orthogonality and normalization, 875-876 
quantum mechanical simple harmonic 
oscillator, 878-879 
recurrence relations, 872-873
Rodrigues formula, 874 
threefold Hermite formula, 883-884 
values of, 873-874 
hypergeometric functions, 911 
confluent, 912 
contiguous function relations, 913 
exercises, 915-916 
hypergeometric representations, 913-914 
Pochhammer symbol, 912 
Laguerre functions 
associated Laguerre polynomials, 892-895 
differential equation—Laguerre polynomials, 
890-892 
exercises, 897-899 
hydrogen atom, 896-897 
Rodrigues formula and generating function, 
889-890 
of complex variables, 470 
of operators, 282 
orthonormal, 269-271 
series expansions, 41-44 
exercise, 44-45
series of 
Abel’s test, 23 
exercises, 24-25, 32-33
uniform and nonuniform convergence, 21-22
Weierstrass M test, 22-23
square-wave, 263 


G 
Galilean, 862 
gamma distribution, 607, 1162-1164 
gamma function, see also factorial function
analytic properties, 604 
asymptotic form of, 588-589 
beta function 
definite integrals, alternate forms, 618 
derivation of Legendre duplication formula, 
618-619 
exercises, 619-622 
definitions, simple properties, 599 
definite integral (Euler), 600-601 
factorial notation, 606 
infinite limit (Euler), 599-600 
infinite product (Weierstrass), 602 
incomplete beta function, 634 
incomplete gamma functions and related 
functions, 633-634 
error function, 637 
exercises, 638-641 
exponential integral, 634-637 
Riemann zeta function, 626-631 
Stirling’s series, 622 
derivation from Euler-Maclaurin integration 
formula, 623-624 
gamma function contour, 605f 
gamma functional relation, 506 
gauge condition, 174 
gauge transformations, 174 
Gauss elimination, 91-93
Gauss technique, 91 
Gauss’ fundamental theorem of algebra, 490 
Gauss’ law, 175-176, 175f 
Gauss’ normal distribution, 1155-1159 
Gauss’ test, 9 
Legendre series, 9-10 
Gauss’ theorem, 164-165, 165f, 176, 248 
Green’s theorem, 165-166 
Gegenbauer polynomials, see ultraspherical 
polynomials 
Gell-Mann matrices, 854 
general coordinates, tensor in 
covariant derivatives, 222-223
exercises, 226 
metric tensor in, 218-219 
general relativity, 862 
generalized Abel equation, 1055-1056 
generating function, 555, 1056 
associated Laguerre polynomials, 892-895 
Bernoulli numbers, 560 
Bessel functions, modified, 919 
Chebyshev polynomials, 899 
electric multipoles, 737-739 
exercises, 740-741 
expansion, 736-737 
Hermite polynomials, 555-556, 872 
for integral order, 644-645 
Laguerre polynomials, 889-890 
Legendre polynomials, 557-558 
physical interpretation of, 735 
Taylor expansion of, 565 
generators of continuous groups, 846-849 
geodesics, 1103-1104 
geometric properties, 47 
geometric series, 2-3
Gibbs phenomenon 
calculation of overshoot, 959-961 
exercises, 961-962 
square wave, 958-959 
summation of series, 957-958 
Goldschmidt discontinuous solution, 1091, 1092f 
Goursat proof of Cauchy’s integral, 481-482, 
481f 
gradient, ∇
as differential vector operator, 143-146 
in curvilinear coordinates, 185 
of dot product, 157 
Gram-Schmidt orthogonalization 
example, 270-272 
exercises, 273-275 
orthonormalizing physical vectors, 273 
overview, 269-270 
physical vectors, 272 
vectors by, 269-275 
Gram-Schmidt process, 308 
Gram-Schmidt transformation, 293 
graphene, 869 
Grassmann algebra, see exterior algebra 
gravitational potential, 172 
Green’s function, 447-467, 684-685, 983-984, 
1050, 1052, 1072, 1075 
advantage of, 452 
axial, 464 
boundary conditions, 452-454 
accommodating, 464
at infinity, 454 
initial value problem, 453-454 
differential vs. integral formulation, 456 
eigenfunction expansion of, 460-461 
exercises, 456-459, 466-467 
features of, 448, 459-460 
form of, 450-452, 461-466 
fundamental, 462, 463t 
general properties of, 449-450 
Helmholtz equation, 463 
Laplace’s equation, 462, 464 
one-dimensional, 448-459 
relation to integral equation, 454-456 
self-adjoint problems, 460 
spherical, 463, 800-801 
two and three dimension problems, 459-467 
Green’s theorem, 165-166, 246-247 
Gregory series, 39 
ground state, 391 
ground state eigenfunction, 1118-1119 
group theory, see also generators of continuous 
groups; homogeneous Lorentz group 
definition of, 816-817, 817, 818f, 818t
discrete 
classes, 830-835 
other, 835 
exercises, 820, 821f 
faithfulness, 822 
homomorphic, 817 
homomorphism and isomorphism, 819 
isomorphic, 817 
Lorentz covariance of Maxwell’s equations, 
866-868 
vierergruppe, 820 


H 
Hamilton’s equations, 1099-1100 
Hamilton’s principle, 1097, 1098 
Hankel functions, 682 
asymptotic forms, 692, 693 
contour integral representation of, 676-678 
definition, 674-675 
integral representation of, 698f 
series expansion, 675 
spherical, 701 
Wronskian formulas, 675 
Hankel transforms, 965, 1054 
harmonic functions, 473 
harmonic numbers, 3 
harmonic oscillator, 878-879, 1017 
harmonic series, 3 
harmonics, 799, see also spherical harmonics; 
vector spherical harmonics 
Hartree atomic units, 396 
heat flow partial differential equations, 437-444, 
983 
Heaviside shifting theorem, 1023 
Heaviside step function, 1010 
Heisenberg uncertainty principle, 973 
Helmholtz equation, 415, 422, 439, 705 
Bessel functions, 680, 698 
Green’s function, 463 
spherical coordinates, 698 
Helmholtz’s theorem, 177-180
Hermite equation, 390-391 
Hermite functions 
applications of the product formulas, 885-887 
confluent hypergeometric functions, 919 
direct expansion of products of Hermite 
polynomials, 884-887 
exercises, 876-878, 887-888 
Hermite polynomials, 872 
Hermite product formula, 884-887
molecular vibration, 882-883 
orthogonality and normalization, 875-876 
quantum mechanical simple harmonic 
oscillator, 878-879 
recurrence relations, 872-873 
Rodrigues formula, 874 
threefold Hermite formula, 883-884 
values of, 873-874 
Hermite polynomial, see also Legendre 
polynomials 
Hermite polynomials, 280, 391, 554, 873f 
direct expansion of products of, 884-887 
example, 554-556 
generating function, 555-556, 872 
orthogonality integral, 875 
recurrence relations, 872-873 
Rodrigues representation, 874 
Hermitian matrices, 108, 301 
anti-, 319 
diagonalization, 311-313 
example, 313 
exercises, 317-318 
expectation values, 316 
finding diagonalizing transformation, 
313-314 
positive definite and singular operators, 317 
simultaneous, 314-315 
spectral decomposition, 315-316 
of eigenvalues, 310 
unitary transformation, 313 
Hermitian operator, 277, 284 
expectation value, 316 
self-adjoint ODEs, 384 
Hilbert space, 255-256, 278, 279, 289 
Hilbert transforms, 593, 595 
Hilbert-Schmidt theory 
homogeneous Fredholm equation, 1069 
inhomogeneous Fredholm equation, 1076-1077 
inhomogeneous integral equation, 1073-1077 
orthogonal eigenfunctions, 1069-1073 
symmetrization of kernels, 1069 
Hodge operator, 235 
homogeneous boundary condition, 448 
homogeneous equations, 334-335
homogeneous Fredholm equation, 1059-1060, 
1069 
homogeneous linear equations, 83-84 
ODEs, 330 
homogeneous Lorentz group, 862-864 
homogeneous ODEs, 335, 338, 351 
second-order, 344 
homomorphic group, 817 
homomorphism, 819 
SU(2) and SU(2)-SO(3), 851-852 
Hooke’s law spring, 342-343
Hubble’s law, 52 
hydrogen atom, 896-897 
Schrödinger’s wave equation, 896
hyperbolic functions, 58-59 
hyperbolic partial differential equations (PDEs), 
410 
hypercharge, 854 
hypergeometric equation 
alternate forms, 918 
singularities, 345, 912, 917 
hypergeometric functions, 911 
confluent, 912 
contiguous function relations, 913 
exercises, 915-916 
hypergeometric representations, 913-914 
Pochhammer symbol, 912 
hypergeometric series, see hypergeometric 
function 


I
identity operator, 277 
imaginary axis, 56 
imaginary numbers, 54 
imaginary part, 56 
improper rotations, of coordinate system, 215 
impulse function, 1011 
impulsive force, 1020 
incomplete beta function, 634 
incomplete gamma functions, 633-634 
of first kind confluent hypergeometric 
representation, 917 
indefinite integral of f(z), 489 
independence, linear, 671 
independent variables, 329, 407-408, 411 
indeterminate forms, 31 
indicial equation, 348 
indistinguishable particles, 1133 
inertial frames, 815 
infinite limit (Euler), 599-600 
infinite product (Weierstrass), 602 
infinite products 
convergence, 575 
evaluate, 575 
exercises, 576-577 
overview, 574-575 
sine and cosines, 575 
infinite series, 1, see also Taylor’s expansion; 
power series 
algebra of 
alternating series, 11-13
convergence, 13 
convergence: absolute, 13 
convergence: Cauchy integral, 5-8 
convergence: Cauchy root, 4 
convergence: comparison, 3-4
convergence: conditional, Leibniz criterion, 
15-16 
convergence: d’Alembert ratio, 4-5 
convergence: Gauss’, 9 
convergence: Kummer’s, 8-10 
convergence: Maclaurin integral, 5-8
convergence: test of, 3-11 
convergence: uniform, 21-22, 29
divergence of squares, 15-16
double series, 18-19 
exercises, 20-21 
rearrangement of double, 18-19 
exercises, 10-11, 13-14 
fundamental concepts 
geometric series, 2-3
harmonic, 3 
of functions 
Abel’s test, 23 
exercises, 24-25 
uniform and nonuniform convergence, 21-22
Weierstrass M test, 22-23
power series, 29-30 
infinity, boundary conditions at, 454 
inhomogeneous Fredholm equation, 1076-1077 
inhomogeneous integral equation, 1073-1077 
inhomogeneous linear equations, 84 
inhomogeneous linear ODEs, 375-377 
exercises, 377 
inhomogeneous Lorentz group, 862 
inner product and matrix multiplication, 97-98 
integer powers, 59 
integers, sum of, 40-41, 41 
integral equations 
boundary condition, 1048 
exercises, 1060-1064 
feature of, 1048 
Fredholm equation, 1047, 1052, 1054, 
1058-1060, 1069 
generating-function, 1056-1057 
Green’s function, 454-456 
Hilbert-Schmidt theory 
exercises, 1077-1079 
homogeneous Fredholm equation, 
1059-1060 
orthogonal eigenfunctions, 1069-1073 
symmetrization of kernels, 1069 
integral-transforms 
Fourier transform solution, 1054-1055 
generalized Abel equation, 1055-1056 
introduction, 1047-1048 
definition, 1047 
exercises, 1053 
linear oscillator equation, 1050-1052 
momentum representation in quantum 
mechanics, 1048-1049 
transformation of differential equation into 
integral equation, 1049-1050 
linear, 1047 
Neumann series 
exercises, 1068 
overview, 1064-1066 
solution, 1066-1067 
separable kernel, 1057-1059 
Volterra equation, 1047, 1048, 1050, 1055 
integral form, Neumann functions, 671 
integral operator, 275, 1066 
linear, 1070 
integral representations, 647-648, 964 
of dilogarithm, 924f 
expansion of, 690-692 
of Hankel functions, 698f 
modified Bessel functions, 684 
integral test, Cauchy, see Cauchy (Maclaurin) 
integral test 
integral theorems 
exercises, 169-170 
Gauss’ theorem, 164-165, 165f 
Green’s theorem, 165-166 
Stokes’ theorem, 167-168, 167f, 168f
integral transforms, 1054 
convolution (Faltungs) theorem, driven 
oscillator with damping, 1035-1037 
convolution theorem, 985-987 
Parseval relation, 987-990 
Fourier transform of derivatives 
heat flow PDE, 983 
wave equation, 981-982 
Fourier transform of Gaussian, 968-969, 968f 
Fourier transform solution, 1054-1055 
generalized Abel equation, 1055-1056 
inverse Laplace transform 
Bromwich integral, 1038-1040, 1040f 
exercises, 1042-1045 
inversion via calculus of residues, 1040 
multiregion inversion, 1041-1042, 1041f, 
1042f 
Laplace transform of derivatives, 1016-1020 
Earth’s nutation, 1018-1019, 1018f 
impulsive force, 1019-1020 
simple harmonic oscillator, 1017 
use of derivative formula, 1017 
Laplace transforms, 1008-1034 
definition, 1008 
Dirac delta function, 1010-1011 
elementary functions, 1008-1010 
exercises, 1014-1015 
Heaviside step function, 1010 
inverse transform, 1012t, 1011-1014, 1014f 
partial fraction expansion, 1013 
properties of, 1016-1034 
step function, 1013-1014, 1014f 
Laplace, Mellin, and Hankel transforms, 
965-966 
use of, 964f 
integrals, 67, 764, 927, see also Cauchy 
(Maclaurin) integral test; definite integrals; 
elliptic integrals 
containing logarithm, 532-534, 533f 
contour, 581, 592, 581f 
cosine, 580-582 
definite, 580 
evaluation of, 65 
1-D integral, 66 
differentiate parameter, 67 
exercises, 74-75 
integration by parts, 65 
integration variables, 72-74
multiple integrals, 70-72 
recursion, 69 
trigonometric integral, 69 
exponential, 578-580 
of meromorphic function, 526-527, 527f 
of three spherical harmonics, 803-805 
oscillatory, 529-530 
range, 525-527, 525f 
sine, 580-582 
trigonometric, 522-524 
with complex exponentials, 527-531, 529f 
integrating factors, 334 
integration 
by parts, 65, 568, 578 
by parts of volume integrals, 163 
contour of, 530-531 
of power series, 30, 583 
order, reversing, 70-71 
technique, 531-532 
variables, 72-74
intersections, 1127-1130, 1128f 
invariants 
example, 295 
exercises, 296 
overview, 294-295 
inverse Fourier transform, 1055 
inverse Laplace transform 
Bromwich integral, 1038-1040, 1040f 
exercises, 1042-1045 
inversion via calculus of residues, 1040 
multiregion inversion, 1041-1042, 1041f, 
1042f 
inverse matrix, 99-102 
inverse transform, 211, 1011-1014, 1012t
inversion 
multiregion, 1041-1042, 1041f, 1042f 
of power series, 32
via calculus of residues, 1040
inversion operation, 136 
irreducible representations, 822 
irreducible spherical tensors, 796 
irregular (essential) singular point, 344 
irregular sign changes, series with, 12-13 
irregular singularities, 353-354 
irregular solution, 369 
irrotational, 152, 154-155 
isobaric ODEs, 335 
isomorphic group, 817 
isomorphism, 819 
isospin, SU(2), 852-861 
isotropic tensors, 209 


J 

Jacobi method, 314 
Jacobi-Anger expansion, 655 
Jacobian, 73
2-D and 3-D, 229-230
definition, 227
direct approaches to, 231
exercises, 231-232
inverse of, 230-231
Jacobian determinant, 229 
Jacobian matrix, 229 
Jensen’s theorem, 585 
Jordan’s lemma, 528 


K

Kepler’s laws of planetary motion, 189-190 
kernel, 963
of integral equation, 455
kernel equation, 1047, 1052f
separable, 1057-1059
kernel function, 447
Kirchhoff diffraction theory, 166
Kirchhoff’s law, 338
Korteweg-deVries equation, 413
Kronecker delta, 79-80, 209, 258, 805
Kronig-Kramers optical dispersion relations, 591,
594
Kummer’s test, 8-10


L 

L’Hôpital’s rule, 31, 517, 516, 576, 662
ladder operators, 776-779
construction, 788-795
Lagrangian equations, 1112-1113
of motion, 1098
Lagrangian mechanics, 63
Lagrangian multipliers, 1107-1109
Laguerre functions
associated Laguerre polynomials, 892-895 
differential equation—Laguerre polynomials, 
890-892 
exercises, 897-899 
hydrogen atom, 896-897 
Rodrigues formula and generating function, 
889-890 
Laguerre polynomials 
associated 
confluent hypergeometric representation, 919 
generating function, 892-895 
integral representation, 895 
orthogonality, 895 
recurrence relations, 893 
Rodrigues’ representation, 895 
Schrödinger’s wave equation, 896
confluent hypergeometric representation, 919 
differential equation, 890-892 
generating function, 889-890 
recurrence relations, 890, 893 
Rodrigues’ formula, 889-890 
self-adjoint form, 894 
Laplace convolution theorem, 1034-1038 
exercises, 1037-1038
Laplace equation, 154, 433-434, 726, 1101-1102 
Bessel functions, 651 
for parallelepiped, 417-419 
Green’s function, 462, 464 
Laplace expansion, 760-761 
Laplace series 
expansion theorem, 762 
gravity fields, 762 
Laplace spherical harmonic expansion, 799-801 
Laplace transforms, 965, 1008-1034, 1054 
convolution theorem, 1056 
of derivatives, 1016-1020 
Earth’s nutation, 1018-1019, 1018f 
impulsive force, 1019-1020 
simple harmonic oscillator, 1017 
use of derivative formula, 1017 
definition, 1008 
Dirac delta function, 1010-1011 
elementary functions, 1008-1010 
exercises, 1014-1015 
Heaviside step function, 1010 
inverse transform, 1011-1014, 1012t, 1014f
one-sided, 1008 
operations, 1028t
other properties 
Bessel’s equation, 1025-1027 
change of scale, 1020 
damped oscillator, 1021 
derivative of a transform, 1024-1025 
electromagnetic waves, 1023-1024 
exercises, 1028-1034 
integration of transforms, 1027 
RLC analog, 1022, 1022f 
substitution, 1020 
translation, 1022-1023 
partial fraction expansion, 1013 
properties of, 1016-1034 
step function, 1013-1014, 1014f 
two-sided, 1008 
Laplacian, 154 
development by minors, 88 
in circular cylindrical coordinates, 192 
of vector, 155-156 
Laurent expansion 
exercises, 496-497 
Laurent series, 494-496 
Taylor expansion, 492-494, 493f 
Laurent series, 33, 494-496, 644 
least squares, method of, 1138 
Legendre duplication formula, derivation of, 
618-619 
Legendre equation, 425, 716 
Legendre functions, 425, 715, 768f 
associated, 744-745, 745t 
hypergeometric representation, 914 
recurrence formulas for, 745-746, 764
associated Legendre equation, 741-743
exercises, 753-756 
magnetic field of current loop, 748-753 
orthogonality, 748 
parity and special values, 746 
generating function 
electric multipoles, 737-739 
exercises, 740-741 
expansion, 736-737 
physical interpretation of, 735 
Legendre polynomials, 716 
associated, 743-744 
exercises, 722-724
recurrence formulas, 718-720 
Rodrigues formulas, 720-721 
upper and lower bounds for P_n(cos θ), 720
of second kind, 766 
alternate formulations, 769-770 
exercises, 770-771 
properties, 769 
orthogonality, 724 
Earth’s gravitational field, 727 
electrostatic potential for ring of charge, 
729-730 
exercises, 730-735 
Legendre series, 726-730 
sphere in uniform field, 727-729, 728f
spherical harmonics, 756 
Cartesian representations, 758 
exercises, 765-766 
Laplace expansion, 760-762
properties of, 764-765 
solutions, 758-760 
symmetry of solutions, 762-763
Legendre ordinary differential equations (ODEs), 
716 
Legendre polynomials, 270-271, 425, 557-558,
719t
associated, 743-744 
exercises, 722-724 
generating function, 557-558, 716, 735 
orthogonality of, 726 
recurrence formulas, 718-720 
Rodrigues formulas, 720-721 
Schlaefli integral, 557 
upper and lower bounds for P_n(cos θ), 720
Legendre series, 9-10, 726-730 
Legendre’s differential equation, 276, 388 
Legendre’s duplication formula, 604 
Legendre’s equation, 389-390 
Leibniz criterion, 11-12
Leibniz’s formula, 553, 742 
Lerch’s theorem, 1011 
level lines, 586 
Levi-Civita symbol, 85, 87, 216, 841, 850 
line integrals, 159-160, 160f 
linear electric quadrupole, 738f 
linear equation, 88-89 
linear equation system, 102-103
linear first-order ODEs, 336-339 
linear Hermitian operator, 311 
linear independence of solutions, 358-360 
linear integral equations, 1047 
linear integral operator, 1070 
linear operation, 329 
linear operators, 275, 329, 401 
linear oscillator, 347-350 
linear oscillator equation, 347, 363, 1050-1052 
linear parameters, variation of, 1121 
linear vector space, 252 
linearly dependent equations, 90-91 
Liouville’s theorem, 490 
Lippmann-Schwinger equation, 466 
logarithm, 60 
Lommel integrals, 665 
Lorentz covariance of Maxwell’s equations, 
866-868 
exercises, 868-869 
Lorentz gauge, 174 
Lorentz group, see homogeneous Lorentz group 
exercises, 865-866 
Lorentz transformation, 862 
of E and B, 867-868 
lowering operator, 777 
M


Maclaurin expansion, 64 
computation, 613 
Maclaurin integral test, 5-8 
Riemann Zeta function, 7 
Maclaurin series, 27, 44, 253 
Maclaurin theorem, 27 
exponential function, 27-28 
logarithm, 28-29 
magnetic dipole, 748-753 
magnetic field of current loop, 748-753 
magnetic moment, 753 
magnetic vector potential, 173-174, 193 
manifestly covariant form, 868 
mapping, 57 
complex variables, 547-549, 548f 
exercises, 549-550 
conformal, 549 
matching conditions, 391 
mathematical induction, 40-41 
exercise, 41 
matrices, 95 
addition and subtraction, 96 
adjoint matrix, 105 
defective, 324 
definitions, 95-96 
diagonalization, 311-314 
Dirac notation in, 266 
direct product, 108-112 
equality, 96 
functions of, 113-114 
Hermitian matrices, 108, 315 
multiplication, 97, 279 
inner product, 97-98 
by scalar, 96 
normal 
exercises, 324-326 
normal modes of vibration, 322-324 
overview, 319-320 
null matrix, 96 
numerical inversion of, 100 
orthogonal matrices, 107 
product theorem, 103-104 
rank of, 104 
symmetric, 105 
trace matrix, 105 
transpose matrix, 104 
unitary matrices, 107, 314 
matrix algebra, 95 
matrix eigenvalue equation, 300 
matrix eigenvalue problems, 301 
example, 301-303 
2-D ellipsoidal basin, 303-305 
block-diagonal matrix, 305-307 
exercises, 308-310 


matrix elements, 279 
of operator, 280-281 
matrix invariant, 295 
matrix products, operations on, 106 
Maxwell’s equations, 155, 241-243, 594 
Gauss’ law, 176 
Lorentz covariance of, 866-868 
Maxwell-Boltzmann distribution, 606-607 
Maxwell-Boltzmann statistics, 1132, 1133 
mean value, 1136-1140 
mean value theorem, 26, 62 
measurement errors, 1125, 1170 
Mellin transforms, 966, 1054 
meromorphic, 498, 515 
meromorphic functions 
integral of, 526-527, 527f 
pole expansion of, 515-518 
metric spaces, 218 
metric tensor, 218-219 
Christoffel symbols as derivatives of, 223 
metric, curvilinear coordinates, 184 
Milne’s model, 37 
Minkowski space, 236-237, 864 
miscellaneous vector identities, 156-157 
Mittag-Leffler theorem, 515-516 
mixed derivatives, 401 
mixed tensor, 209, 210 
modified Bessel functions, 678, 680, 682f 
asymptotic expansion, 688 
contours, 696f 
exercises, 688 
Fourier transforms, 684 
Green’s function, 684-685 
Hankel function, 682 
hyperbolic Bessel functions, 683 
integral representation, 684, 690-692 
Laplace equations, 680 
recurrence relations, 681-682 
series expansion, 681 
Whittaker functions, 682 
modified spherical Bessel functions, 428 
modulus, 56, 470 
molecular vibration, 882-883 
moment-generating function, 1141-1142, 
1149-1150 
momentum, see angular momentum 
momentum representation 
in quantum mechanics, 1048-1049 
Schrödinger wave equation, 994 
monopole moment of charge distribution, 739 
monotonic decreasing function, 5 
movable singularities, 378-379 
moving particle, Cartesian coordinates, 
1098-1099 
multinomial coefficient, 1132 





multiple integrals, 70-72 
multiplet, 827, 851 
multiplication 
by scalar, 255 
of matrices, inner product, 97-98 
operator, 275 
of random variables, 1162 
multiply connected regions, 483-484, 483f 
multipole expansion, 738, 739, 801-803 
multipole moments of charge distribution, 739, 
801 
multivalued function, 500 
mutually commuting operator, 785 
mutually exclusive events, 1126 


N 
Navier-Stokes equations, 190, 377 
NDEs, see nonlinear differential equations 
negative definite operators, 317 
neighboring paths, 1082, 1083f 
Neumann boundary conditions, 385, 412, 428, 662 
Neumann functions, 367-369, 693, 917 
Bessel functions of second kind 
coaxial wave guides, 672 
definition and series form, 667-669 
exercises, 674 
integral representations, 669 
recurrence relations, 669-670 
uses of, 671 
Wronskian formulas, 670-671 
integral form, 671 
recurrence relations, 669-670 
spherical, 700f 
Wronskian formulas, 670-671 
Neumann series, 1064-1066 
exercises, 1068 
Newton’s equations of motion, 213 
Newton’s law, 331, 342 
Newton’s second law of motion, 322, 1106 
nodes of standing wave, 435 
nonlinear differential equations (NDEs), 377-380 
Bernoulli and Riccati equations, 378 
exercises, 379-380 
fixed and movable singularities, special 
solutions, 378-379 
nonlinear dispersive equation, 413 
nonlinear methods and chaos 
nonlinear differential equations (NDEs) 
Bernoulli and Riccati equations, 378 
exercises, 379-380 
fixed and movable singularities, special 
solutions, 378-379 
nonlinear ODEs, 377-380 
nonnormal matrices, 322-324 
nonuniform convergence, 21-22 
nonunitary transformations, 293 
normal distributions, addition theorem for, 1161 
normal eigensystem, 320-321 
normal matrices 
defective, 324 
example, 320-321 
exercises, 324-328 
normal modes of vibration, 322-324 
overview, 319-320 
normalization, 662 
normalization constant, 606 
nucleon, 853 
null matrix, 96 
numerical evaluation, 91-93 


O 
ODEs, see ordinary differential equations 
Oersted’s law, 168 
Olbers’ paradox, 11 
one-dimensional problems, Green’s function, 
448-459 
one-sided Laplace transform, 1008 
operators 
adjoint, 277 
basis expansions of, 279-280 
commutation of, 276-277 
example, 277, 278, 280-282 
exercises, 282-283 
expression, 285-286 
functions of, 282 
identity, inverse, adjoint, 277-278 
matrix elements, 280-281 
overview, 275-276 
self-adjoint, 277, 284-285 
example, 284-286 
overview, 283-284 
transformations of, 291 
exercises, 294 
nonunitary transformations, 293 
unitary 
successive transformations, 290 
unitary transformations, 287-288 
operators, differential vector, see differential 
vector operators 
optical dispersion, 594-595 
optical path near event horizon of black hole, 
1087-1088, 1088f 
orbital angular momentum, 782 
order 2 branch points, 500 
ordinary differential equations (ODEs), 329, 330, 
381, 644, 715, 982, 1084 
exact, 333-334 
Hermite, 554 
homogeneous linear, 330 
homogeneous, 335 
ordinary differential equations (continued) 
inhomogeneous linear, 375-377 
exercises, 377 
isobaric, 335 
Legendre, 557, 716 
linear first-order, 336-339 
linear second-order, 1049 
initial/boundary conditions in, 1052 
nonlinear, 413 
nonseparable exact, 333-334 
Rodrigues formulas, 551, 552 
second order, 452 
second-order linear, 343-346 
second-order Sturm-Liouville, 551 
separable, 331-332 
singularities of, 345t 
with constant coefficients, 342-343 
ordinary points of the ODE, 344 
orthogonal, 124 
transformations, 135 
orthogonal coordinates, R3, 182-184, 182f, 183f 
orthogonal eigenfunctions, 1069-1073 
orthogonal functions, expansions in, 258-259 
orthogonal matrices, 107, 135 
orthogonal polynomials, 272f 
exercises, 558-560 
generating function, 555, 556 
Hermite, 555-556 
Rodrigues formula, 551-554 
Schlaefli integral, 554 
orthogonal unitary, 277 
orthogonality, 51, 703, 724, 906-907 
associated Legendre equation, 746-748 
Bessel series, 663 
Earth’s gravitational field, 727 
electrostatic potential for ring of charge, 
729-730 
electrostatic potential in a hollow cylinder, 
663-664 
exercises, 665-667, 730-735 
integral, Hermite polynomials, 875 
Legendre series, 726-730 
Neumann boundary condition, 662 
normalization, 662 
over discrete points, 1002-1004 
sphere in uniform field, 727-729, 728f 
Sturm-Liouville differential equations, 1073 
Sturm-Liouville theory, 661 
orthogonalization 
Gram-Schmidt 
overview, 269-270 
example, 270-272 
exercises, 273-275 
orthonormalizing physical vectors, 272-273 
orthogonalized Laguerre functions, 892 


orthonormal set, 258 
orthonormalization, physical vectors, 272-273 
oscillator 
damping, 1035-1037 
driven, 1035-1037 
harmonic, 878-879 
oscillatory integral, 529-530 
oscillatory series, 2 
outward flow, 148 
overlap integral, 989-990 
overlap matrix, 317 
overshoot, calculation of, 959-961 


P 
parabolic partial differential equations (PDEs), 
410 
parallelepiped, Laplace equation for, 417-419 
parity 
and special values, 746 
Bessel functions, 655 
Parseval relation, 595-596, 987-990 
partial derivatives, 62, 401 
partial differential equations (PDEs), 329, 643, 
981-982 
boundary conditions, 405, 411-413 
characteristics of, 404-406 
classes of, 409-411 
elliptic, 410 
examples of, 402-403 
exercises, 408-409 
first-order, 403-408 
heat flow, or diffusion, 983 
alternate solutions, 439-441 
exercises, 444 
special boundary condition again, 441-442 
specific boundary condition, 437 
spherically symmetric heat flow, 442-444 
homogeneous, 402 
hyperbolic, 410 
nonlinear, 413-414 
parabolic, 410 
second-order, 409-411 
separation of variables, 414, 430-432 
Cartesian coordinates, 415-420 
circular cylindrical coordinates, 421-424, 
431 
exercises, 432-433 
spherical polar coordinates, 424-430 
types of, 402 
partial fraction expansion, 42, 43, 767, 1013 
partial sum approximation, 579 
partial-wave components, 799 
particle, in a sphere, 704-706 
passive rotations, of coordinate system, 215 
Pauli matrices, 112 





PDEs, see partial differential equations 
periodic functions, 939-940 
permutation group, 845 
permutations, counting of, 1130-1133 
physical space, 964 
piecewise regular, 936 
pivotal method, 1176 
plane triangle, 131, 131f 
Pochhammer symbol, 35, 699, 912, 917 
Poincaré group, 862 
Poincaré’s lemma, 239, 240 
point groups, 820, 835 
point quadrupole, 738 
Poisson distribution, 1151, 1153f 
exercises, 1154-1155 
limits of, 1157-1158 
relation to binomial distribution, 1153-1154, 
1154f 
Poisson noise, 1151 
Poisson’s equation, 176-177, 433-434 
polar coordinates, 442, see also spherical polar 
coordinates 
evaluation, 72 
polar vectors, 136 
polarization matrix, 212 
pole expansion of meromorphic functions, 498, 
515-518 
poles, 497-498 
polygamma function, 612 
polylogarithms, 923 
polynomials, 701, 748 
Bernoulli, 565-566 
Hermite, 280, 554 
example, 554-555, 556 
Legendre, 270-272, 557-558 
orthogonal, 272f 
exercises, 558-560 
generating function, 555 
Hermite, 555-556 
Rodrigues formulas, 551-554 
Schlaefli integral, 554 
positive definite operators, 317 
potential theory 
exercises, 180-182 
Gauss’ law, 175-176, 175f 
Helmholtz’s theorem, 177-180 
Poisson’s equation, 176-177 
scalar potential, 171-172 
vector potential, 172-175 
potential, of charge distribution, 988-989 
power series, convergence, uniform and absolute, 
29 
differentiation and integration, 30 
inversion of, 32 
uniqueness theorem, L’Hôpital’s rule, 31 
power spectrum, 945 
power-series expansion, 670 
principal axes, 300, 305 
principal quantum number, 897 
principal value, 501 
probability 


binomial distribution 
exercises, 1151 
limits of, 1157-1158 
moment-generating function, 1149-1150 
repeated tosses of dice, 1148-1149 
definitions, simple properties, 1126 
conditional probability, 1128-1129 
counting permutations and combinations, 
1130-1133 
exercises, 1133 
probability for A or B, 1127 
scholastic aptitude tests, 1129-1130 
Gauss’ normal distribution, 1155-1159 
Poisson distribution, 1151, 1153f 
exercises, 1154-1155 
limits of, 1157-1158 
relation to binomial distribution, 1153-1154, 
1154f 
random variables 
addition of, 1160-1161 
computing discrete probability distributions, 
1136 
continuous random variable: hydrogen atom, 
1135-1136 
discrete, 1134-1135, 1137 
exercises, 1147-1148 
mean and variance, 1136-1140 
multiplication or division of, 1162 
repeated draws of cards, 1145-1147 
standard deviation of measurements, 
1138-1140 
transformations of, 1159-1160 
statistics 
chi-square (χ²) distribution, 1170-1174 
confidence interval, 1176-1178 
error propagation, 1165-1168 
exercises, 1178-1179 
fitting curves to data, 1168-1170 
student t distribution, 1174-1176 
theory of, 1125 
probability density, student t, 1176f 
probability distributions 
arbitrary, 1157 
conditional, 1147 
marginal, 1144-1147 
moments of, 1141-1142 
product expansion of entire functions, 519-520 
products, see cross product; direct product 
expansion of entire functions, 519-520 
products (continued) 
infinite 
convergence of, 575 
exercises, 576-577 
overview, 574-575 
sine, cosine functions, 575 
pseudoscalar, 137 
pseudotensors, 215-216 
and dual tensors, 216-217 
exercises, 217-218 
Levi-Civita symbol, 216 
pseudovectors, 136, 137f, 215 


Q 


QCD, see quantum chromodynamics 
quantum chromodynamics (QCD), 861 
quantum mechanical oscillator wave functions, 
880f 
quantum mechanical scattering, Born 
approximation, 465-466 
quantum mechanical simple harmonic oscillator, 
878-879 
quantum mechanics 
momentum representation in, 1048-1049 
of triangular symmetry, 829-830 
Schrödinger equation of, 330, 1048 
sum rules, 594, 596 
time-dependent perturbations, 1067 
quantum number, 776 
quantum oscillator, 1120f, 1119-1121 
quantum particle, 419-420 
quantum theory, 704 
quarks, 853 
ladders, 856-857, 857f 
quantum numbers of, 855-856 
quotient rule, 211-213 


R 


radius of convergence, 494 
radius vector, 48 
raising operator, 777 
random variables, 1125 
addition of, 1160-1161 
gamma-distribution, 1164 
computing discrete probability distributions, 
1136 
continuous random variable: hydrogen atom, 
1135-1136 
discrete, 1134-1135, 1137 
exercises, 1147-1148 
mean and variance, 1136-1140 
multiplication or division of, 1162 
repeated draws of cards, 1145-1147 
standard deviation of measurements, 
1138-1140 


transformations of, 1159-1160 
exercises, 1164-1165 
rank, 848 
tensor of, 205, 207-208 
rapidity, 863 
ratio test, Cauchy, d’Alembert, 4-5 
Rayleigh formulas, 702 
Rayleigh’s theorem, see Parseval relation 
Rayleigh-Ritz variational technique, 1117-1118, 
1121 
ground state eigenfunction, 1118-1119 
real axis, 56 
real part, 56 
rearrangement of double series, 18-19 
rearrangement theorem, 820 
reciprocal lattice, 129 
recurrence formulas, 556, 718-720, 764 
recurrence relations, 348 
Bessel functions, 645-646 
spherical, 702 
Chebyshev polynomials, 901-903 
confluent hypergeometric functions, 918 
Hankel functions, 675 
Hermite polynomials, 872-873 
Laguerre polynomials, associated, 893 
modified Bessel functions, 681-682 
Neumann functions, 669-670 
spherical Bessel functions, 702 
recursion, 69 
reducible representation, 823-824 
reference frame, 868 
reflection formula, 603 
reflections 
in spherical coordinates, 196 
of coordinate transformations, 136-137, 137f 
regression coefficient, 1168 
regular singularities, 353-354 
regular solution, 369 
relativistic energy, 35-36 
representation 
counting irreducible, 832-833, 833t 
decomposing a reducible, 834-835 
exercises, 825, 826f 
of continuous groups, 824-825 
of group, 821 
reducible, 823-824 
unitary, 821-823, 823f 
residue theorem, 509-510, 509f 
resolution of identity, 266, 297 
resonant cavity, 650-653 
Riccati equations, 378 
Riemann Zeta function, 7, 16-17, 571, 626-631 
exercises, 631-633 
Riemann’s theorem, 15 
Riemannian spaces, see metric spaces 





RL circuit, 338-339 
RLC analog, 1022, 1022f 
Rodrigues formula, 551-554, 720-721 
for Hermite ODE, 554 
Laguerre polynomials, 889-890 
associated, 895 
Rodrigues representation, Hermite polynomials, 
874 
root diagram, 857 
root test, Cauchy, 4 
rotations 
groups SO(2) and SO(3), 849-851 
in R3, 139-142, 140f 
exercises, 142-143 
in spherical coordinates, 194-196, 195f 
of circular disk, 818 
of coordinate system, 215 
of coordinate transformations, 133-135 
Rouché’s theorem, 518-519 
row vectors, 95, 123, 125 
extraction of, 108 


S 
saddle points, 63, 433 
argument, 586 
asymptotic forms 
factorial function, 588 
of gamma function, 588-589 
for avoiding oscillations, 589 
method, 587-588 
overview, 585-587 
sample space, 1126 
sample standard deviation, 1168 
sawtooth wave, 937-939, 938f 
scalar, 205 
scalar field, 143 
scalar potential, 171-172 
scalar product, 51, 254-255, 271, 285, 295, 297 
and adjoint operator, 278 
in spin space, 259 
triple, 128-130, 129f 
scalar quantities, 46 
scattering cross section, 465 
Schlaefli integral, 554, 604, 653f, 676 
Legendre polynomials, 557 
scholastic aptitude tests, 1129-1130 
Schrödinger equation, 708, 1048, 1116-1117 
hydrogen atom, 896 
momentum space representation, 994 
of quantum mechanics, 330 
Schwarz inequality, 51, 257 
Schwarz reflection principle, 547 
exercises, 549-550 
second-order linear ODEs, 343-346 
second-order partial differential equations (PDEs) 
boundary conditions, 411-413 
classes of, 409-411 
exercises, 414 
nonlinear, 413-414 
second-order Sturm-Liouville ordinary differential 
equations (ODEs), 551 
second-rank tensor, 207-208 
secular determinant, 302 
secular equation, 302, 306 
self-adjoint matrices, 108 
self-adjoint ODEs 
boundary conditions, 381 
deuteron, 391-393 
eigenvalues, 389 
exercises, 393-395 
Hermitian operators, 384 
Legendre’s equation, 389-390 
self-adjoint operators, 277, 284-286, 1070 
example, 284-286 
exercises, 286-287 
overview, 283-284 
self-adjoint problems, Green’s function, 460 
self-adjoint theory, 384 
semi-convergent series, 579 
separable kernel, 1057-1059 
homogeneous Fredholm equation, 1059-1060 
separable ODEs, 331-332 
separation of variables, 403, 414, 430-432 
Cartesian coordinates, 415-420 
circular cylindrical coordinates, 421-424, 431 
exercises, 432-433 
spherical polar coordinates, 424-430 
series approach 
Bessel’s equation, limitations of, 351-353 
Chebyshev, 25 
hypergeometric, 912 
Legendre, 9-10 
shifted polynomials, Chebyshev, 908 
ultraspherical, 25 
series expansion, 681 
series solutions, Frobenius method, 346-350, 350f 
exercises, 355-358 
expansion about, 350 
Fuchs’ theorem, 355 
limitations of series approach, Bessel’s 
equation, 351-353 
regular and irregular singularities, 353-354 
symmetry of solutions, 350-351 
sets, 1127-1130 
several dependent and independent variables, 
relation to physics, 1105 
sign changes, series with alternating, 12-13 
signal-processing applications, 997-1001 
exercises, 1001-1002 
similarity transformations, 208, 293 
simple pendulum, 927-928, 1113-1114, 1113f 
simple pole, 498 
simultaneous diagonalization, 314-315 
sine 
infinite products, 575 
integrals in asymptotic series, 580-582 
single-electron wave function, 396 
single-slit diffraction pattern, 972 
singular points, 343-345, 345t 
essential (irregular), 344 
irregular (essential), 344 
isolated, 497 
singularities 
analytic continuation, 503-507, 504f, 505f 
exercises, 507-508 
fixed, 378-379 
movable, 378-379 
on contour of integration, 530-531 
poles, 497-498 
Slater-type orbitals (STOs), 990 
Snell’s law, 1095, 1095f 
SO(2) rotation groups, 849-851 
SO(3) rotation groups, 849-851 
soap film, 1088-1090, 1089f 
soap film—minimum area, 1090-1093, 1092f 
scalar products, 256-257 
solenoidal, 149, 154-155 
soliton, 413 
source term, 447 
space groups, 869 
special relativity, 862 
special unitary groups, SU(3), Gell-Mann 
matrices, 852-861 
special values, 764 
parity and, 746 
spectral decomposition, 315-316 
sphere in uniform field, 727-729, 728f 
sphere with boundary conditions, 428-430 
spherical Bessel functions, 427, 428 
asymptotic values, 703 
definitions, 702 
exercises, 709-712 
Helmholtz equation, 698 
limiting values, 703 
modified, 709 
orthogonality and zeros, 703 
particle in a sphere, 704-706 
recurrence relations, 702 
spherical coordinates, Helmholtz equation, 698 
spherical Green’s functions, 463, 800-801 
spherical harmonics, 445, 473, 756 
addition theorem for, 797-798 
Cartesian representations, 758 
Condon-Shortley phase, 758, 760t 
exercises, 765-766 


integrals of three, 803-805 
ladder, 779 
Laplace expansion, 760-761, 799-801 
Laplace series—gravity fields, 762 
properties of, 764-765 
symmetry of solutions, 762-763 
vector, 809-813 
spherical pendulum, 1105, 1105f 
spherical polar coordinates, 79, 183, 190-194, 
194f, 424-430 
spherical tensors, 796 
addition theorem, 797-798 
Laplace expansion, 799-801 
spherical wave expansion, 798-799 
exercises, 806-809 
integrals of three spherical harmonics, 803-805 
spherical volume, 704 
spherical waves 
Bessel functions, 703 
expansion, 798-799 
spin operator, adjoint of, 282 
spin space, 259-260 
of electron, 253 
spinor ladder, 780-781 
spinors, 213, 779-780, 852 
square integrable, 595 
square integration contour, z^n on, 479-481, 480f 
square pulse, transform of, 1010, 1010f 
square wave, 949-950, 949f, 958-959 
expansion of, 264f 
squares of random variables, 1164 
squares of series, divergent, 15-16 
standard deviation, 1138 
of measurements, 1138-1140 
sample, 1168 
standing waves, 382-384, 435 
star operator, see Hodge operator 
stationary, 63 
stationary paths, 1085, 1085f 
stationary points, 433 
statistical hypothesis, 1165 
statistics, 1125 
chi-square (χ²) distribution, 1170-1174 
confidence interval, 1176-1178 
error propagation, 1165-1168 
exercises, 1178-1179 
fitting curves to data, 1168-1170 
student t distribution, 1174-1176 
steepest descent 
method of, 585 
asymptotic form of gamma function, 
588-589 
exercises, 590-591 
factorial function, 588 
saddle points, 585-588 





step function, 1013-1014, 1014f 
Stirling’s expansion, 589 
Stirling’s series 
derivation from Euler-Maclaurin integration 
formula, 623-624 
exercises, 625-626 
Stirling’s series, 624 
Stirling’s formula, 567 
Stokes’ theorem, 167-168, 167f, 168f, 193-194 
on differential forms, 245-248 
STOs, see Slater-type orbitals 
stream lines, 149 
strong interaction, 852 
structure constants, 848 
student t distribution, 1174-1176, 1177t 
student t probability density, 1176f 
Sturm-Liouville boundary conditions, 892 
Sturm-Liouville equation, 1117 
Sturm-Liouville system, 746, 892 
Sturm-Liouville theory, 384, 661, 936-937, 1071, 
1073 
SU(2) 
and SO(3) homomorphism, 851-852 
isospin and SU(3) symmetry, 852 
SU(3) symmetry, 852-861 
substitution, 1020 
subtraction 
of sets, 1127 
of tensors, 208 
successive applications of ∇, 153-154 
successive operations, of coordinate 
transformations, 137-138 
successive transfer functions, 1002f 
successive unitary transformations, 290 
sum 
evaluation of, 544-546, 546r 
exercises, 546-547 
sum rules, 596 
summation of series, 957-958 
superposition principle, 402 
for homogeneous ODEs, PDEs, 330 
surface integrals, 161-162, 161f, 162f 
symmetric group, 835, 840-844 
exercises, 844-845 
symmetric matrix, 105 
symmetric stretching mode, 323 
symmetric tensor, 208 
symmetrization of kernels, 1069 
symmetry, 815, 940-941, 942f 
and physics, 826-830 
exercises, 830 
of equilateral triangle, 817f, 817, 818f 
of solutions, 762-763 
relations, 593-594 
T 
Taylor expansion, 492-494, 493f 
Taylor series, 560 
Taylor’s expansion, 653 
binomial theorem, relativistic energy, 35-36 
Maclaurin theorem 
exponential function, 27-28 
logarithm, 28-29 
tensor analysis, 205-213 
addition and subtraction of, 208 
covariant and contravariant, 206-207 
exercises, 213-215 
isotropic, 209 
symmetric and antisymmetric, 208 
tensor derivative operators 
curl, 225 
divergence, 224-225 
gradient, 224 
Laplacian, 225 
tensors, see also direct product; quotient rule; 
spinors; pseudotensors 
direct product of, 210-211 
in general coordinates 
covariant derivatives, 222-223 
exercises, 226 
metric tensor, 218-219 
second-rank, 207-208 
tensors of rank 0, 205 
tensors of rank 1, 205 
tensors of rank 2, 207-208 
three-dimensional (3-D) differential forms, 407 
threefold Hermite formula, 883-884 
time-independent Schrödinger equation, 300 
TM, see transverse magnetic 
trace matrix, 105, 210 
transfer function, 998-999, 998f 
high-pass filter, 999-1000, 999f 
limitations on, 1000-1001 
transform, derivative of, 965, 966, see also 
Hankel; Laplace; Mellin 
transformations 
Gram-Schmidt, 293 
of differential equation into integral equation, 
1049-1050 
of operators, 291 
nonunitary transformations, 293 
of random variables, 1159-1165 
unitary, 287-290 
translation, 1022-1023 
transpose matrix, 104 
transverse magnetic (TM), 651 
traveling waves, 435 
triangle rule, 788 
triangular pulse, Fourier transform of, 976, 977f 
triangular symmetry, quantum mechanics of, 
829-830 
trigonometric form, 904-905 
trigonometric functions 
exploiting periodicity of, 537-538 
trigonometric integrals, 69, 522-524 
triple scalar product, 128-130, 129f 
triple vector product, 130 
triplet state, 259 
two- and three-dimensional problems, Green’s 
function, 459-467 
two-sided Laplace transforms, 1008 


U 
ultraspherical polynomials, 388, 899 
equation, 903 
self-adjoint form, 906 
undetermined multipliers, see Lagrangian 
multipliers 
uniform convergence, 21—22, 29, 262 
uniformly convergent series, properties of, 24 
unions, 1127-1130 
unique expansion, 494 
uniqueness theorem 
L’Hôpital’s rule, 31 
of power series, 30-31 
unit cell, 869 
unit matrix, 99 
unit vectors, 47 
unitary matrices, 107 
unitary operators 
example, 289-290 
exercises, 290-291 
successive transformations, 290 
unitary transformations, 287-288 
unitary representation, 821-823, 823f 
unitary transformation, 297 


V 
variables 
dependent, 1096-1097 
Hamilton’s Principle, 1097-1098 
Laplace’s equation, 1101-1102 
moving particle—Cartesian coordinates, 
1098-1099 
moving particle—circular cylindrical 
coordinates, 1099 
independent, 407-408, 411 
separation of, 403 
variance, 1136-1140 
variation, 1081 
with constraints, 1111-1112 
exercises, 1121-1124 
Lagrangian equations, 1112-1113 
Schrödinger wave equation, 1116-1117 


simple pendulum, 1113-1114, 1113f 
sliding off a log, 1114-1115, 1114f 
of linear parameters, 1121 
of constant, 338 
of parameters, 338, 375-376 
variation method, 395-397 
exercises, 397 
vector analysis 
reciprocal lattice, 130 
rotation of coordinate transformations, 133-135 
vector fields, 46, 143 
vector integration 
exercises, 163-164 
line integrals, 159-160, 160f 
surface integrals, 161-162, 161f, 162f 
volume integrals, 162-163 
vector Laplacian, 155-156 
vector model, 786-788 
vector potential, 172-175, 175 
vector spaces, 253-254, 295 
completeness, 255, 262 
linear space, 252 
vector spherical harmonics 
coupling, 810-813 
exercises, 813 
spherical tensor, 809-810 
vector triple product, 130 
vectors, 123, 205, see also rotations; gradient, 
∇; tensors; Stokes’ theorem 
addition of, 47f 
angle between two, 798 
basic properties of, 124-125 
by Gram-Schmidt orthogonalization, 269-275 
coefficient, 261 
contravariant, 206, 219 
contravariant basis, 220-221 
covariant, 206 
covariant basis, 218, 220-221 
cross product, 126-128, 126f, 127f 
differential vector operators, 143 
gradient, 143 
direct product of, 210-211 
dot products, 49-50 
exercises, 52-53, 131-133 
fields, 123 
Gauss’ theorem, 164-165, 165f 
Green’s theorem, 165-166 
Helmholtz’s theorem, 177-180 
in function spaces 
Dirac notation, 265-266 
example, 253-254, 256-263, 265 
exercises, 266-269 
expansions, 261 
Hilbert space, 255-256 
orthogonal expansions, 257-258 





overview, 251-253 
scalar product, 254-255, 260-261 
Schwarz inequality, 257 
irrotational, 154-155 
matrix representation of, 106-107 
multiplication of, 252 
orthogonality, 51 
physical, 272-273 
radius vector, 48 
Stokes’ theorem, 167-168, 167f, 168f 
successive applications of ∇, 153-154 
triple product, 130 
triple scalar product, 128-130, 129f 
unit vectors, 47 
vibrating string, 382-384 
vibration, normal modes of, 322-324 
vierergruppe, 820 
Volterra equation, 1047, 1048, 1050, 1055, 1067 
volume integrals, 162-163 
vorticity, 151 


W 
wave equation, 435, 981-982 
d’Alembert’s solution of, 436 
exercises, 437 
wave guides, coaxial, Bessel functions, 671-672 
wedge operator, 233 
wedge products, 233 
Weierstrass, 504 
Weierstrass M test, 22-23 
Weierstrass infinite-product form of, 602 
weight diagram, 859f, 859, 860f 
Weyl representation, 121 
Whittaker functions, 682, 919 
Wigner matrices, 797 
WKB expansion, 577 
Wronskian determinant, 359-360 
Wronskian formulas 
Bessel functions, 670-671, 694 
confluent hypergeometric functions, 922 
linear dependence/independence of functions, 
360, 671 
solutions of self-adjoint differential equation, 
670 


Z 
zero matrix, 96 
zero-point energy, 705, 879 
zeros, Bessel function, 648-653 


